
# Assignment: Decision Trees

Learning outcomes
· Understand how to use decision trees on a dataset to make a prediction
· Learn hyper-parameter tuning for decision trees by using RandomGrid
· Learn the effectiveness of ensemble algorithms (Random Forest, AdaBoost, Extra Trees classifier, Gradient Boosted Trees)
· In the first part of this assignment, you will use Classification Trees for predicting if a user has a default payment option active or not. The data for this assignment is available at https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
· This dataset is aimed at the case of customer default payments in Taiwan. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel Sorting Smoothing Method to estimate the real probability of default.
· Required imports for this project are given below. Make sure you have all libraries required for this project installed. You may use conda or pip based on your setup.
· NOTE: Since the data is in Excel format, you need to install xlrd to read the Excel file into a pandas DataFrame. You can run pip install xlrd to install it.
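The spreadsheet is commonly reported to carry its real column names in the first data row rather than in the header. The following is a minimal sketch of promoting that row to the header. It runs on a tiny in-memory stand-in, not on the real file; the helper name `promote_first_row_to_header` and the stand-in column names are assumptions made for illustration.

```python
import pandas as pd


def promote_first_row_to_header(raw: pd.DataFrame) -> pd.DataFrame:
    """Use the first data row as column names, then drop that row."""
    out = raw.copy()
    out.columns = out.iloc[0]              # first row holds the real names
    out.columns.name = None                # discard the leftover index label
    out = out.iloc[1:].reset_index(drop=True)
    return out.apply(pd.to_numeric)        # remaining cells are numeric


# Hypothetical stand-in for what pd.read_excel(...) might return for this file.
raw = pd.DataFrame(
    [
        ["ID", "LIMIT_BAL", "default payment next month"],
        [1, 20000, 1],
        [2, 120000, 0],
    ],
    columns=["Unnamed: 0", "X1", "Y"],
)

clean = promote_first_row_to_header(raw)
print(clean)
```

With the real file, passing `header=1` to `pd.read_excel` may achieve the same result in one step, assuming the names do sit in the second spreadsheet row.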
Questions (15 points total)
Question 1 (2 pts)
Build a classifier using a decision tree and compute its confusion matrix. Try different hyper-parameters (at least two) and discuss the results.
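One possible shape for Question 1 is sketched below. It uses synthetic data from `make_classification` as a stand-in for the credit card dataset, so the numbers it prints say nothing about the real task. The two hyper-parameter settings are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the real dataset.
X, y = make_classification(
    n_samples=2000, n_features=23, n_informative=8,
    weights=[0.78, 0.22], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two illustrative settings: a shallow tree and a deeper, regularised one.
settings = [
    {"max_depth": 3, "min_samples_leaf": 1},
    {"max_depth": 10, "min_samples_leaf": 20},
]

results = []
for params in settings:
    clf = DecisionTreeClassifier(random_state=0, **params)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
    results.append((params, acc, cm))
    print(params, f"accuracy={acc:.3f}")
    print(cm)
```

Because the classes are imbalanced, the off-diagonal cells of the confusion matrix are usually more informative than accuracy alone.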
Question 2 (4 pts)
Rebuild the decision tree from the previous question, but this time choose the hyper-parameters with a RandomGrid search. Compare the results.
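The assignment does not say which tool "RandomGrid" refers to. The sketch below assumes it means scikit-learn's `RandomizedSearchCV`. The data is synthetic and the search ranges are arbitrary, so treat it as a template rather than a recommended configuration.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data.
X, y = make_classification(
    n_samples=1500, n_features=23, n_informative=8,
    weights=[0.78, 0.22], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Distributions to sample from; randint(a, b) draws integers in [a, b).
param_distributions = {
    "max_depth": randint(2, 15),
    "min_samples_leaf": randint(1, 50),
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,           # number of sampled configurations
    cv=3,                # 3-fold cross-validation on the training split
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_test, y_test), 3))
```

Comparing the held-out accuracy with that of the hand-tuned trees from Question 1 is one way to frame the requested comparison.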
Question 3 (6 pts)
Build the same classifier using the following ensemble models. For each model, calculate the accuracy, and for at least two of the models in the list below, plot the learning curves.
· Random Forest
· Extra Trees Classifier
· Gradient Boosted Trees
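A minimal sketch of Question 3 follows, again on synthetic data. It fits the three listed ensembles with scikit-learn and computes learning-curve points for one of them with `learning_curve`. The estimator counts and training-set fractions are illustrative assumptions, and plotting is left out so the block has no extra dependencies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import learning_curve, train_test_split

# Synthetic stand-in for the credit card data.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    weights=[0.78, 0.22], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Held-out accuracy for each ensemble.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy={accuracies[name]:.3f}")

# Learning-curve points for one model: scores at growing training-set sizes.
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X_train, y_train,
    cv=3,
    train_sizes=np.linspace(0.2, 1.0, 4),
    scoring="accuracy",
)
for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n}: train={tr:.3f}, validation={va:.3f}")
```

Plotting the mean training and validation scores against `train_sizes` (for example with matplotlib) gives the learning curve; a persistent gap between the two lines suggests overfitting.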
Question 4 (3 pts)
Discuss and compare the results for all three previous questions.
· How does changing hyper-parameters affect model performance?
· Why do you think certain models performed better or worse?
· How does this performance line up with the known strengths and weaknesses of these models?
Answered Same Day Jul 17, 2021

## Solution

Suraj answered on Jul 19 2021
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "9OBvBOCkPrga"
},
"source": [
"## Assignment 4"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "bEmSTWZSPrgb"
},
"source": [
"This assignment is based on content discussed in module 8 and uses Decision Trees and Ensemble Models in classification and regression problems."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "1cUoTzQLPrgc"
},
"source": [
"## Learning outcomes "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Q1ygYVo_Prgc"
},
"source": [
"- Understand how to use decision trees on a dataset to make a prediction\n",
"- Learn hyper-parameter tuning for decision trees by using RandomGrid\n",
"- Learn the effectiveness of ensemble algorithms (Random Forest, AdaBoost, Extra Trees classifier, Gradient Boosted Trees)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "9hjVbQlVPrgd"
},
"source": [
"In the first part of this assignment, you will use Classification Trees for predicting if a user has a default payment option active or not. You can find the necessary data for performing this assignment [here](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients) \n",
"\n",
"This dataset is aimed at the case of customer default payments in Taiwan. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel Sorting Smoothing Method to estimate the real probability of default.\n",
"\n",
"Required imports for this project are given below. Make sure you have all libraries required for this project installed. You may use conda or pip based on your setup.\n",
"\n",
"__NOTE:__ Since the data is in Excel format, you need to install `xlrd` to read the Excel file into a pandas DataFrame. You can run `pip install xlrd` to install it."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "R376ZBnBPrge"
},
"outputs": [],
"source": [
"#required imports\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ddF9R5pdPrgi"
},
"source": [
"After installing the necessary libraries, proceed to download the data. Since reading the Excel file won't create headers by default, we added two more operations to substitute the columns."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "CtNCjjr7Prgj"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"None\n"
]
}
],
"source": [
"dataset = pd.read_excel(\"https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls\")\n",
"#dataset.columns = dataset.iloc[0]\n",
"#dataset.drop(['ID'], inplace=True)\n",
"dataset.drop(dataset.columns[dataset.columns.str.contains('unnamed',case = False)],axis = 1, inplace = True)\n",
"print(dataset.drop(0,inplace=True))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "cMh-sEIdPrgl"
},
"source": [
"In the following, you can take a look into the dataset."
]
},
{
"cell_type": "code",
"execution_count": 4,