

Income,Limit,Rating,Cards,Age,Education,Own,Student,Married,Region,Balance
14.891,3606,283,2,34,11,No,No,Yes,South,333
106.025,6645,483,3,82,15,Yes,Yes,Yes,West,903
104.593,7075,514,4,71,11,No,No,No,West,580
148.924,9504,681,3,36,11,Yes,No,No,West,964
55.882,4897,357,2,68,16,No,No,Yes,South,331
80.18,8047,569,4,77,10,No,No,No,South,1151
20.996,3388,259,2,37,12,Yes,No,No,East,203
71.408,7114,512,2,87,9,No,No,No,West,872
15.125,3300,266,5,66,13,Yes,No,No,South,279
71.061,6819,491,3,41,19,Yes,Yes,Yes,East,1350
63.095,8117,589,4,30,14,No,No,Yes,South,1407
15.045,1311,138,3,64,16,No,No,No,South,0
80.616,5308,394,1,57,7,Yes,No,Yes,West,204
43.682,6922,511,1,49,9,No,No,Yes,South,1081
19.144,3291,269,2,75,13,Yes,No,No,East,148
20.089,2525,200,3,57,15,Yes,No,Yes,East,0
53.598,3714,286,3,73,17,Yes,No,Yes,East,0
36.496,4378,339,3,69,15,Yes,No,Yes,West,368
49.57,6384,448,1,28,9,Yes,No,Yes,West,891
42.079,6626,479,2,44,9,No,No,No,West,1048
17.7,2860,235,4,63,16,Yes,No,No,West,89
37.348,6378,458,1,72,17,Yes,No,No,South,968
20.103,2631,213,3,61,10,No,No,Yes,East,0
64.027,5179,398,5,48,8,No,No,Yes,East,411
10.742,1757,156,3,57,15,Yes,No,No,South,0
14.09,4323,326,5,25,16,Yes,No,Yes,East,671
42.471,3625,289,6,44,12,Yes,Yes,No,South,654
32.793,4534,333,2,44,16,No,No,No,East,467
186.634,13414,949,2,41,14,Yes,No,Yes,East,1809
26.813,5611,411,4,55,16,Yes,No,No,South,915
34.142,5666,413,4,47,5,Yes,No,Yes,South,863
28.941,2733,210,5,43,16,No,No,Yes,West,0
134.181,7838,563,2,48,13,Yes,No,No,South,526
31.367,1829,162,4,30,10,No,No,Yes,South,0
20.15,2646,199,2,25,14,Yes,No,Yes,West,0
23.35,2558,220,3,49,12,Yes,Yes,No,South,419
62.413,6457,455,2,71,11,Yes,No,Yes,South,762
30.007,6481,462,2,69,9,Yes,No,Yes,South,1093
11.795,3899,300,4,25,10,Yes,No,No,South,531
13.647,3461,264,4,47,14,No,No,Yes,South,344
34.95,3327,253,3,54,14,Yes,No,No,East,50
113.659,7659,538,2,66,15,No,Yes,Yes,East,1155
44.158,4763,351,2,66,13,Yes,No,Yes,West,385
36.929,6257,445,1,24,14,Yes,No,Yes,West,976
31.861,6375,469,3,25,16,Yes,No,Yes,South,1120
77.38,7569,564,3,50,12,Yes,No,Yes,South,997
19.531,5043,376,2,64,16,Yes,Yes,Yes,West,1241
44.646,4431,320,2,49,15,No,Yes,Yes,South,797
44.522,2252,205,6,72,15,No,No,Yes,West,0
43.479,4569,354,4,49,13,No,Yes,Yes,East,902
36.362,5183,376,3,49,15,No,No,Yes,East,654
39.705,3969,301,2,27,20,No,No,Yes,East,211
44.205,5441,394,1,32,12,No,No,Yes,South,607
16.304,5466,413,4,66,10,No,No,Yes,West,957
15.333,1499,138,2,47,9,Yes,No,Yes,West,0
32.916,1786,154,2,60,8,Yes,No,Yes,West,0
57.1,4742,372,7,79,18,Yes,No,Yes,West,379
76.273,4779,367,4,65,14,Yes,No,Yes,South,133
10.354,3480,281,2,70,17,No,No,Yes,South,333
51.872,5294,390,4,81,17,Yes,No,No,South,531
35.51,5198,364,2,35,20,Yes,No,No,West,631
21.238,3089,254,3,59,10,Yes,No,No,South,108
30.682,1671,160,2,77,7,Yes,No,No,South,0
14.132,2998,251,4,75,17,No,No,No,South,133
32.164,2937,223,2,79,15,Yes,No,Yes,East,0
12,4160,320,4,28,14,Yes,No,Yes,South,602
113.829,9704,694,4,38,13,Yes,No,Yes,West,1388
11.187,5099,380,4,69,16,Yes,No,No,East,889
27.847,5619,418,2,78,15,Yes,No,Yes,South,822
49.502,6819,505,4,55,14,No,No,Yes,South,1084
24.889,3954,318,4,75,12,No,No,Yes,South,357
58.781,7402,538,2,81,12,Yes,No,Yes,West,1103
22.939,4923,355,1,47,18,Yes,No,Yes,West,663
23.989,4523,338,4,31,15,No,No,No,South,601
16.103,5390,418,4,45,10,Yes,No,Yes,South,945
33.017,3180,224,2,28,16,No,No,Yes,East,29
30.622,3293,251,1,68,16,No,Yes,No,South,532
20.936,3254,253,1,30,15,Yes,No,No,West,145
110.968,6662,468,3,45,11,Yes,No,Yes,South,391
15.354,2101,171,2,65,14,No,No,No,West,0
27.369,3449,288,3,40,9,Yes,No,Yes,South,162
53.48,4263,317,1,83,15,No,No,No,South,99
23.672,4433,344,3,63,11,No,No,No,South,503
19.225,1433,122,3,38,14,Yes,No,No,South,0
43.54,2906,232,4,69,11,No,No,No,South,0
152.298,12066,828,4,41,12,Yes,No,Yes,West,1779
55.367,6340,448,1,33,15,No,No,Yes,South,815
11.741,2271,182,4,59,12,Yes,No,No,West,0
15.56,4307,352,4,57,8,No,No,Yes,East,579
59.53,7518,543,3,52,9,Yes,No,No,East,1176
20.191,5767,431,4,42,16,No,No,Yes,East,1023
48.498,6040,456,3,47,16,No,No,Yes,South,812
30.733,2832,249,4,51,13,No,No,No,South,0
16.479,5435,388,2,26,16,No,No,No,East,937
38.009,3075,245,3,45,15,Yes,No,No,East,0
14.084,855,120,5,46,17,Yes,No,Yes,East,0
14.312,5382,367,1,59,17,No,Yes,No,West,1380
26.067,3388,266,4,74,17,Yes,No,Yes,East,155
36.295,2963,241,2,68,14,Yes,Yes,No,East,375
83.851,8494,607,5,47,18,No,No,No,South,1311
21.153,3736,256,1,41,11,No,No,No,South,298
17.976,2433,190,3,70,16,Yes,Yes,No,South,431
68.713,7582,531,2,56,16,No,Yes,No,South,1587
146.183,9540,682,6,66,15,No,No,No,South,1050
15.846,4768,365,4,53,12,Yes,No,No,South,745
12.031,3182,259,2,58,18,Yes,No,Yes,South,210
16.819,1337,115,2,74,15,No,No,Yes,West,0
39.11,3189,263,3,72,12,No,No,No,West,0
107.986,6033,449,4,64,14,No,No,Yes,South,227
13.561,3261,279,5,37,19,No,No,Yes,West,297
34.537,3271,250,3,57,17,Yes,No,Yes,West,47
28.575,2959,231,2,60,11,Yes,No,No,East,0
46.007,6637,491,4,42,14,No,No,Yes,South,1046
69.251,6386,474,4,30,12,Yes,No,Yes,West,768
16.482,3326,268,4,41,15,No,No,No,South,271
40.442,4828,369,5,81,8,Yes,No,No,East,510
35.177,2117,186,3,62,16,Yes,No,No,South,0
91.362,9113,626,1,47,17,No,No,Yes,West,1341
27.039,2161,173,3,40,17,Yes,No,No,South,0
23.012,1410,137,3,81,16,No,No,No,South,0
27.241,1402,128,2,67,15,Yes,No,Yes,West,0
148.08,8157,599,2,83,13,No,No,Yes,South,454
62.602,7056,481,1,84,11,Yes,No,No,South,904
11.808,1300,117,3,77,14,Yes,No,No,East,0
29.564,2529,192,1,30,12,Yes,No,Yes,South,0
27.578,2531,195,1,34,15,Yes,No,Yes,South,0
26.427,5533,433,5,50,15,Yes,Yes,Yes,West,1404
57.202,3411,259,3,72,11,Yes,No,No,South,0
123.299,8376,610,2,89,17,No,Yes,No,East,1259
18.145,3461,279,3,56,15,No,No,Yes,East,255
23.793,3821,281,4,56,12,Yes,Yes,Yes,East,868
10.726,1568,162,5,46,19,No,No,Yes,West,0
23.283,5443,407,4,49,13,No,No,Yes,East,912
21.455,5829,427,4,80,12,Yes,No,Yes,East,1018
34.664,5835,452,3,77,15,Yes,No,Yes,East,835
44.473,3500,257,3,81,16,Yes,No,No,East,8
54.663,4116,314,2,70,8,Yes,No,No,East,75
36.355,3613,278,4,35,9,No,No,Yes,West,187
21.374,2073,175,2,74,11,Yes,No,Yes,South,0
107.841,10384,728,3,87,7,No,No,No,East,1597
39.831,6045,459,3,32,12,Yes,Yes,Yes,East,1425
91.876,6754,483,2,33,10,No,No,Yes,South,605
103.893,7416,549,3,84,17,No,No,No,West,669
19.636,4896,387,3,64,10,Yes,No,No,East,710
17.392,2748,228,3,32,14,No,No,Yes,South,68
19.529,4673,341,2,51,14,No,No,No,West,642
17.055,5110,371,3,55,15,Yes,No,Yes,South,805
23.857,1501,150,3,56,16,No,No,Yes,South,0
15.184,2420,192,2,69,11,Yes,No,Yes,South,0
13.444,886,121,5,44,10,No,No,Yes,West,0
63.931,5728,435,3,28,14,Yes,No,Yes,East,581
35.864,4831,353,3,66,13,Yes,No,Yes,South,534
41.419,2120,184,4,24,11,Yes,Yes,No,South,156
92.112,4612,344,3,32,17,No,No,No,South,0
55.056,3155,235,2,31,16,No,No,Yes,East,0
19.537,1362,143,4,34,9,Yes,No,Yes,West,0
31.811,4284,338,5,75,13,Yes,No,Yes,South,429
56.256,5521,406,2,72,16,Yes,Yes,Yes,South,1020
42.357,5550,406,2,83,12,Yes,No,Yes,West,653
53.319,3000,235,3,53,13,No,No,No,West,0
12.238,4865,381,5,67,11,Yes,No,No,South,836
31.353,1705,160,3,81,14,No,No,Yes,South,0
63.809,7530,515,1,56,12,No,No,Yes,South,1086
13.676,2330,203,5,80,16,Yes,No,No,East,0
76.782,5977,429,4,44,12,No,No,Yes,West,548
25.383,4527,367,4,46,11,No,No,Yes
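Questions 2 and 3 of Part A (keeping the seven numeric variables and deriving `Balance_1500`) can be sketched without any third-party library. This is a minimal illustration that assumes the CSV text above is available as a string; the function name `load_credit` and the small inline sample are hypothetical, and the notebook answer is not claimed to work this way. Because the last pasted row is truncated (it has no `Region` or `Balance`), the sketch skips any row whose field count does not match the header instead of guessing the missing values.

```python
import csv
import io

# Columns kept in Part A, question 2 (taken from the assignment text).
KEEP = ["Income", "Limit", "Rating", "Cards", "Age", "Education", "Balance"]


def load_credit(text):
    """Parse Credit.csv-style text, keep the numeric columns, add Balance_1500.

    Rows whose field count does not match the header (for example a
    truncated last line) are skipped rather than guessed.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for fields in reader:
        if len(fields) != len(header):
            continue  # incomplete or blank row: skip it
        record = dict(zip(header, fields))
        row = {name: float(record[name]) for name in KEEP}
        # Binary target: 1 when Balance > 1500, 0 otherwise.
        row["Balance_1500"] = 1 if row["Balance"] > 1500 else 0
        rows.append(row)
    return rows


# Hypothetical three-row sample copied from the data above; the third row
# is the truncated one and should be dropped.
sample = """Income,Limit,Rating,Cards,Age,Education,Own,Student,Married,Region,Balance
186.634,13414,949,2,41,14,Yes,No,Yes,East,1809
14.891,3606,283,2,34,11,No,No,Yes,South,333
25.383,4527,367,4,46,11,No,No,Yes
"""

data = load_credit(sample)
n_rows, n_cols = len(data), len(KEEP)  # dimensions of the reduced dataset
print((n_rows, n_cols))                 # (2, 7)
print([r["Balance_1500"] for r in data])  # [1, 0]
```

With the full file, `len(data)` and `len(KEEP)` give the dimensions asked for in question 2. In pandas the same steps would be `df[KEEP]` followed by `(df["Balance"] > 1500).astype(int)`.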
Answered 8 days after Oct 18, 2021

Solution

Uttam answered on Oct 27 2021
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "BQu32icXDMSj"
},
"source": [
"# Assignment 2 (15 points)\n",
" \n",
"***\n",
"\n",
"### General Instructions\n",
" + You may need additional libraries besides the Python standard library to solve some questions. Import only the necessary libraries. \n",
" + If more than one library exists for the same purpose, choose whichever you prefer, as long as it does the task properly. \n",
" + If we want you to use a specific library, we will state it clearly. \n",
" + Use the exact variable names asked for in the questions. When no clear instructions are given, feel free to do it the way you would like to.\n",
" + After each question, add the needed number of new cells and place your answers inside the cells. \n",
" + Use text cells for explanations, and explain your work in plain text as much as possible. \n",
" + Do not remove or modify the original cells provided by the instructor.\n",
" + The following cell defines some extra options to make your output more readable, including the output colors RED and OKBLUE and the text styles BOLD and UNDERLINE. Do not hesitate to use them. For example, you can print text in red as follows: \n",
" ```python\n",
" print(bcolors.RED + \"your text\" + bcolors.ENDC)\n",
" ```\n",
" + Comment your code whenever needed, starting each comment with the `#` sign.\n",
" + In some questions, some of the details needed to solve the problem are **purposely** omitted to encourage self-directed research. This helps you develop the search skills that coding in Python inevitably requires, given the inconsistent syntax of Python.\n",
" + Do not hesitate to bring your questions to the TAs or instructors. \n",
" \n",
" Good luck! "
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "KJ3ey5VeDMSr"
},
"outputs": [],
"source": [
"\n",
"# The following piece of code gives the opportunity to show multiple outputs\n",
"# in one cell:\n",
"from IPython.core.interactiveshell import InteractiveShell\n",
"InteractiveShell.ast_node_interactivity = \"all\"\n",
"\n",
"\n",
"# Colorful outputs\n",
"class bcolors:\n",
"    RED = '\\033[91m'\n",
"    OKBLUE = '\\033[94m'\n",
"    BOLD = '\\033[1m'\n",
"    UNDERLINE = '\\033[4m'\n",
"    ENDC = '\\033[0m'"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "fn7Hdf97DMS0"
},
"source": [
"## **Part A** (7 points)\n",
"\n",
"1. **(1 point)** Download `Credit.csv` from faculty.marshall.usc.edu/gareth-james/ISL/data.html and upload it into this notebook. Print the first $5$ rows of the data. Using appropriate descriptive statistics or visualization methods, describe the variables and the possible associations among them. Interpret the results. \n",
"2. **(0.5 points)** Keep only `Income`, `Limit`, `Rating`, `Cards`, `Age`, `Education`, and `Balance` as your variables and discard the rest. Print the dimensions of this new dataset. \n",
"3. **(0.5 points)** Create a binary variable `Balance_1500` which equals $1$ for the observations with `Balance` $> 1500$, and equals $0$ otherwise.\n",
"4. **(3 points)** Model `Balance_1500` using the explanatory variables `Income`, `Limit`, `Rating`, `Cards`, `Age`, and `Education` with each of the following models: \n",
" + logistic regression, \n",
" + linear discriminant, and \n",
" + quadratic discriminant.\n",
"5. **(0.5 points)** Find the probability of (`Balance` $> 1500$) for the following values, using all three aforementioned methods:\n",
"\n",
"| Income | Limit | Rating | Cards | Age | Education | \n",
"|--------------|--------------|----------------|--------------|-----------------|---------------|\n",
"| 63 | 8100 | 600 | 4 | 30 | 13 |\n",
"| 186 | 13414 | 950 | 2 | 41 | 13 |\n",
"\n",
"\n",
"Compare the probabilities and comment.\n",
" \n",
"6. **(1.5 points)** For each method, print the confusion matrix, the accuracy score, and the AUC using all observations. Compare these metrics and comment. "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "5KzgIsCdn_ux"
},
"source": [
"## **Part B** (8 points)\n",
"\n",
"Download the `ziptrain.csv` and `ziptest.csv` datasets from **Athena/Content/Data**. Save them, upload them here as **two separate datasets**, and name them `ziptrain` and `ziptest`, respectively. Explore the data to understand it. \n",
"\n",
" 1. **(1 point)** From the `ziptrain` dataset, select only the rows corresponding to digits $2$ and $7$ and save them in a new dataset called `binar_train`. Do the same for `ziptest` and call it `binar_test`. \n",
" 2. **(1 point)** Project `binar_train` onto the first **two principal components** and make a scatterplot of the data in the new space (the two-dimensional space spanned by the first two PCs). Use a different color (or marker) for each digit. Based on the plot, do you think these two digits can be separated well using only two PCs? Explain.\n",
" 3. **(1 point)** Fit a **logistic regression** in the new space to separate digits $2$ and $7$. \n",
" 4. **(1 point)** Evaluate the trained model on `binar_test` using **accuracy** and an **appropriate F-measure**. \n",
" 5. **(0.5 points)** Build and print a confusion matrix for your predictions.\n",
"\n",
"For the rest of the questions use the **whole training data**, i.e., `ziptrain` (**not** `binar_train`). \n",
"\n",
" 6. **(1 point)** Project the whole data onto the first $m=2, 3, 4, 5$ principal components (one $m$ at a time).\n",
" 7. **(1.5 points)** For each $m$, train a **linear discriminant** classifier on `ziptrain` using **$5$-fold cross-validation**. \n",
" 8. **(1 point)** Based on **cross-validated accuracy**, select the best number of principal components $m$.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"