MIS 545 Lab 6: Model Evaluation
1 Overview
In this lab, we will evaluate prediction performance on two datasets, which can be found under the Lab 6 module on D2L. Save them in your working directory.
1. adult.csv: This dataset contains census data about more than 48,000 individuals. Try to predict whether an individual's income exceeds $50K/yr based on census attributes such as age, work class, education, race, sex, marital status, and native country. You can find details about the dataset at: https://archive.ics.uci.edu/ml/datasets/Adult
2. titanic.csv: This dataset uses variables such as class, age, and sex to predict whether a passenger survived the wreck of the Titanic. It has been used in previous lectures.
2 Packages
For lab 6, we will use two packages.
C50: This package extends the C4.5 classification algorithm described in Quinlan (1993). The details of the extensions are largely undocumented. The model can take the form of a full decision tree.
pROC: Tools for visualizing, smoothing, and comparing receiver operating characteristic (ROC) curves.
# Install packages
install.packages("C50")
install.packages("pROC")
library(C50)
library(pROC)
3 Precision and Recall
First, use setwd() to set your working directory and save adult.csv there. Then load the adult dataset into memory; a question mark stands for a missing value. Because the C50 package handles missing values internally, we do not need to preprocess them.
# Read in adult.csv; '?' marks missing values
adult <- read.csv("adult.csv", na.strings = '?')
Split the data into training and testing sets.
# Partition dataset for training (80%) and testing (20%)
size <- floor(0.8 * nrow(adult))
### Randomly decide which ones for training
training_index <- sample(nrow(adult), size = size, replace = FALSE)
train <- adult[training_index,]
test <- adult[-training_index,]
### Names of the variables used for prediction
var_names <- names(adult)[-15]
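Because sample() draws a random subset, each run produces a different split. To make the partition reproducible, you can fix the RNG seed before sampling (the value 545 below is arbitrary):

```r
# Fix the seed so sample() returns the same training rows on every run
set.seed(545)
```

Call it once, immediately before the sample() call above.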
Fit the decision tree model. You can find a ranked list of attributes by usage via summary(dt).
# Fit the model
dt <- C5.0(x = train[, var_names], y = train$if_above_50K)
# See the summary of model
summary(dt)
### Now validate on the test set
## predict() returns a vector of predicted classes
dt_pred <- predict(dt, newdata = test)
### Merge predictions into the test dataset
dt_evaluation <- cbind(test, dt_pred)
Get a quick sense of the prediction accuracy.
### Compare predictions to actual values
dt_evaluation$correct <- ifelse(dt_evaluation$if_above_50K == dt_evaluation$dt_pred, 1, 0)
### Accuracy rate
sum(dt_evaluation$correct) / nrow(dt_evaluation)
### Confusion matrix
table(dt_evaluation$if_above_50K, dt_evaluation$dt_pred)
## (a 2 x 2 table of no/yes counts; exact values depend on the random split)
In general, we use four rates to evaluate a binary prediction: TPR, TNR, FPR, and FNR.
### True Positive Rate (Sensitivity): TPR = TP / P
### = count of true positive predictions divided by total actual positives
TPR <- sum(dt_evaluation$dt_pred == 'yes' & dt_evaluation$if_above_50K == 'yes') / sum(dt_evaluation$if_above_50K == 'yes')
### True Negative Rate (Specificity): TNR = TN / N
### = count of true negative predictions divided by total actual negatives
TNR <- sum(dt_evaluation$dt_pred == 'no' & dt_evaluation$if_above_50K == 'no') / sum(dt_evaluation$if_above_50K == 'no')
### False Positive Rate (1 - Specificity): FPR = FP / N
### = sum(dt_evaluation$dt_pred == 'yes' & dt_evaluation$if_above_50K == 'no') /
###   sum(dt_evaluation$if_above_50K == 'no')
FPR <- 1 - TNR
### False Negative Rate (1 - Sensitivity): FNR = FN / P
### = sum(dt_evaluation$dt_pred == 'no' & dt_evaluation$if_above_50K == 'yes') /
###   sum(dt_evaluation$if_above_50K == 'yes')
FNR <- 1 - TPR
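As a sanity check, the four rates can be computed on a tiny hand-made truth/prediction pair (hypothetical vectors, not the adult data):

```r
# Hypothetical labels: 3 actual positives, 5 actual negatives
truth <- factor(c('yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'no'))
pred  <- factor(c('yes', 'yes', 'no',  'no', 'no', 'no', 'yes', 'no'))
TPR <- sum(pred == 'yes' & truth == 'yes') / sum(truth == 'yes')  # 2/3
TNR <- sum(pred == 'no'  & truth == 'no')  / sum(truth == 'no')   # 4/5
FPR <- 1 - TNR                                                    # 1/5
FNR <- 1 - TPR                                                    # 1/3
```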
Precision and Recall are widely used to evaluate prediction performance.
### Precision
### = count of true positive predictions / total positive predictions
dt_precision <- sum(dt_evaluation$if_above_50K == 'yes' & dt_evaluation$dt_pred == 'yes') / sum(dt_evaluation$dt_pred == 'yes')
### Recall = TPR
### = count of true positive predictions / total actual positives
dt_recall <- sum(dt_evaluation$if_above_50K == 'yes' & dt_evaluation$dt_pred == 'yes') / sum(dt_evaluation$if_above_50K == 'yes')
The F score combines precision and recall: it is their harmonic mean. In some cases, we have to weight precision or recall differently based on domain knowledge.
### F measure
F <- 2 * dt_precision * dt_recall / (dt_precision + dt_recall)
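For instance, with a hypothetical precision of 0.8 and recall of 0.6, the harmonic mean works out to about 0.686:

```r
# Worked F-measure example with made-up precision/recall values
precision <- 0.8
recall <- 0.6
F <- 2 * precision * recall / (precision + recall)
F  # 0.96 / 1.4, roughly 0.686
```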
4 ROC Curve: Receiver Operating Characteristic Curve
Load the second dataset, titanic.csv. Partition data into training and testing as we did above.
titanic <- read.csv("titanic.csv")
### Partition dataset for training (80%) and testing (20%)
size <- floor(0.8 * nrow(titanic))
### Randomly decide which ones for training
training_index <- sample(nrow(titanic), size = size, replace = FALSE)
train <- titanic[training_index,]
test <- titanic[-training_index,]
Fit the logistic regression. Note the parameter type = "response" in the predict() method: it returns predicted probabilities rather than class labels.
### Fitting regression model
reg <- glm(survive ~ . , data = train, family = binomial() )
### Model detail
summary(reg)
### Validate test dataset
evaluation <- test
evaluation$prob <- predict(reg, newdata = evaluation, type = "response")
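Because type = "response" yields probabilities, turning them into "yes"/"no" labels requires a cutoff. A minimal sketch with a hypothetical probability vector (in the lab, evaluation$prob plays this role):

```r
prob <- c(0.12, 0.57, 0.83, 0.49)              # hypothetical predicted probabilities
pred_class <- ifelse(prob > 0.5, "yes", "no")  # 0.5 is a common default cutoff
pred_class                                     # "no" "yes" "yes" "no"
```

The cutoff can be moved away from 0.5 when false positives and false negatives carry different costs.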
Compare against the baseline survival rate in the dataset.
# Baseline = 32%
count_survive <- nrow(subset(titanic, titanic$survive == "yes") )
baseline <- count_survive / nrow(titanic)
baseline
## (about 0.32)
Plot the ROC curve. Note the AUC is 0.7686, well above the 0.5 of a random classifier. Since the training and testing sets are randomly sampled, this number may differ on your computer.
# roc() computes sensitivity and specificity across all thresholds
g <- roc(evaluation$survive ~ evaluation$prob, data = evaluation)
# ROC curve
plot(g)
## Area under the curve: 0.7686
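pROC can also report the AUC directly and locate the threshold that best balances sensitivity and specificity. A toy sketch with made-up responses and probabilities (not the titanic data):

```r
library(pROC)
resp <- c(0, 0, 0, 1, 1, 1)                    # made-up ground truth
prob <- c(0.10, 0.40, 0.35, 0.80, 0.70, 0.90)  # made-up predicted probabilities
r <- roc(resp, prob, quiet = TRUE)
auc(r)   # the toy classes are perfectly separated, so AUC = 1
coords(r, "best", ret = c("threshold", "sensitivity", "specificity"))
```

On the lab's fitted object, auc(g) and coords(g, "best") work the same way.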
adult.csv (first rows shown; the full file is on D2L)
age,workclass,fnlwgt,education,education-num,marital_status,occupation,relationship,race,sex,capital_gain,capital_loss,hours_per_week,native_country,if_above_50K
39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,no
50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,no
38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,no
...
Answered Same Day Aug 22, 2021

Solution

Mohd answered on Aug 23 2021
Assignment
Walker Kirk
8/23/2021
knitr::opts_chunk$set(echo = TRUE,cache = TRUE,warning = FALSE,message = FALSE,dpi = 180,fig.width = 8,fig.height = 5)
library(dplyr)
library(ggplot2)
library(magrittr)
library(rmarkdown)
library(C50)
library(pROC)
Please finish the questions below using R:
1. Fit Decision Tree and Logistic Regression to predict affairs (attribute if_affair is the dependent/target variable).
2. Based on the result of the Decision Tree:
   a. Find the most useful attribute in prediction. (Hint: use summary(your model))
   b. What are the Precision and Recall? (Define "Yes" as the positive outcome)
library(readr)
affairs <- read_csv("New folder (2)/affairs.csv")
affairs$if_affair <- factor(affairs$if_affair)
str(affairs)
## spec_tbl_df [601 x 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ age : num [1:601] 37 27 32 57 22 32 22 57 32 22 ...
## $ yearsmarried : num [1:601] 10 4 15 15 0.75 1.5 0.75 15 15 1.5 ...
## $ religiousness: num [1:601] 3 4 1 5 2 2 2 2 4 4 ...
## $ rating : num [1:601] 4 4 4 5 3 5 3 4 2 5 ...
## $ if_affair : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. age = col_double(),
## .. yearsmarried = col_double(),
## .. religiousness = col_double(),
## .. rating = col_double(),
## .. if_affair = col_character()
## .. )
## - attr(*, "problems")=
affairs <- affairs %>%
  mutate(if_affair = ifelse(if_affair == "no", 0, 1))
summary(affairs)
##       age         yearsmarried    religiousness      rating
## Min. :17.50 Min. : 0.125 Min. :1.000 Min. :1.000
## 1st Qu.:27.00 1st Qu.: 4.000 1st Qu.:2.000 1st Qu.:3.000
## Median :32.00 Median : 7.000 Median :3.000 Median :4.000
## Mean :32.49 Mean : 8.178 Mean :3.116 Mean :3.932
## 3rd Qu.:37.00 3rd Qu.:15.000 3rd Qu.:4.000 3rd Qu.:5.000
## Max. :57.00 Max. :15.000 Max. :5.000 Max. :5.000
## if_affair
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.2496
## 3rd Qu.:0.0000
## Max. :1.0000
affairs$if_affair <- factor(affairs$if_affair)
set.seed(333)
size <- floor(0.8 * nrow(affairs))

### randomly decide which ones for training
training_index <- sample(nrow(affairs), size = size, replace = FALSE)

train <- affairs[training_index,]
test <- affairs[-training_index,]

### Names of the variables used for prediction
...