Question

Introduction

For this week’s take-home lab, you will work on the same data set used in the Week 4/5 Take-Home Labs. You will solve the same problem studied in this week’s in-class lab, but on a much larger and more interesting dataset. The file UCI_Credit_Card.csv contains 30,000 consumer records with 24 variables. A detailed description of the fields is available at: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

Some coded levels in the data do not match the UCI description:

- Marital status: the UCI description lists 1 = married, 2 = single, 3 = others, but the data contain levels 0, 1, 2, 3. Treat 0 as unknown.
- Education: the UCI description lists 1 = graduate school, 2 = university, 3 = high school, 4 = others, but the data contain levels 1 to 6. Treat both 5 and 6 as unknown.
- Repayment status (X6-X11): the measurement scale is -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above. However, many values are -2, which is also unknown.

Treat every unknown value as NA.

Your task is to build the best possible model for predicting whether or not a consumer will default on their credit card payment next month (the last column in the dataset).

Assignment

Perform the following tasks:

1. Conduct a training/test split of the data, holding out 20% of it as a test dataset.
2. Fit the best KNN model and CART model you can (consider feature selection, etc.) to predict consumer default.
3. Plot ROC curves for the logistic regression, SVM, KNN, and CART models, and compare their performance.
4. Compute the AUC for the logistic regression, SVM, KNN, and CART models, and compare their performance.
5. Provide a summary and discussion of your work in written form (.docx or .pdf) that answers the following:
   Q1. Summarize the model/feature selection process you used to fit your KNN and CART models.
   Q2. Provide a summary of the fitted KNN/CART models (i.e., the model summary).
   Q3. Evaluate the performance of the fitted KNN/CART models using confusion matrices.
   Q4. How well do you think the fitted KNN/CART models work on this dataset?
   Q5. Using ROC curves and AUC, which of the logistic regression, SVM, KNN, and CART models works best on this dataset so far?

Submission Instructions

For this weekly lab assignment, you should submit:

- An R script file (or Rmd file)
- A written summary/discussion of your work (as described above) in .docx
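The recoding rules and the 80/20 split described above can be sketched in R as follows. This is a minimal sketch, not the accepted answer's code: it assumes the column names used in the Kaggle copy of UCI_Credit_Card.csv (MARRIAGE, EDUCATION, PAY_0, PAY_2 to PAY_6), and the helper names recode_unknowns() and split_80_20() and the toy data frame are hypothetical.

```r
# Recode the "unknown" levels listed in the assignment to NA.
# Assumption: column names follow the Kaggle copy of UCI_Credit_Card.csv.
recode_unknowns <- function(df) {
  # %in% never returns NA, so existing NAs do not break the assignments
  df$MARRIAGE[df$MARRIAGE %in% 0] <- NA            # 0 = unknown marital status
  df$EDUCATION[df$EDUCATION %in% c(5, 6)] <- NA    # 5 and 6 = unknown education
  # Repayment status columns (X6-X11 in the UCI naming); -2 = unknown
  pay_cols <- intersect(c("PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"),
                        names(df))
  for (col in pay_cols) {
    df[[col]][df[[col]] %in% -2] <- NA
  }
  df
}

# Hold out 20% of the rows as a test set (simple random split).
# Note: set.seed() here resets the global RNG so the split is reproducible.
split_80_20 <- function(df, seed = 1) {
  set.seed(seed)
  test_idx <- sample(seq_len(nrow(df)), size = round(0.2 * nrow(df)))
  list(train = df[-test_idx, , drop = FALSE],
       test  = df[test_idx, , drop = FALSE])
}

# Usage on a hypothetical toy frame
toy <- data.frame(MARRIAGE  = c(0, 1, 2, 3, 1),
                  EDUCATION = c(1, 5, 6, 2, 4),
                  PAY_0     = c(-2, -1, 0, 2, -2))
clean <- recode_unknowns(toy)
parts <- split_80_20(clean)
```

On the real file, caret's createDataPartition(y, p = 0.8, list = FALSE) is an alternative that stratifies the split on the outcome. Also note that class::knn() does not accept missing values, so rows containing the new NAs have to be dropped (for example with na.omit()) or imputed before fitting KNN.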

Mohd · Accepted Answer

2/18/2022
library(dplyr)  # data manipulation; masks stats::filter/lag and base::intersect/setdiff/setequal/union
library(caret)  # model tuning and evaluation helpers (also attaches ggplot2 and lattice)
library(MASS)   # masks dplyr::select, so call dplyr::select() explicitly after this line
library(e1071)  # svm() and related utilities
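The answer is cut off after loading its packages. The KNN, CART, confusion-matrix and AUC steps of the assignment might look roughly like the sketch below. It is a hypothetical illustration, not the author's solution. To keep it self-contained it runs on simulated data (the data frame sim and the predictors x1 and x2 are stand-ins for the cleaned credit data). It tunes k by leave-one-out accuracy with class::knn.cv() and prunes an rpart tree at the complexity parameter with the lowest cross-validated error. ROC points and AUC are computed by hand using the rank (Mann-Whitney) form of the AUC.

```r
library(class)  # knn(), knn.cv()
library(rpart)  # rpart(), rpart.control(), prune()

# Simulated stand-in for the cleaned credit data (hypothetical; use the real split).
set.seed(42)
n <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$default <- factor(ifelse(sim$x1 + 0.5 * sim$x2 + rnorm(n, sd = 0.7) > 0.8, 1, 0),
                      levels = c(0, 1))
test_idx <- sample(seq_len(n), size = 0.2 * n)   # 20% held-out test set
train <- sim[-test_idx, ]
test  <- sim[test_idx, ]
predictors <- c("x1", "x2")

# KNN is distance-based: standardize using training means/SDs only.
mu  <- colMeans(train[predictors])
sdv <- apply(train[predictors], 2, sd)
train_x <- scale(train[predictors], center = mu, scale = sdv)
test_x  <- scale(test[predictors], center = mu, scale = sdv)

# Choose k by leave-one-out accuracy on the training set.
k_grid <- c(5, 15, 25, 45)
loo_acc <- sapply(k_grid, function(k)
  mean(knn.cv(train_x, train$default, k = k) == train$default))
best_k <- k_grid[which.max(loo_acc)]
knn_pred <- knn(train_x, test_x, train$default, k = best_k, prob = TRUE)
# attr "prob" is the vote share of the winning class; convert to P(default = 1).
knn_votes <- attr(knn_pred, "prob")
knn_score <- ifelse(knn_pred == "1", knn_votes, 1 - knn_votes)

# CART: grow a deep tree, then prune at the CP with the lowest cross-validated error.
cart_full <- rpart(default ~ x1 + x2, data = train, method = "class",
                   control = rpart.control(cp = 0.001))
best_cp <- cart_full$cptable[which.min(cart_full$cptable[, "xerror"]), "CP"]
cart <- prune(cart_full, cp = best_cp)
cart_score <- predict(cart, newdata = test, type = "prob")[, "1"]
cart_pred  <- predict(cart, newdata = test, type = "class")

# Confusion matrices (rows = predicted, columns = actual).
knn_cm  <- table(Predicted = knn_pred, Actual = test$default)
cart_cm <- table(Predicted = cart_pred, Actual = test$default)

# ROC points: TPR/FPR at every distinct score threshold.
roc_curve <- function(score, label) {
  y <- as.character(label) == "1"
  thr <- sort(unique(score), decreasing = TRUE)
  data.frame(fpr = c(0, sapply(thr, function(t) mean(score[!y] >= t))),
             tpr = c(0, sapply(thr, function(t) mean(score[y] >= t))))
}
# AUC via ranks: P(score of a random positive > score of a random negative).
auc_rank <- function(score, label) {
  y <- as.character(label) == "1"
  r <- rank(score)  # ties get average ranks
  n1 <- sum(y); n0 <- sum(!y)
  (sum(r[y]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
knn_auc  <- auc_rank(knn_score, test$default)
cart_auc <- auc_rank(cart_score, test$default)
print(knn_cm); print(cart_cm)
cat("KNN AUC:", round(knn_auc, 3), " CART AUC:", round(cart_auc, 3), "\n")
# plot(roc_curve(knn_score, test$default), type = "l")  # draw the KNN ROC curve
```

The same scores can be fed to caret::confusionMatrix() or the pROC package instead of the hand-written helpers. Putting the logistic regression and SVM scores from the earlier labs through roc_curve() and auc_rank() gives the four-model comparison that Q5 asks for.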
