Jill commends you for all your hard work. Piece by piece, you’ve been building up your skills in data preparation, statistical reasoning, and machine learning. You are now ready to apply machine learning to solve a real-world challenge: credit card risk.

Credit risk is an inherently unbalanced classification problem, because good loans easily outnumber risky loans. Therefore, you’ll need to employ different techniques to train and evaluate models with unbalanced classes. Jill asks you to use the `imbalanced-learn` and `scikit-learn` libraries to build and evaluate models using resampling.
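To see why unbalanced classes call for different evaluation techniques, here is a minimal sketch with made-up labels: 990 low-risk loans and 10 high-risk loans (encoded as `0` and `1`), scored against a trivial "model" that always predicts low risk. Plain accuracy looks excellent. `balanced_accuracy_score` from `scikit-learn` averages the recall of each class, and it shows that the model never catches a risky loan.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Made-up labels for illustration: 990 low-risk loans (0) and 10 high-risk loans (1).
y_true = np.array([0] * 990 + [1] * 10)

# A trivial "model" that always predicts low risk.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)               # 990 correct out of 1000
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean per-class recall: (1.0 + 0.0) / 2

print(f"accuracy:          {acc:.3f}")
print(f"balanced accuracy: {bal_acc:.3f}")
```

This prints `0.990` for accuracy and `0.500` for balanced accuracy. A balanced accuracy of 0.5 is what random guessing would score on two classes.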

Using a credit dataset from LendingClub, a peer-to-peer lending services company, you’ll oversample the data using the `RandomOverSampler` and `SMOTE` algorithms and undersample the data using the `ClusterCentroids` algorithm. Then, you’ll use a combinatorial approach of over- and undersampling using the `SMOTEENN` algorithm. Next, you’ll compare two new machine learning models that reduce bias, `BalancedRandomForestClassifier` and `EasyEnsembleClassifier`, to predict credit risk. Once you’re done, you’ll evaluate the performance of these models and make a written recommendation on whether they should be used to predict credit risk.

This new assignment consists of three technical analysis deliverables and a written report. You will submit the following:

- Deliverable 1: Use Resampling Models to Predict Credit Risk
- Deliverable 2: Use the SMOTEENN Algorithm to Predict Credit Risk
- Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
- Deliverable 4: A Written Report on the Credit Risk Analysis (README.md)

Use the following link to download the Module-17-Challenge-Resources.zip file, which includes the `LoanStats_2019Q1.csv` dataset and two starter code files: `credit_risk_resampling_starter_code.ipynb` and `credit_risk_ensemble_starter_code.ipynb`.

Create a new GitHub repository entitled "Credit_Risk_Analysis" and initialize the repository with a README.

Using your knowledge of the `imbalanced-learn` and `scikit-learn` libraries, you’ll evaluate three machine learning models by using resampling to determine which is best at predicting credit risk. First, you’ll use the oversampling `RandomOverSampler` and `SMOTE` algorithms, and then you’ll use the undersampling `ClusterCentroids` algorithm. Using these algorithms, you’ll resample the dataset, view the count of the target classes, train a logistic regression classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.

For this deliverable, you’ve already done the following in this module:

- **Lesson 17.2.3:** Split the data into training and testing sets
- **Lesson 17.3.1:** Perform logistic regression
- **Lesson 17.4.1:** Calculate accuracy, precision, and sensitivity
- **Lesson 17.4.2:** Create a confusion matrix
- **Lesson XXXXXXXXXX:** Use the `RandomOverSampler` and `SMOTE` algorithms to resample a dataset
- **Lesson XXXXXXXXXX:** Use the `ClusterCentroids` algorithm to resample a dataset

Follow the instructions below and use the `credit_risk_resampling_starter_code.ipynb` file to complete Deliverable 1.

Open the `credit_risk_resampling_starter_code.ipynb` file, rename it `credit_risk_resampling.ipynb`, and save it to your Credit_Risk_Analysis folder.

Using the information we’ve provided in the starter code, create your training and target variables by completing the following steps:

- Create the training variables by converting the string values into numerical ones using the `get_dummies()` method.
- Create the target variables.
- Check the balance of the target variables.
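Here is a minimal sketch of these three steps. It uses a tiny hypothetical DataFrame, and the column names (`loan_amnt`, `home_ownership`, `loan_status`) and `low_risk`/`high_risk` labels are for illustration only. Use the columns defined in the starter code instead. By default, `pd.get_dummies()` one-hot encodes only the string columns and leaves numeric columns unchanged.

```python
import pandas as pd

# Hypothetical miniature loan table; column names and labels are assumed for illustration.
df = pd.DataFrame({
    "loan_amnt": [10000, 2500, 30000, 4000, 12000],
    "home_ownership": ["RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE"],
    "loan_status": ["low_risk", "low_risk", "high_risk", "low_risk", "low_risk"],
})

# Training (feature) variables: drop the target, then one-hot encode the string columns.
X = pd.get_dummies(df.drop(columns="loan_status"))

# Target variable: the loan status labels.
y = df["loan_status"]

print(X.columns.tolist())
# Check the balance of the target classes.
print(y.value_counts())
```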

Next, begin resampling the training data. First, use the oversampling `RandomOverSampler` and `SMOTE` algorithms to resample the data, then use the undersampling `ClusterCentroids` algorithm to resample the data. For each resampling algorithm, do the following:

- Use the `LogisticRegression` classifier to make predictions and evaluate the model’s performance.
- Calculate the accuracy score of the model.
- Generate a confusion matrix.
- Print out the imbalanced classification report.

Save your `credit_risk_resampling.ipynb` file to your Credit_Risk_Analysis folder.

You will earn a perfect score for Deliverable 1 by completing all requirements below:

- For all three algorithms, the following have been completed:
  - An accuracy score for the model is calculated **(7.5 pt)**
  - A confusion matrix has been generated **(7.5 pt)**
  - An imbalanced classification report has been generated **(15 pt)**

Using your knowledge of the `imbalanced-learn` and `scikit-learn` libraries, you’ll use a combinatorial approach of over- and undersampling with the `SMOTEENN` algorithm to determine if the results from the combinatorial approach are better at predicting credit risk than the resampling algorithms from Deliverable 1. Using the `SMOTEENN` algorithm, you’ll resample the dataset, view the count of the target classes, train a logistic regression classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.

For this deliverable, you’ve already done the following in this module:

- **Lesson 17.3.1:** Perform logistic regression
- **Lesson 17.4.1:** Calculate accuracy, precision, and sensitivity
- **Lesson 17.4.2:** Create a confusion matrix
- **Lesson XXXXXXXXXX:** Use the `SMOTEENN` algorithm to resample a dataset

Follow the instructions below and use the information in the `credit_risk_resampling_starter_code.ipynb` file to complete Deliverable 2.

- Continue using your `credit_risk_resampling.ipynb` file, where you have already created your training and target variables.
- Using the information we have provided in the starter code, resample the training data using the `SMOTEENN` algorithm.
- After the data is resampled, use the `LogisticRegression` classifier to make predictions and evaluate the model’s performance.
- Calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.

Save your `credit_risk_resampling.ipynb` file to your Credit_Risk_Analysis folder.

You will earn a perfect score for Deliverable 2 by completing all requirements below:

- For the combinatorial `SMOTEENN` algorithm, the following have been completed:
  - An accuracy score for the model is calculated **(5 pt)**
  - A confusion matrix has been generated **(5 pt)**
  - An imbalanced classification report has been generated **(5 pt)**

Using your knowledge of the `imblearn.ensemble` module, you’ll train and compare two different ensemble classifiers, `BalancedRandomForestClassifier` and `EasyEnsembleClassifier`, to predict credit risk and evaluate each model. Using both algorithms, you’ll resample the dataset, view the count of the target classes, train the ensemble classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.

For this deliverable, you’ve already done the following in this module:

- **Lesson 17.2.3:** Split the data into training and testing sets
- **Lesson 17.3.1:** Perform logistic regression
- **Lesson 17.4.1:** Calculate accuracy, precision, and sensitivity
- **Lesson 17.4.2:** Create a confusion matrix
- **Lesson 17.9.2:** Understand adaptive boosting

Follow the instructions below and use the information in the `credit_risk_ensemble_starter_code.ipynb` file to complete Deliverable 3.

- Open the `credit_risk_ensemble_starter_code.ipynb` file, rename it `credit_risk_ensemble.ipynb`, and save it to your Credit_Risk_Analysis folder.
- Using the information we have provided in the starter code, create your training and target variables by completing the following:
  - Create the training variables by converting the string values into numerical ones using the `get_dummies()` method.
  - Create the target variables.
  - Check the balance of the target variables.
- Resample the training data using the `BalancedRandomForestClassifier` algorithm with 100 estimators.
  - Consult the following Random Forest documentation for an example.
- After the data is resampled, calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.
- Print the feature importance sorted in descending order (from most to least important feature), along with the feature score.
- Next, resample the training data using the `EasyEnsembleClassifier` algorithm with 100 estimators.
  - Consult the following Easy Ensemble documentation for an example.
- After the data is resampled, calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.

Save your `credit_risk_ensemble.ipynb` file to your Credit_Risk_Analysis folder.

You will earn a perfect score for Deliverable 3 by completing all requirements below:

- For the `BalancedRandomForestClassifier` algorithm, the following have been completed:
  - An accuracy score for the model is calculated **(2.5 pt)**
  - A confusion matrix has been generated **(2.5 pt)**
  - An imbalanced classification report has been generated **(5 pt)**
  - The features are sorted in descending order by feature importance **(5 pt)**
- For the `EasyEnsembleClassifier` algorithm, the following have been completed:
  - An accuracy score of the model is calculated **(2.5 pt)**
  - A confusion matrix has been generated **(2.5 pt)**
  - An imbalanced classification report has been generated **(5 pt)**

For this deliverable, you’ll write a brief summary and analysis of the performance of all the machine learning models used in this Challenge.

The report should contain the following:

- **Overview of the analysis:** Explain the purpose of this analysis.
- **Results:** Using bulleted lists, describe the balanced accuracy scores and the precision and recall scores of all six machine learning models. Use screenshots of your outputs to support your results.
- **Summary:** Summarize the results of the machine learning models, and include a recommendation on the model to use, if any. If you do not recommend any of the models, justify your reasoning.
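The Results section asks for the balanced accuracy, precision, and recall of each model. One way to keep those numbers consistent is a small helper that turns a model’s test-set predictions into Markdown bullets for README.md. `results_bullets` is a hypothetical name, and the `low_risk`/`high_risk` labels are assumed. The toy labels below are made up only to show the formatting and are not results from the LendingClub data.

```python
from sklearn.metrics import balanced_accuracy_score, precision_score, recall_score


def results_bullets(model_name, y_true, y_pred, positive="high_risk"):
    """Hypothetical helper: format one model's scores as Markdown bullets."""
    bal_acc = balanced_accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, pos_label=positive, zero_division=0)
    recall = recall_score(y_true, y_pred, pos_label=positive, zero_division=0)
    return "\n".join([
        f"- **{model_name}**",
        f"  - Balanced accuracy: {bal_acc:.3f}",
        f"  - Precision ({positive}): {precision:.3f}",
        f"  - Recall ({positive}): {recall:.3f}",
    ])


# Made-up toy labels, only to demonstrate the output format.
y_true = ["low_risk"] * 6 + ["high_risk"] * 2
y_pred = ["low_risk"] * 5 + ["high_risk", "high_risk", "low_risk"]
print(results_bullets("Toy model", y_true, y_pred))
```

Calling the helper a second time with `positive="low_risk"` reports the other class’s precision and recall.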

The written analysis has the following structure, organization, and formatting:

- There is a title, and there are multiple sections **(2 pt)**
- Each section has a heading and subheading **(2 pt)**
- Links to images are working, and code is formatted and displayed correctly **(2 pt)**

The written analysis has the following:

Overview of the loan prediction risk analysis:

- The purpose of this analysis is well defined **(4 pt)**

Results:

- There is a bulleted list that describes the balanced accuracy score and the precision and recall scores of all six machine learning models **(15 pt)**

Summary:

- There is a summary of the results **(2 pt)**
- There is a recommendation on which model to use, or there is no recommendation with a justification **(3 pt)**

Once you’re ready to submit, check your work against the rubric one final time to ensure you are meeting the requirements for this Challenge. It’s easy to overlook items when you’re in the zone!

As a reminder, the deliverables for this Challenge are as follows:

- Deliverable 1: Use Resampling Models to Predict Credit Risk
- Deliverable 2: Use the SMOTEENN Algorithm to Predict Credit Risk
- Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
- Deliverable 4: A Written Report on the Credit Risk Analysis (README.md)

Upload the following to your Credit_Risk_Analysis GitHub repository:

- Your `credit_risk_resampling.ipynb` file
- Your `credit_risk_ensemble.ipynb` file
- An updated README.md that has your written analysis

To submit your challenge assignment for grading in Bootcamp Spot, click Start Assignment, click the Website URL tab, provide the URL of your Credit_Risk_Analysis GitHub repository, and then click Submit. Comments are disabled for graded submissions in BootCampSpot. If you have questions about your feedback, please notify your instructional staff or the Student Success Manager. If you would like to resubmit your work for an improved grade, you can use the **Re-Submit Assignment** button to upload new links. You may resubmit up to 3 times for a total of 4 submissions.

Once you receive feedback on your Challenge, make any suggested updates or adjustments to your work. Then, add this week’s Challenge to your professional portfolio.

You are allowed to miss up to two Challenge assignments and still earn your certificate. If you complete all Challenge assignments, your lowest two grades will be dropped. If you wish to skip this assignment, click Next, and move on to the next Module.

| Criteria | Pts |
|---|---|
| Deliverable 1: Use Resampling Models to Predict Loan Risk | 30 pts |
| Deliverable 2: Use the SMOTEENN Algorithm to Predict Loan Risk | 15 pts |
| Deliverable 3: Use Ensemble Classifiers to Predict Loan Risk | 25 pts |
| Deliverable 4: Structure, Organization, and Formatting | 6 pts |
| Deliverable 4: Analysis | 24 pts |
| **Total points** | **100** |

