Background

Jill commends you for all your hard work. Piece by piece, you’ve been building up your skills in data preparation, statistical reasoning, and machine learning. You are now ready to apply machine learning to solve a real-world challenge: credit card risk.

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans. Therefore, you’ll need to employ different techniques to train and evaluate models with unbalanced classes. Jill asks you to use the imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling.
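To see why unbalanced classes call for different evaluation techniques, here is a minimal sketch in plain Python. The class counts are made up for illustration and are not taken from the LendingClub data. It compares plain accuracy with balanced accuracy (the mean of the per-class recalls) for a "model" that labels every loan as good.

```python
# Hypothetical test set: 990 good loans (label 0) and 10 risky loans (label 1).
y_true = [0] * 990 + [1] * 10

# A useless "model" that predicts "good loan" for every row.
y_pred = [0] * len(y_true)


def recall_for(label, y_true, y_pred):
    """Fraction of the rows that truly belong to `label` and were predicted as `label`."""
    predictions_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    hits = sum(1 for p in predictions_for_label if p == label)
    return hits / len(predictions_for_label)


# Plain accuracy: share of all rows predicted correctly.
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Balanced accuracy: mean of the recall of each class.
balanced_accuracy = (recall_for(0, y_true, y_pred) + recall_for(1, y_true, y_pred)) / 2

print(f"accuracy:          {accuracy:.3f}")           # 0.990
print(f"balanced accuracy: {balanced_accuracy:.3f}")  # 0.500
```

The model never finds a risky loan, yet its plain accuracy looks excellent. Balanced accuracy exposes the problem, which is one reason this Challenge asks for it.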

Using the credit card dataset from LendingClub, a peer-to-peer lending services company, you’ll oversample the data using the RandomOverSampler and SMOTE algorithms, and undersample the data using the ClusterCentroids algorithm. Then, you’ll use a combinatorial approach of over- and undersampling using the SMOTEENN algorithm. Next, you’ll compare two new machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk. Once you’re done, you’ll evaluate the performance of these models and make a written recommendation on whether they should be used to predict credit risk.

What You're Creating

This new assignment consists of three technical analysis deliverables and a written report. You will submit the following:

  • Deliverable 1: Use Resampling Models to Predict Credit Risk
  • Deliverable 2: Use the SMOTEENN Algorithm to Predict Credit Risk
  • Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
  • Deliverable 4: A Written Report on the Credit Risk Analysis (README.md)

Files

Use the following link to download the Module-17-Challenge-Resources.zip file, which includes the LoanStats_2019Q1.csv dataset and two starter code files: credit_risk_resampling_starter_code.ipynb and credit_risk_ensemble_starter_code.ipynb.

Before You Start

Create a new GitHub repository entitled "Credit_Risk_Analysis" and initialize the repository with a README.

Deliverable 1: Use Resampling Models to Predict Credit Risk (30 points)

Deliverable 1 Instructions

Using your knowledge of the imbalanced-learn and scikit-learn libraries, you’ll evaluate three machine learning models by using resampling to determine which is better at predicting credit risk. First, you’ll use the oversampling RandomOverSampler and SMOTE algorithms, and then you’ll use the undersampling ClusterCentroids algorithm. Using these algorithms, you’ll resample the dataset, view the count of the target classes, train a logistic regression classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.
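As a conceptual illustration of what "resample the dataset, view the count of the target classes" means, the sketch below hand-rolls naive random oversampling in plain Python. The function `random_oversample`, the toy rows, and the labels are all hypothetical. This is not how imbalanced-learn implements RandomOverSampler; it only shows the idea of duplicating minority-class rows until the classes are even.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class. Illustration only."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [row for row, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))  # copy an existing row of this class
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 low-risk rows and 2 high-risk rows.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)

print("before:", Counter(y))
print("after: ", Counter(y_res))
```

After resampling, both classes have 8 rows. No new information has been created: the added high-risk rows are exact copies of existing ones.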

REWIND

For this deliverable, you’ve already done the following in this module:

  • Lesson 17.2.3: Split the data into training and testing sets
  • Lesson 17.3.1: Perform logistic regression
  • Lesson 17.4.1: Calculate accuracy, precision, and sensitivity
  • Lesson 17.4.2: Create a confusion matrix
  • Lesson XXXXXXXXXX: Use the RandomOverSampler and SMOTE algorithms to resample a dataset
  • Lesson XXXXXXXXXX: Use the ClusterCentroids algorithm to resample a dataset

Follow the instructions below and use the credit_risk_resampling_starter_code.ipynb file to complete Deliverable 1.

Open the credit_risk_resampling_starter_code.ipynb file, rename it credit_risk_resampling.ipynb, and save it to your Credit_Risk_Analysis folder.

Using the information we’ve provided in the starter code, create your training and target variables by completing the following steps:

  • Create the training variables by converting the string values into numerical ones using the get_dummies() method.
  • Create the target variables.
  • Check the balance of the target variables.
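The three steps above can be sketched with pandas as follows. The tiny DataFrame and its column names (`loan_amnt`, `home_ownership`, `loan_status`) are assumptions made for illustration; the starter code and the real CSV may use different columns.

```python
import pandas as pd

# Hypothetical stand-in for the loan data; real column names may differ.
df = pd.DataFrame({
    "loan_amnt": [10000, 2500, 18000, 7000, 12000, 4000],
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT", "OWN", "RENT"],
    "loan_status": ["low_risk", "low_risk", "low_risk",
                    "low_risk", "high_risk", "low_risk"],
})

target_column = "loan_status"  # assumed name of the target column

# Training variables: every column except the target, with string
# columns converted to numeric indicator columns by get_dummies().
X = pd.get_dummies(df.drop(columns=target_column))

# Target variable.
y = df[target_column]

# Check the balance of the target classes.
print(y.value_counts())
print(X.columns.tolist())
```

Here `get_dummies()` leaves the numeric `loan_amnt` column untouched and expands `home_ownership` into one indicator column per category.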

Next, begin resampling the training data. First, use the oversampling RandomOverSampler and SMOTE algorithms to resample the data, then use the undersampling ClusterCentroids algorithm to resample the data. For each resampling algorithm, do the following:

  • Use the LogisticRegression classifier to make predictions and evaluate the model’s performance.
  • Calculate the accuracy score of the model.
  • Generate a confusion matrix.
  • Print out the imbalanced classification report.

Save your credit_risk_resampling.ipynb file to your Credit_Risk_Analysis folder.

Deliverable 1 Requirements

You will earn a perfect score for Deliverable 1 by completing all requirements below:

  • For all three algorithms, the following have been completed:
    • An accuracy score for the model is calculated (7.5 pt)
    • A confusion matrix has been generated (7.5 pt)
    • An imbalanced classification report has been generated (15 pt)

Deliverable 2: Use the SMOTEENN algorithm to Predict Credit Risk (15 points)

Deliverable 2 Instructions

Using your knowledge of the imbalanced-learn and scikit-learn libraries, you’ll use a combinatorial approach of over- and undersampling with the SMOTEENN algorithm to determine if the results from the combinatorial approach are better at predicting credit risk than the resampling algorithms from Deliverable 1. Using the SMOTEENN algorithm, you’ll resample the dataset, view the count of the target classes, train a logistic regression classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.

REWIND

For this deliverable, you’ve already done the following in this module:

  • Lesson 17.3.1: Perform logistic regression
  • Lesson 17.4.1: Calculate accuracy, precision, and sensitivity
  • Lesson 17.4.2: Create a confusion matrix
  • Lesson XXXXXXXXXX: Use the SMOTEENN algorithm to resample a dataset

Follow the instructions below and use the information in the credit_risk_resampling_starter_code.ipynb file to complete Deliverable 2.

  1. Continue using your credit_risk_resampling.ipynb file where you have already created your training and target variables.
  2. Using the information we have provided in the starter code, resample the training data using the SMOTEENN algorithm.
  3. After the data is resampled, use the LogisticRegression classifier to make predictions and evaluate the model’s performance.
  4. Calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.

Save your credit_risk_resampling.ipynb file to your Credit_Risk_Analysis folder.

Deliverable 2 Requirements

You will earn a perfect score for Deliverable 2 by completing all requirements below:

  • For the combinatorial SMOTEENN algorithm, the following have been completed:
    • An accuracy score for the model is calculated (5 pt)
    • A confusion matrix has been generated (5 pt)
    • An imbalanced classification report has been generated (5 pt)

Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk (25 points)

Deliverable 3 Instructions

Using your knowledge of the imblearn.ensemble module, you’ll train and compare two different ensemble classifiers, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk and evaluate each model. Using both algorithms, you’ll resample the dataset, view the count of the target classes, train the ensemble classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.

REWIND

For this deliverable, you’ve already done the following in this module:

  • Lesson 17.2.3: Split the data into training and testing sets
  • Lesson 17.3.1: Perform logistic regression
  • Lesson 17.4.1: Calculate accuracy, precision, and sensitivity
  • Lesson 17.4.2: Create a confusion matrix
  • Lesson 17.9.2: Understand adaptive boosting

Follow the instructions below and use the information in the credit_risk_ensemble_starter_code.ipynb file to complete Deliverable 3.

  1. Open the credit_risk_ensemble_starter_code.ipynb file, rename it credit_risk_ensemble.ipynb, and save it to your Credit_Risk_Analysis folder.
  2. Using the information we have provided in the starter code, create your training and target variables by completing the following:
    • Create the training variables by converting the string values into numerical ones using the get_dummies() method.
    • Create the target variables.
    • Check the balance of the target variables.
  3. Resample the training data using the BalancedRandomForestClassifier algorithm with 100 estimators.
    • Consult the Random Forest documentation for an example.
  4. After the data is resampled, calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.
  5. Print the feature importance sorted in descending order (from most to least important feature), along with the feature score.
  6. Next, resample the training data using the EasyEnsembleClassifier algorithm with 100 estimators.
    • Consult the Easy Ensemble documentation for an example.
  7. After the data is resampled, calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.

Save your credit_risk_ensemble.ipynb file to your Credit_Risk_Analysis folder.

Deliverable 3 Requirements

You will earn a perfect score for Deliverable 3 by completing all requirements below:

  • For the BalancedRandomForestClassifier algorithm, the following have been completed:
    • An accuracy score for the model is calculated (2.5 pt)
    • A confusion matrix has been generated (2.5 pt)
    • An imbalanced classification report has been generated (5 pt)
    • The features are sorted in descending order by feature importance (5 pt)
  • For the EasyEnsembleClassifier algorithm, the following have been completed:
    • An accuracy score for the model is calculated (2.5 pt)
    • A confusion matrix has been generated (2.5 pt)
    • An imbalanced classification report has been generated (5 pt)

Deliverable 4: Written Report on the Credit Risk Analysis (30 points)

Deliverable 4 Instructions

For this deliverable, you’ll write a brief summary and analysis of the performance of all the machine learning models used in this Challenge.

The report should contain the following:

  1. Overview of the analysis: Explain the purpose of this analysis.

  2. Results: Using bulleted lists, describe the balanced accuracy scores and the precision and recall scores of all six machine learning models. Use screenshots of your outputs to support your results.

  3. Summary: Summarize the results of the machine learning models, and include a recommendation on the model to use, if any. If you do not recommend any of the models, justify your reasoning.
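When describing the results, it can help to recall how the three reported scores relate to a confusion matrix. The sketch below uses made-up counts for the high-risk (positive) class. They are not results from any model in this Challenge.

```python
# Hypothetical confusion matrix counts, with "high risk" as the positive class.
tp, fn = 60, 40        # risky loans correctly flagged / missed
fp, tn = 2000, 15000   # good loans wrongly flagged / correctly cleared

# Precision: of the loans flagged as high risk, how many really were.
precision = tp / (tp + fp)

# Recall (sensitivity): of the truly high-risk loans, how many were flagged.
recall = tp / (tp + fn)

# Specificity: recall of the negative (low-risk) class.
specificity = tn / (tn + fp)

# Balanced accuracy: mean of the recall of each class.
balanced_accuracy = (recall + specificity) / 2

print(f"precision:         {precision:.3f}")          # 0.029
print(f"recall:            {recall:.3f}")             # 0.600
print(f"balanced accuracy: {balanced_accuracy:.3f}")  # 0.741
```

In this made-up case the balanced accuracy looks respectable while precision is very low, meaning most flagged loans are actually good. A recommendation in the Summary would need to weigh that trade-off.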

Deliverable 4 Requirements

Structure, Organization, and Formatting (6 points)

The written analysis has the following structure, organization, and formatting:

  • There is a title, and there are multiple sections (2 pt)
  • Each section has a heading and subheading (2 pt)
  • Links to images are working, and code is formatted and displayed correctly (2 pt)

Analysis (24 points)

The written analysis has the following:

  • Overview of the loan prediction risk analysis:
    • The purpose of this analysis is well defined (4 pt)
  • Results:
    • There is a bulleted list that describes the balanced accuracy score and the precision and recall scores of all six machine learning models (15 pt)
  • Summary:
    • There is a summary of the results (2 pt)
    • There is a recommendation on which model to use, or there is no recommendation with a justification (3 pt)

Submission

Once you’re ready to submit, check your work against the rubric one final time to ensure you are meeting the requirements for this Challenge. It’s easy to overlook items when you’re in the zone!

As a reminder, the deliverables for this Challenge are as follows:

  • Deliverable 1: Use Resampling Models to Predict Credit Risk
  • Deliverable 2: Use the SMOTEENN algorithm to Predict Credit Risk
  • Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
  • Deliverable 4: A Written Report on the Credit Risk Analysis (README.md)

Upload the following to your Credit_Risk_Analysis GitHub repository:

  • Your credit_risk_resampling.ipynb file.
  • Your credit_risk_ensemble.ipynb file.
  • An updated README.md that has your written analysis.

To submit your Challenge assignment for grading in Bootcamp Spot, click Start Assignment, click the Website URL tab, provide the URL of your Credit_Risk_Analysis GitHub repository, and then click Submit. Comments are disabled for graded submissions in Bootcamp Spot. If you have questions about your feedback, please notify your instructional staff or the Student Success Manager. If you would like to resubmit your work for an improved grade, you can use the Re-Submit Assignment button to upload new links. You may resubmit up to 3 times for a total of 4 submissions.

IMPORTANT

Once you receive feedback on your Challenge, make any suggested updates or adjustments to your work. Then, add this week’s Challenge to your professional portfolio.

NOTE

You are allowed to miss up to two Challenge assignments and still earn your certificate. If you complete all Challenge assignments, your lowest two grades will be dropped. If you wish to skip this assignment, click Next, and move on to the next Module.

Rubric

Module-17 Rubric

Deliverable 1: Use Resampling Models to Predict Loan Risk (30 pts)

  • 30 to >27.0 pts, Demonstrating Proficiency:
    • There is an accuracy score and confusion matrix for ALL THREE algorithms.
    • A classification report is generated for ALL THREE algorithms.
  • 27 to >23.0 pts, Approaching Proficiency:
    • There is an accuracy score and confusion matrix for ALL THREE algorithms.
    • A classification report is generated for TWO of THREE algorithms.
    • Code is written to generate a classification report for the third algorithm.
  • 23 to >19.0 pts, Developing Proficiency:
    • There is an accuracy score and confusion matrix for ALL THREE algorithms.
    • A classification report is generated for ONE of THREE algorithms.
    • Code is written to generate a classification report for TWO algorithms, but there are errors.
  • 19 to >0.0 pts, Emerging:
    • There is an accuracy score and confusion matrix for ALL THREE algorithms.
    • Code is written to generate a classification report for ONE or more algorithms.
  • 0 pts, Incomplete

Deliverable 2: Use the SMOTEENN Algorithm to Predict Loan Risk (15 pts)

  • 15 to >13.0 pts, Demonstrating Proficiency:
    • There is an accuracy score for the SMOTEENN algorithm.
    • There is a confusion matrix for the SMOTEENN algorithm.
    • A classification report is generated for the SMOTEENN algorithm.
  • 13 to >12.0 pts, Approaching Proficiency:
    • There is an accuracy score for the SMOTEENN algorithm.
    • There is a confusion matrix for the SMOTEENN algorithm.
    • Code is written to generate a classification report for the SMOTEENN algorithm, but there is a minor error.
  • 12 to >9.0 pts, Developing Proficiency:
    • There is an accuracy score for the SMOTEENN algorithm.
    • There is a confusion matrix for the SMOTEENN algorithm.
    • Code is written to generate a classification report for the SMOTEENN algorithm.
  • 9 to >0.0 pts, Emerging:
    • There is an accuracy score for the SMOTEENN algorithm.
    • Code is written to generate a confusion matrix for the SMOTEENN algorithm.
    • Code is written to generate a classification report for the SMOTEENN algorithm.
  • 0 pts, Incomplete

Deliverable 3: Use Ensemble Classifiers to Predict Loan Risk (25 pts)

  • 25 to >22.0 pts, Demonstrating Proficiency:
    • There is an accuracy score and confusion matrix for TWO algorithms.
    • A classification report is generated for TWO algorithms.
    • The list of features is sorted in descending order by feature importance.
  • 22 to >18.0 pts, Approaching Proficiency:
    • There is an accuracy score and confusion matrix for TWO algorithms.
    • A classification report is generated for TWO algorithms.
    • The list of features is not sorted in descending order by feature importance.
  • 18 to >16.0 pts, Developing Proficiency:
    • There is an accuracy score and confusion matrix for TWO algorithms.
    • A classification report is generated for ONE of TWO algorithms.
    • Code is written to generate a classification report for the second algorithm.
    • Code is written that lists the features sorted in descending order by feature importance.
  • 16 to >0.0 pts, Emerging:
    • There is an accuracy score and confusion matrix for TWO algorithms.
    • Code is written to generate a classification report for ONE of TWO algorithms.
    • Code is written that lists the features sorted in descending order by feature importance.
  • 0 pts, Incomplete

Deliverable 4: Structure, Organization, and Formatting (6 pts)

  • 6 to >5.0 pts, Demonstrating Proficiency: The written analysis has ALL of the following:
    • There is a title, and there are multiple sections.
    • Each section has a heading and subheading.
    • There are images and references to code, and they are formatted and displayed correctly.
  • 5 to >4.0 pts, Approaching Proficiency: The written analysis has ALL of the following:
    • There is a title, and there are multiple sections.
    • Each section has a heading and subheading.
    • There are images and references to code, and they are formatted and displayed correctly, with one or two minor errors.
  • 4 to >3.0 pts, Developing Proficiency: The written analysis has ALL of the following:
    • There is a title, and there are multiple sections.
    AND ONE of the following:
    • Each section may have a heading and subheading.
    • There are images and references to code, and they are formatted and displayed correctly, with one or two minor errors.
  • 3 to >0.0 pts, Emerging: The written analysis has ALL of the following:
    • There is a title.
    • There may be a subheading for a section.
    • There are no headings for each section, but there are three sections.
  • 0 pts, Incomplete

Deliverable 4: Analysis (24 pts)

  • 24 to >20.0 pts, Demonstrating Proficiency:
    • The purpose is well defined.
    • The balanced accuracy score and the precision and recall scores for ALL SIX algorithms are described.
    • The results are summarized, and there is a recommendation on which model to use or justification.
  • 20 to >18.0 pts, Approaching Proficiency:
    • The purpose is well defined.
    • The balanced accuracy score and the precision and recall scores for FIVE of the SIX algorithms are described.
    • The results are summarized, but the recommendation on which model to use or justification is not clear.
  • 18 to >16.0 pts, Developing Proficiency:
    • The purpose is well defined.
    • The balanced accuracy score and the precision and recall scores for FOUR of the SIX algorithms are described.
    • The results are summarized, but there is no recommendation on which model to use or justification.
  • 16 to >0.0 pts, Emerging:
    • The purpose is well defined.
    • The balanced accuracy score and the precision and recall scores for THREE of the SIX algorithms are described.
    • The results are summarized, but there is no recommendation on which model to use or justification.
  • 0 pts, Incomplete

Total points: 100
© XXXXXXXXXX Trilogy Education Services, a 2U, Inc. brand. All Rights Reserved.
Answered 2 days After Mar 02, 2022

Solution

Mohd answered on Mar 05 2022