

Machine Learning, 2022S: HW6
CS XXXXXXXXXX Machine Learning: Homework 6
Spring 2022
Due: Sunday, May 1, 2022 (End of day)
We are coming back to the dataset that we used in homework 5. Namely, we will be using a UCI
simulated electrical grid stability data set that is available here:
https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+.
This dataset has 10,000 examples using 11 real-valued attributes, with a binary target (stable vs.
unstable). The target value that you are predicting is the last column in the dataset.
Remark 1 (Cross-Entropy). For the cross-entropy values that you want to report in the questions
below, please use the following formula (empirical risk using the cross-entropy loss):
\hat{R}_S(h, c) = -\frac{1}{m} \sum_{i=1}^{m} \big[ y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i) \big],
where
m is the size of the sample S on which we evaluate our hypothesis h,
y_i ∈ {0, 1} is the true label c(x_i) of the instance x_i, and
p_i is the probability that our hypothesis assigns the positive label to instance x_i.
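As a minimal illustration (a hypothetical helper, not part of the handout), the empirical risk above can be computed from a vector of predicted probabilities in a few lines of NumPy:

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-15):
    """Empirical risk under the cross-entropy loss from Remark 1.

    y_true: true labels y_i in {0, 1}
    p_pred: predicted probabilities p_i of the positive class
    eps:    clipping constant to avoid ln(0); an implementation detail,
            not part of the formula in the handout
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```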
Exercise 1 – Preprocessing (10 points). You have already done this part in homework 4. However, since you may need to refresh your memory of what you did, this part is worth a few points.
(a) Remove columns 5 and 13 (labeled p1 and stab); p1 is non-predictive and stab is a target column that is exactly correlated with the binary target you are trying to predict (if this column is negative, the system is stable).
(b) Change the target variable to a number. If the value is stable, change it to 1, and if the value
is unstable, change it to 0.
(c) Remove 20% of the examples and keep them for testing. You may assume that all examples are
independent, so it does not matter which 20% you remove. However, the testing data should
not be used until after a model has been selected.
(d) Split the remaining examples into training (75%) and validation (25%). Thus, you will train
with 60% of the full dataset (75% of 80%) and validate with 20% of the full dataset (25% of
80%).
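A minimal sketch of steps (a)-(d), assuming the CSV downloaded from the UCI page is saved as Data_for_UCI_named.csv and uses the column labels p1, stab, and stabf (check these against your copy):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("Data_for_UCI_named.csv")

# (a) Drop the non-predictive p1 column and the stab column (co-linear with the target).
df = df.drop(columns=["p1", "stab"])

# (b) Encode the target: stable -> 1, unstable -> 0.
df["stabf"] = (df["stabf"] == "stable").astype(int)

X, y = df.drop(columns=["stabf"]), df["stabf"]

# (c) Hold out 20% for testing; it is not used again until a model has been selected.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# (d) Split the remaining 80% into 75% training / 25% validation (60% / 20% of the full data).
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
```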
Exercise 2 – Artificial Neural Network (20 points). You may use
sklearn.neural_network.MLPClassifier.
(a) Fit an artificial neural network to the training data using 1 hidden layer of 20 units as well as
another neural network that has 2 hidden layers of 10 units each.
(b) For each model made in (a), make a probabilistic prediction for each validation example. Report the cross-entropies between the predictions and the true labels in your writeup.
(c) Which neural network performs better on the validation data? Report this in your writeup. Train a new neural network, using the better of the two architectures, on the combined training and validation data. Make a probabilistic prediction for each testing example using this model and save the predictions for later.
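A sketch of parts (a) and (b), reusing the split (X_train, y_train, X_val, y_val) and the cross_entropy helper sketched earlier (both hypothetical names); max_iter and random_state are illustrative choices, not requirements of the handout:

```python
from sklearn.neural_network import MLPClassifier

# (a) Two candidate architectures: one hidden layer of 20 units,
#     and two hidden layers of 10 units each.
nets = {
    "1 x 20": MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0),
    "2 x 10": MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=0),
}

# (b) Probabilistic predictions on the validation set and their cross-entropies.
for name, net in nets.items():
    net.fit(X_train, y_train)
    p_val = net.predict_proba(X_val)[:, 1]   # probability of the positive (stable) class
    print(name, cross_entropy(y_val, p_val))
```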
Exercise 3 – Decision Trees (20 points). For this problem you can use the scikit-learn method
sklearn.tree.DecisionTreeClassifier.
(a) Fit a decision tree to the training data using the Gini impurity index and max tree depth of 5.
(b) Using the model created in part (a), make a probabilistic prediction for each validation example. What is the cross-entropy between these predictions and the true labels? Put this value in your writeup.
(c) Fit a decision tree to the training data using information gain and max tree depth of 5.
(d) Using the model created in part (c), make a probabilistic prediction for each validation example. What is the cross-entropy between these predictions and the true labels? Put this value in your writeup.
(e) Which model performed better on the validation data? Report this in your writeup. Train a new decision tree on the training and validation data using whichever measure created the best model in (a)-(d), with a max tree depth of 5. Make a probabilistic prediction for each testing example and save the predictions for later.
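A sketch of parts (a)-(d) under the same assumptions as above; in scikit-learn, the Gini impurity index and information gain correspond to criterion="gini" and criterion="entropy" respectively:

```python
from sklearn.tree import DecisionTreeClassifier

# (a) and (c): one tree per splitting measure, both with max depth 5.
trees = {
    "gini": DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0),
    "entropy": DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0),
}

# (b) and (d): validation cross-entropy for each measure.
for name, tree in trees.items():
    tree.fit(X_train, y_train)
    p_val = tree.predict_proba(X_val)[:, 1]
    print(name, cross_entropy(y_val, p_val))
```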
Exercise 4 – Boosting (20 points). For this problem you may use
sklearn.ensemble.AdaBoostClassifier.
(a) Fit boosted decision stumps (max tree depth of 1) to the training data allowing at most 20, 40,
and 80 decision stumps (base estimators) in each model.
(b) For each model trained in (a), make a probabilistic prediction for each validation example.
Report the cross-entropies between the predictions and the true labels in your writeup.
(c) Which upper bound on the number of allowed base classifiers generates the best-performing model? Report this in your writeup. Train a new AdaBoost classifier with this bound on the maximum number of allowed base classifiers, using the training and validation data. Make a probabilistic prediction for each testing example using this model and save the predictions for later.
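A sketch of parts (a) and (b) under the same assumptions; note that AdaBoostClassifier's default base estimator is already a depth-1 decision tree (a stump), and the keyword for setting it explicitly differs across scikit-learn versions:

```python
from sklearn.ensemble import AdaBoostClassifier

# (a) Boosted decision stumps with at most 20, 40, and 80 base estimators.
for n in (20, 40, 80):
    ada = AdaBoostClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    # (b) Validation cross-entropy for this bound on the number of stumps.
    p_val = ada.predict_proba(X_val)[:, 1]
    print(n, cross_entropy(y_val, p_val))
```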
Exercise 5 – ROC Curve (30 points). For this exercise you must write your own code; no
scikit-learn, except maybe to compute AUC.
For each model produced in Exercises 2-4 do the following:
(a) Determinize the testing predictions made above, using 1001 different probability thresholds (0.000, 0.001, 0.002, ..., 0.999, 1.000). “Determinization” means converting the probability to a deterministic class label (0 or 1). Use equation (1) below for determinization, where p^* is the critical threshold, p_i is the predicted probability for example i, and P_i is the resulting deterministic prediction:

P_i = \begin{cases} 1, & \text{if } p_i \ge p^*, \\ 0, & \text{otherwise} \end{cases} \qquad (1)
(b) At each of the 1001 probability thresholds, compute the true positive rate (TPR) and false
positive rate (FPR). Recall that these values are easily computed from the confusion matrix.
(You would have to re-calculate the confusion matrix for each one of these thresholds, for each
model.)
(c) Plot the ROC (receiver operating characteristic) curve, using the 1001 points created in part (b). If you have forgotten what a ROC curve looks like, see our notes on model evaluation. The
ROC curve must contain a point at the bottom left (0, 0) and top right (1, 1). Also, it must
contain the dashed grey line, indicating the performance of a random predictor. Include the
ROC curve for each model in your write-up.
(d) Find the probability threshold yielding the highest Youden index (TPR − FPR). Report the Youden index and the corresponding probability threshold for each model.
(e) Compute the AUC (area under the curve) for each model. You may use the function
sklearn.metrics.roc_auc_score for this part.
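A sketch of parts (a)-(e) for one model, assuming y_test holds the held-out labels and p_test the model's saved test-set probabilities (hypothetical names); only the AUC uses scikit-learn, as the handout permits:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score

def roc_points(y_true, p_pred):
    """TPR and FPR at the 1001 thresholds 0.000, 0.001, ..., 1.000 (parts (a) and (b))."""
    y = np.asarray(y_true)
    p = np.asarray(p_pred)
    thresholds = np.linspace(0.0, 1.0, 1001)
    tpr, fpr = [], []
    for t in thresholds:
        pred = (p >= t).astype(int)              # determinization, equation (1)
        tp = np.sum((pred == 1) & (y == 1))
        fp = np.sum((pred == 1) & (y == 0))
        fn = np.sum((pred == 0) & (y == 1))
        tn = np.sum((pred == 0) & (y == 0))
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return thresholds, np.array(tpr), np.array(fpr)

thresholds, tpr, fpr = roc_points(y_test, p_test)

# (c) ROC curve with the dashed grey random-predictor diagonal.
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")
plt.xlabel("False positive rate (FPR)")
plt.ylabel("True positive rate (TPR)")
plt.show()

# (d) Threshold with the highest Youden index (TPR - FPR).
best = np.argmax(tpr - fpr)
print("Youden index:", tpr[best] - fpr[best], "at threshold", thresholds[best])

# (e) AUC, computed with scikit-learn as permitted.
print("AUC:", roc_auc_score(y_test, p_test))
```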

Solution

AUC-ROC Curve – The Star Performer!
You’ve built your machine learning model – so what’s next? You need to evaluate it and validate how good (or bad) it is, so you can then decide on whether to implement it. That’s where the AUC-ROC curve comes in.
The name might be a mouthful, but it is just saying that we are calculating the “Area Under the Curve” (AUC) of the “Receiver Operating Characteristic” (ROC). Confused? I feel you! I have been in your shoes. But don’t worry, we will see what these terms mean in detail and everything will be a piece of cake!
For now, just know that the AUC-ROC curve helps us visualize how well our machine learning classifier is performing. Although it works for only binary classification problems, we will see towards the end how we can extend it to evaluate multi-class classification problems too.
We’ll cover topics like sensitivity and specificity as well since these are key topics behind the AUC-ROC curve.
I suggest going through the article on Confusion Matrix as it will introduce some important terms which we will be using in this article.
Table of Contents
· What are Sensitivity and Specificity?
· Probability of Predictions
· What is the AUC-ROC Curve?
· How Does the AUC-ROC Curve Work?
· AUC-ROC in Python
· AUC-ROC for Multi-Class Classification
 
What are Sensitivity and Specificity?
This is what a confusion matrix looks like (rows are actual classes, columns are predicted classes):

                    Predicted Positive     Predicted Negative
Actual Positive     True Positive (TP)     False Negative (FN)
Actual Negative     False Positive (FP)    True Negative (TN)
From the confusion matrix, we can derive some important metrics that were not discussed in the previous article. Let’s talk about them here.
 
Sensitivity / True Positive Rate / Recall
Sensitivity tells us what proportion of the positive class got correctly classified.
A simple example would be to determine what proportion of the actual sick people were correctly detected by the model.
 
False Negative Rate
False Negative Rate (FNR) tells us what proportion of the positive class got incorrectly classified by the classifier.
A higher TPR and a lower FNR are desirable, since we want to correctly classify the positive class.
 
Specificity / True Negative Rate
Specificity tells us what proportion of the negative class got correctly classified.
Taking the same example as in Sensitivity, Specificity would mean determining the...
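As a small illustration (hypothetical counts, not taken from the dataset), these rates follow directly from the four confusion-matrix cells:

```python
# Hypothetical confusion-matrix counts: TP, FN, FP, TN.
tp, fn, fp, tn = 40, 10, 5, 45

sensitivity = tp / (tp + fn)   # TPR / recall: proportion of positives correctly classified
fnr = fn / (tp + fn)           # false negative rate: proportion of positives missed
specificity = tn / (tn + fp)   # TNR: proportion of negatives correctly classified

print(sensitivity, fnr, specificity)
```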