
CAP 5625: Programming Assignment 2

Due on Canvas by Wednesday, November 3, 2021 at 11:59pm

Preliminary instructions
You may consult with other students currently taking CAP 5625 at FAU on this programming
assignment. If you do consult with others, then you must indicate this by providing their names
with your submitted assignment. However, all analyses must be performed independently, all
source code must be written independently, and all students must turn in their own
independent assignment.

Though it should be unnecessary to state in a graduate class, I am reminding you
that you may not turn in code (partial or complete) that is written or inspired by
others, including code from other students, websites, past code that I release
from prior assignments in this class or from past semesters in other classes I
teach, or any other source that would constitute an academic integrity violation.
All instances of academic integrity violations will receive a zero on the
assignment and will be referred to the Department Chair and College Dean for
further administrative action.

You may choose to use whatever programming language you want. However, you must provide
clear instructions on how to compile and/or run your source code. I recommend using a
modern language, such as Python, R, or Matlab as learning these languages can help you if you
were to enter the machine learning or artificial intelligence field in the future.

All analyses performed and algorithms run must be written from scratch. That is, you may not
use a library that can perform coordinate descent, cross validation, elastic net, least squares
regression, optimization, etc. to successfully complete this programming assignment (though you
may reuse your relevant code from Programming Assignment 1). The goal of this assignment is
not to learn how to use particular libraries of a language, but instead to understand how
key methods in statistical machine learning are implemented. With that stated, I will provide
10% extra credit if you additionally implement the assignment using built-in statistical or
machine learning libraries (see Deliverable 6 at the end of the document).

Brief overview of assignment
In this assignment you will still be analyzing the same credit card data from N = 400 training
observations that you examined in Programming Assignment 1. The goal is to fit a model that
can predict credit balance based on p = 9 features describing an individual, which include an
individual’s income, credit limit, credit rating, number of credit cards, age, education level,
gender, student status, and marriage status. Specifically, you will perform a penalized
(regularized) least squares fit of a linear model using elastic net, with the model parameters
obtained by coordinate descent. Elastic net will permit you to provide simultaneous parameter
shrinkage (tuning parameter λ ≥ 0) and feature selection (tuning parameter α ∈ [0, 1]). The
two tuning parameters λ and α will be chosen using five-fold cross validation, and the best-fit
model parameters will be inferred on the training dataset conditional on an optimal pair of
tuning parameters.

Data
Data for these observations are given in Credit_N400_p9.csv, with individuals labeled on
each row (rows 2 through 401), and input features and response given on the columns (with
the first row representing a header for each column). There are six quantitative features, given
by columns labeled “Income”, “Limit”, “Rating”, “Cards”, “Age”, and “Education”, and three
qualitative features with two levels labeled “Gender”, “Student”, and “Married”.

Detailed description of the task
Recall that the task of performing an elastic net fit to training data
{(x_1, y_1), (x_2, y_2), …, (x_N, y_N)} is to minimize the cost function

J(\beta; \lambda, \alpha) = \sum_{i=1}^{N} \left( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 + \lambda \left( \alpha \sum_{j=1}^{p} \beta_j^2 + (1 - \alpha) \sum_{j=1}^{p} \lvert \beta_j \rvert \right)
where y_i is a centered response and where the p input features are standardized (i.e., centered
and divided by their standard deviation). Note that we cannot use gradient descent to minimize
this cost function, as the component Σ_{j=1}^{p} |β_j| of the penalty is not differentiable. Instead, we
use coordinate descent, where we update each parameter β_j, j = 1, 2, …, p, in turn, keeping all
other parameters constant, and using sub-gradient rather than gradient calculations. To
implement this algorithm, depending on whether your chosen language can quickly compute
vectorized operations, you may implement coordinate descent using either Algorithm 1 or
Algorithm 2 below (choose whichever you are more comfortable implementing). Note that in
languages like R, Python, or Matlab, Algorithm 2 (which would be implemented by several
nested loops) may be much slower than Algorithm 1. Also note that if you are implementing
Algorithm 1 using Python, use numpy arrays instead of Pandas data frames for computational
speed. For this assignment, assume that we will reach the minimum of the cost function within
a fixed number of steps, with the number of iterations being 1000.



Algorithm 1 (vectorized):
Step 1. Fix tuning parameters λ and α.
Step 2. Generate the N-dimensional centered response vector y and the N × p standardized
(centered and scaled to have unit standard deviation) design matrix X.
Step 3. Precompute b_k, k = 1, 2, …, p, as

b_k = \sum_{i=1}^{N} x_{ik}^2

Step 4. Randomly initialize the parameter vector β = [β_1, β_2, …, β_p].
Step 5. For each k, k = 1, 2, …, p: compute

a_k = \mathbf{x}_k^\top (\mathbf{y} - \mathbf{X}\boldsymbol{\beta} + \mathbf{x}_k \beta_k)

and set

\beta_k = \frac{\operatorname{sign}(a_k) \left( \lvert a_k \rvert - \frac{\lambda(1-\alpha)}{2} \right)_+}{b_k + \lambda\alpha}

Step 6. Repeat Step 5 for 1000 iterations or until convergence (the vector β does not change).
Step 7. Set the last updated parameter vector as β̂ = [β̂_1, β̂_2, …, β̂_p].
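
For concreteness, a minimal numpy sketch of Algorithm 1 might look as follows; this is a
sketch under the assumption that X and y already hold the standardized design matrix and
centered response of Step 2, not a complete solution.

import numpy as np

def coordinate_descent(X, y, lam, alpha, iters=1000):
    # Sketch of Algorithm 1: X is the N x p standardized design matrix,
    # y the centered response, lam and alpha the fixed tuning parameters.
    n, p = X.shape
    b = (X ** 2).sum(axis=0)                      # Step 3: b_k = sum_i x_ik^2
    beta = np.random.uniform(-1.0, 1.0, size=p)   # Step 4: small random start
    for _ in range(iters):                        # Step 6: fixed iteration budget
        for k in range(p):                        # Step 5: cycle through coordinates
            r = y - X @ beta + X[:, k] * beta[k]  # residual with feature k added back
            a_k = X[:, k] @ r                     # a_k = x_k^T r
            soft = max(abs(a_k) - lam * (1.0 - alpha) / 2.0, 0.0)
            beta[k] = np.sign(a_k) * soft / (b[k] + lam * alpha)
    return beta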

Algorithm 2 (non-vectorized):
Step 1. Fix tuning parameters λ and α.
Step 2. Generate the N-dimensional centered response vector y and the N × p standardized
(centered and scaled to have unit standard deviation) design matrix X.
Step 3. Precompute b_k, k = 1, 2, …, p, as

b_k = \sum_{i=1}^{N} x_{ik}^2

Step 4. Randomly initialize the parameter vector β = [β_1, β_2, …, β_p].
Step 5. For each k, k = 1, 2, …, p: compute

a_k = \sum_{i=1}^{N} x_{ik} \left( y_i - \sum_{j=1, j \neq k}^{p} x_{ij} \beta_j \right)

and set

\beta_k = \frac{\operatorname{sign}(a_k) \left( \lvert a_k \rvert - \frac{\lambda(1-\alpha)}{2} \right)_+}{b_k + \lambda\alpha}

Step 6. Repeat Step 5 for 1000 iterations or until convergence (the vector β does not change).
Step 7. Set the last updated parameter vector as β̂ = [β̂_1, β̂_2, …, β̂_p].
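
For Algorithm 2, the Step 5 quantity a_k can be written with explicit loops; a pure-Python
sketch, assuming X is a list of N rows of length p and beta a list of the p current coefficients:

def compute_a_k(X, y, beta, k):
    # a_k = sum_i x_ik * (y_i - sum_{j != k} x_ij * beta_j), with explicit loops
    a_k = 0.0
    for i in range(len(X)):            # loop over the N observations
        partial = 0.0
        for j in range(len(beta)):     # sum over all features except k
            if j != k:
                partial += X[i][j] * beta[j]
        a_k += X[i][k] * (y[i] - partial)
    return a_k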

Note that we define

\operatorname{sign}(a) = \begin{cases} -1 & \text{if } a < 0 \\ 1 & \text{if } a \geq 0 \end{cases}
\qquad
a_+ = \begin{cases} 0 & \text{if } a < 0 \\ a & \text{if } a \geq 0 \end{cases}

and we use the notation x_k for the kth column of the design matrix X (the kth feature vector).
This vector by definition is an N-dimensional column vector.
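
Together, sign(·) and (·)_+ implement the soft-thresholding used in Step 5; a minimal Python
sketch, following the convention above that sign(0) = 1:

def soft_threshold(a, t):
    # sign(a) * (|a| - t)_+, with sign(0) taken as 1 per the definition above
    sign = -1.0 if a < 0 else 1.0
    return sign * max(abs(a) - t, 0.0)

In the coordinate update of Step 5, the threshold is t = λ(1 − α)/2, and the thresholded value
is then divided by b_k + λα.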

When randomly initializing the parameter vector, I would make sure that the parameters start
at small values. A good strategy here may be to randomly initialize each of the β_k, k = 1, 2, …, p,
parameters from a uniform distribution between −1 and 1.
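
With numpy, for example (assuming p holds the number of features):

import numpy as np

beta = np.random.uniform(low=-1.0, high=1.0, size=p)  # Step 4: start in [-1, 1]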

Effect of tuning parameter on inferred regression coefficients
You will consider a discrete grid of nine tuning parameter values
λ ∈ {10⁻², 10⁻¹, 10⁰, 10¹, 10², 10³, 10⁴, 10⁵, 10⁶}, where the tuning parameter is evaluated
across a wide range of values on a log scale, as well as six tuning parameter values
α ∈ {0, 1/5, 2/5, 3/5, 4/5, 1}. For each tuning parameter value pair, you will use coordinate
descent to infer the best-fit model. Note that when α = 0, we obtain the lasso estimate, and
when α = 1, we obtain the ridge regression estimate.
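
In Python, for instance, these grids can be built directly:

import numpy as np

lambdas = 10.0 ** np.arange(-2, 7)  # 10^-2 through 10^6, nine values
alphas = np.linspace(0.0, 1.0, 6)   # 0, 1/5, 2/5, 3/5, 4/5, 1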

Deliverable 1: Illustrate the effect of the tuning parameter on the inferred elastic net
regression coefficients by generating six plots (one for each α value) of nine lines (one for
each of the p = 9 features), with the y-axis as β̂_j, j = 1, 2, …, 9, and the x-axis the
corresponding log-scaled tuning parameter value log₁₀(λ) that generated the particular β̂_j.
Label both axes in all six plots. Without the log scaling of the tuning parameter λ, the plots
will look distorted.
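
One way to produce each of the six plots with matplotlib is sketched below; the inputs named
here (lambdas, coefs, feature_names) are assumptions, with coefs a 9 × 9 array holding the
estimated coefficients for each λ (rows) and feature (columns) at a fixed α.

import numpy as np
import matplotlib.pyplot as plt

def plot_coefficient_paths(lambdas, coefs, feature_names, alpha):
    # coefs[i, j] is assumed to be the estimate of beta_j at lambdas[i]
    for j, name in enumerate(feature_names):
        plt.plot(np.log10(lambdas), coefs[:, j], label=name)  # one line per feature
    plt.xlabel("log10(lambda)")
    plt.ylabel("estimated coefficient")
    plt.title("Elastic net paths, alpha = {}".format(alpha))
    plt.legend()
    plt.show()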

Choosing the best tuning parameter
You will consider a discrete grid of nine tuning parameter values
λ ∈ {10⁻², 10⁻¹, 10⁰, 10¹, 10², 10³, 10⁴, 10⁵, 10⁶}, where the tuning parameter is evaluated
across a wide range of values on a log scale, as well as six tuning parameter values
α ∈ {0, 1/5, 2/5, 3/5, 4/5, 1}. For each tuning parameter value pair, perform five-fold cross
validation and choose the pair of λ and α values that give the smallest

\mathrm{CV}_{(5)} = \frac{1}{5} \sum_{k=1}^{5} \mathrm{MSE}_k

where MSE_k is the mean squared error on the validation set of the kth fold.

Note that during the five-fold cross validation, you will hold out one of the five sets (here 80
observations) as the Validation Set and the remaining four sets (the other 320 observations)
will be used as the Training Set. On this Training Set, you will need to center the output and
standardize (center and divide by the standard deviation across samples) each feature. These
identical values used for centering the output and standardizing the input will need to be
applied to the corresponding Validation Set, so that the Validation Set is on the same scale.
Because the Training Set changes
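
A single fold of this bookkeeping might be sketched as follows in numpy; the
coordinate_descent function from the Algorithm 1 sketch above is assumed, and the fold
boundaries assume the N = 400 observations have already been shuffled.

import numpy as np

def cv_fold_mse(X, y, fold, lam, alpha, n_folds=5):
    # MSE on one held-out fold, standardizing with training-fold statistics only
    n = X.shape[0]
    idx = np.arange(n)
    val = idx[fold * (n // n_folds):(fold + 1) * (n // n_folds)]  # 80 validation rows
    train = np.setdiff1d(idx, val)                                 # 320 training rows

    mu, sd, y_bar = X[train].mean(axis=0), X[train].std(axis=0), y[train].mean()
    X_tr, y_tr = (X[train] - mu) / sd, y[train] - y_bar
    X_va, y_va = (X[val] - mu) / sd, y[val] - y_bar  # reuse training-fold statistics

    beta = coordinate_descent(X_tr, y_tr, lam, alpha)  # from the Algorithm 1 sketch
    return np.mean((y_va - X_va @ beta) ** 2)

# CV(5) for one (lam, alpha) pair is then the average over the five folds:
# cv5 = np.mean([cv_fold_mse(X, y, f, lam, alpha) for f in range(5)])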
Answered 6 days After Oct 17, 2021

Solution

Sandeep Kumar answered on Oct 24 2021
117 Votes
ass2.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"id": "ZfXIWZ57NkON"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import os\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"id": "nMdVQfNePLQJ"
},
"outputs": [],
"source": [
"class Elastic:\n",
" def __init__(self, preds, responses, lambdas, alphas, nf = 5, iters = 1000, to_ve
= False):\n",
"\n",
" self.x, self.y = self._shuffle_data(preds, responses)\n",
" self.lambdas = lambdas \n",
" self.alphas = alphas\n",
" self.nf = nf \n",
" self.iters = iters\n",
" self.to_ve
= to_ve
\n",
" self._initialize()\n",
"\n",
"\n",
" def _initialize(self):\n",
" '''\n",
" This function initializes certain values needed for training and checks that certain\n",
" arguments have appropriate values. \n",
" '''\n",
"\n",
" # get number of samples and number of features\n",
" self.n_samples = self.x.shape[0]\n",
" self.n_preds = self.x.shape[1]\n",
" \n",
" # matrix to store the cross validation results \n",
" self.cv_vals = np.zeros([self.nf, len(self.lambdas), len(self.alphas)])\n",
"\n",
" # determine the number of validation samples and their inds based on nf \n",
" self.n_val_samples = self.n_samples
self.nf \n",
" self.val_inds = list(range(0, self.n_samples, self.n_val_samples))\n",
" \n",
" if self.to_ve
:\n",
" print('[INFO] Using {} training and {} validation samples for each CV fold.'.format(\n",
" self.n_samples - self.n_val_samples, self.n_val_samples)\n",
" )\n",
"\n",
" # create a tensor to store the Betas \n",
" self.B_trained = np.zeros([self.nf, len(self.lambdas), len(self.alphas), self.n_preds])\n",
"\n",
" def _standardize(self, x, mean_vec, std_vec):\n",
"\n",
"\n",
" return (x - mean_vec) / std_vec \n",
"\n",
" def _center_responses(self, y, mean):\n",
"\n",
"\n",
" return y - mean\n",
"\n",
" def _shuffle_data(self, preds, responses):\n",
"\n",
"\n",
" data = np.concatenate((preds, responses[:, None]), 1)\n",
" np.random.shuffle(data)\n",
" return data[:, :-1], data[:, -1]\n",
"\n",
" def _initialize_B(self):\n",
"\n",
"\n",
" return np.random.uniform(low = -1, high = 1, size = (self.n_preds, 1))\n",
"\n",
" def predict(self, x):\n",
"\n",
"\n",
" assert(self.mean_vec is not None and self.std_vec is not None), \\\n",
" 'Model must be trained before predicting.'\n",
"\n",
" x = self._standardize(x, self.mean_vec, self.std_vec)\n",
" return np.matmul(x, self.B)\n",
"\n",
" def _get_folds(self, val_ind):\n",
" \n",
"\n",
" x_val = self.x[val_ind:val_ind + self.n_val_samples]\n",
" x_train = np.delete(self.x, np.arange(val_ind, val_ind + self.n_val_samples), axis = 0)\n",
" y_val = self.y[val_ind:val_ind + self.n_val_samples]\n",
" y_train = np.delete(self.y, np.arange(val_ind, val_ind + self.n_val_samples), axis = 0)\n",
" return x_train, x_val, y_train, y_val\n",
"\n",
" def _update(self, x, y, B, ss_cols, lambda_, alpha):\n",
"\n",
" \n",
" for k in range(self.n_preds):\n",
" # get rss without the effect of coefficient k\n",
" rss_k = y - np.matmul(x, B) + (x[:, k] * B[k])[:, None]\n",
" \n",
" # a_k is part of the derivative of the rss term in the loss function\n",
" a_k = np.matmul(x[:, k].transpose(), rss_k)[0]\n",
" \n",
" # update B_k\n",
" B_k = np.absolute(a_k) - lambda_ * (1 - alpha) / 2\n",
" B_k = B_k if B_k >= 0 else 0\n",
" B[k, 0] = np.sign(a_k) * B_k / (ss_cols[k] + lambda_ * alpha)\n",
" \n",
" return B\n",
"\n",
" def score(self, x, y, B):\n",
"\n",
"\n",
" y_hat = np.matmul(x, B)\n",
" mse = np.mean((y - y_hat) ** 2)\n",
" return mse\n",
" \n",
" def _compute_ss_cols(self, x):\n",
" '''\n",
" Computes the sum of squares for each column needed in the update.\n",
" Only need to do this once per fold at the beginning. \n",
" \n",
" Args:\n",
" x (np.nda
ay): The N x M design matrix of the cu
ent fold. \n",
" \n",
" Returns:\n",
" ss_cols (np.nda
ay): The M-dim vector representing the sum\n",
" of squares of the cols of x.\n",
" '''\n",
" return np.sum(x ** 2, 0)\n",
" \n",
" def _find_best_tuning_params(self):\n",
" '''\n",
" Finds the value of lambda and alpha with minimum co
esponding CV score and\n",
" saves them as class attributes. \n",
" '''\n",
" cv_mean = np.mean(self.cv_vals, 0)\n",
" best_lambda_ind, best_alpha_ind = np.where(cv_mean == np.amin(cv_mean))\n",
" self.best_lambda = self.lambdas[best_lambda_ind[0]]\n",
" self.best_alpha = self.alphas[best_alpha_ind[0]]\n",
"\n",
" def fit(self):\n",
" '''\n",
" Implements the cross validation and retrains the model with the optimal lambda \n",
" and alpha on the entire dataset.\n",
" '''\n",
"\n",
" for i_lambda, lambda_ in enumerate(self.lambdas): # loop through lambdas\n",
" for i_alpha, alpha in enumerate(self.alphas): # loop through alphas\n",
" for i_fold, val_ind in zip(range(self.nf), self.val_inds): # loop through folds\n",
" # get the folds\n",
" x_train, x_val, y_train, y_val = self._get_folds(val_ind)\n",
"\n",
" # standardize x and center y\n",
" mean_vec, std_vec = np.mean(x_train, 0), np.std(x_train, 0)\n",
" mean_response = np.mean(y_train)\n",
" x_train = self._standardize(x_train, mean_vec, std_vec)\n",
" x_val = self._standardize(x_val, mean_vec, std_vec)\n",
" y_train = self._center_responses(y_train, mean_response)[:, None]\n",
" y_val = self._center_responses(y_val, mean_response)[:, None]\n",
" \n",
" # compute b_k given this fold -- don't need to compute every update\n",
" ss_cols = self._compute_ss_cols(x_train)\n",
"\n",
" # initialize Beta for this lambda and fold\n",
" B = self._initialize_B()\n",
"\n",
" for iter in range(self.iters):\n",
" B = self._update(x_train, y_train, B, ss_cols, lambda_, alpha)\n",
" \n",
" # score this model \n",
" score = self.score(x_val, y_val, B)\n",
"\n",
" # store the score with the tuning param combinations\n",
" self.cv_vals[i_fold, i_lambda, i_alpha] = score\n",
"\n",
" # store the coefficient vector\n",
" self.B_trained[i_fold, i_lambda, i_alpha] = B[:, 0]\n",
" \n",
" # if to_ve
flag, then print out the mean CV MSE for the combo of lambda and alpha\n",
" if self.to_ve
:\n",
" print('lambda:{}; alpha:{}; CV MSE:{}'.format(\n",
" lambda_, alpha, np.mean(self.cv_vals[:, i_lambda, i_alpha]))\n",
" )\n",
"\n",
"\n",
" ############# Retrain on entire dataset with optimal lambda and alpha #############\n",
" \n",
" # find the best lambda and alpha\n",
" self._find_best_tuning_params()\n",
" \n",
" # standardize features of x and center responses \n",
" self.mean_vec, self.std_vec = np.mean(self.x, 0), np.std(self.x, 0)\n",
" x = self._standardize(self.x, self.mean_vec, self.std_vec)\n",
" y = self._center_responses(self.y, np.mean(self.y))[:, None]\n",
" \n",
" # compute the sum of squares for each feature on the entire dataset\n",
" ss_cols = self._compute_ss_cols(x)\n",
" \n",
" # initialize coefficients\n",
" self.B = self._initialize_B()\n",
" \n",
" # perform updates \n",
" for iter in range(self.iters):\n",
" self.B = self._update(x, y, self.B, ss_cols, self.best_lambda, self.best_alpha)"
]
},
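{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical usage sketch: one way the Elastic class above might be driven.\n",
"# The response column name 'Balance' and the 0/1 encodings of the two-level\n",
"# features are assumptions to verify against Credit_N400_p9.csv.\n",
"df = pd.read_csv('Credit_N400_p9.csv')\n",
"for col in ['Gender', 'Student', 'Married']:\n",
"    df[col], _ = pd.factorize(df[col])  # encode two-level features as 0/1\n",
"preds = df.drop(columns=['Balance']).to_numpy(dtype=float)\n",
"responses = df['Balance'].to_numpy(dtype=float)\n",
"\n",
"lambdas = 10.0 ** np.arange(-2, 7)  # nine lambda values, 1e-2 through 1e6\n",
"alphas = np.linspace(0.0, 1.0, 6)   # six alpha values in [0, 1]\n",
"\n",
"model = Elastic(preds, responses, lambdas, alphas, nf=5, iters=1000, to_verb=True)\n",
"model.fit()\n",
"print(model.best_lambda, model.best_alpha)"
]
}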
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 4
}