Homework 6
ESE 402/542
Due December 1, 2019 at 11:59pm
Type or scan your answers as a single PDF file and submit on Canvas.
Problem 1. Principal Component Analysis. Consider the following dataset:
x y
0 1
1 1
2 1
2 3
3 2
3 3
4 5
(a) Standardize the data and derive the two principal components in sorted order. What
is the new transformed dataset using the first principal component?
(b) Repeat the previous analysis but this time do not standardize the original data. Is
Principal Component Analysis scale invariant?
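One way to sanity-check part (a): standardize each column, then eigendecompose the sample covariance of the standardized data; the eigenvectors sorted by decreasing eigenvalue are the principal components. A minimal numpy sketch (the variable names are illustrative):

import numpy as np

# The seven (x, y) points from Problem 1.
X = np.array([[0, 1], [1, 1], [2, 1], [2, 3], [3, 2], [3, 3], [4, 5]], dtype=float)

# Standardize: subtract the column means, divide by the column standard deviations.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the sample covariance of the standardized data.
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))

# Sort by decreasing eigenvalue; the columns of `components` are the PCs.
order = np.argsort(eigvals)[::-1]
eigvals, components = eigvals[order], eigvecs[:, order]

# Projection onto the first principal component: the transformed 1-D dataset.
print(eigvals)
print(components)
print(Z @ components[:, 0])

Running the same steps on the raw X (skipping the division by the standard deviations) gives part (b): comparing the two sets of components shows whether PCA is scale invariant.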
Problem 2. k-means is sub-optimal. Recall that in class we defined the k-means problem
as the task of minimizing the k-means objective:
min_{c_1, c_2, ..., c_k} \sum_{i=1}^{n} ||x_i - c(x_i)||_2^2,    (1)
where c(x_i) is the closest center to x_i. In this problem, we aim to show that the k-means
algorithm does not always find the best solution of the above problem. For every number
t > 1, show that there exists an instance of the k-means problem for which the k-means
algorithm (might) find a solution whose objective value is at least t×OPT, where OPT is
the minimum k-means objective. In other words, you should find a set of points x_1, ..., x_n
for which the k-means algorithm may (if initialized badly) output a set of centers whose
objective value is at least a factor t of the optimal value of problem (1).
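To see what such an instance can look like, here is a hedged sketch (the point set and the width d are illustrative choices, not part of the assignment): four points at the corners of a 1-by-d rectangle with k = 2. The optimal centers pair the two left points and the two right points, for a total objective of 1; an initialization that places both centers on the vertical midline makes Lloyd's algorithm pair top with bottom instead, and it stays at that fixed point with objective d^2. Choosing d >= sqrt(t) then gives a ratio of at least t.

import numpy as np
from sklearn.cluster import KMeans

d = 10.0  # rectangle width; the bad/optimal objective ratio is d**2
pts = np.array([[0.0, 0.0], [0.0, 1.0], [d, 0.0], [d, 1.0]])

# Good initialization: one center per vertical pair (left pair, right pair).
opt = KMeans(n_clusters=2, init=np.array([[0.0, 0.5], [d, 0.5]]), n_init=1).fit(pts)

# Bad initialization on the vertical midline: Lloyd's algorithm pairs the two
# bottom points and the two top points and never leaves this local optimum.
bad = KMeans(n_clusters=2, init=np.array([[d / 2, 0.0], [d / 2, 1.0]]), n_init=1).fit(pts)

print(opt.inertia_, bad.inertia_, bad.inertia_ / opt.inertia_)  # 1.0, 100.0, 100.0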
Problem 3. Polynomial Regression. Load the dataset poly_data.csv. The first column is a
vector of predictors x and the second column is a vector of responses y. Suppose we believe
it was generated by some polynomial of the predictors with Gaussian error, and we would
like to recover the true coefficients of the underlying process. A polynomial regression can be
estimated by including all powers of x as predictors in the model. For example, to estimate
a quadratic regression, we include the predictors x and x^2 as well as the intercept.
(a) Pick a set of polynomial models. Compute the k-fold cross-validation error with respect
to mean squared error for each of these models. Report the value of k that you use
and plot the cross-validation error as a function of polynomial power.
(b) Choose a model from your initial set, and re-run on the entire data set. Report the
coefficients and make a scatter plot of x and y with your fitted polynomial. Justify
your selection.
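A minimal sketch of part (a) with scikit-learn, assuming poly_data.csv has two comma-separated columns and no header row, and taking k = 5 (both assumptions; adjust to the actual file and your chosen k):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumed layout: columns (x, y), comma-separated, no header row.
data = np.loadtxt("poly_data.csv", delimiter=",")
x, y = data[:, [0]], data[:, 1]

degrees = range(1, 11)
cv_mse = []
for deg in degrees:
    model = make_pipeline(PolynomialFeatures(degree=deg, include_bias=False),
                          LinearRegression())
    # cross_val_score returns negated MSE; flip the sign and average over folds.
    scores = cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse.append(-scores.mean())

plt.plot(list(degrees), cv_mse, marker="o")
plt.xlabel("polynomial degree")
plt.ylabel("5-fold CV mean squared error")
plt.show()

For part (b), refit the pipeline for the chosen degree on the full dataset and read the coefficients off its LinearRegression step.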
Problem 4. Extra Credit. Load the Labeled Faces in the Wild dataset from sklearn. You
can load this data as follows:
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=60)
For this exercise, we will use PCA on image data, in particular pictures of faces, to extract features.
(a) Perform PCA on the dataset to find the first 150 components. Since this is a large
dataset, you should use randomized PCA instead, which can also be found on sklearn.
Show the eigenfaces associated with the first 1 through 25 principal components.
(b) Using the first 150 components you found, reconstruct a few faces of your choice and
compare them with the original input images.
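A hedged sketch of both parts; in current scikit-learn the randomized solver is selected with PCA(svd_solver="randomized") rather than the older standalone RandomizedPCA class, and the plotting choices below are illustrative:

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA

faces = fetch_lfw_people(min_faces_per_person=60)
h, w = faces.images.shape[1:]

pca = PCA(n_components=150, svd_solver="randomized", random_state=0)
proj = pca.fit_transform(faces.data)

# (a) Eigenfaces: each principal component reshaped back to image dimensions.
fig, axes = plt.subplots(5, 5, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(pca.components_[i].reshape(h, w), cmap="gray")
    ax.axis("off")
plt.show()

# (b) Reconstructions from 150 components vs. the original images.
recon = pca.inverse_transform(proj)
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i in range(5):
    axes[0, i].imshow(faces.data[i].reshape(h, w), cmap="gray")
    axes[0, i].axis("off")
    axes[1, i].imshow(recon[i].reshape(h, w), cmap="gray")
    axes[1, i].axis("off")
plt.show()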
Solution
# Problem 1
data = [
    [0, 1],
    [1, 1],
    [2, 1],
    [2, 3],
    [3, 2],
    [3, 3],
    [4, 5],
]

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardizing the data as per requirement
scaled_data = StandardScaler().fit_transform(data)

pca = PCA(n_components=2)
pca.fit(scaled_data)

# Variance along each principal component, in decreasing order
print(pca.explained_variance_)

# Fraction of the total variance explained by each component
print(pca.explained_variance_ratio_)

# Transformed data; the PCA was fit on the standardized data, so the
# standardized data is what gets transformed
print(pca.transform(scaled_data))

# Applying PCA on the non-standardized data for part (b)
pca = PCA(n_components=2)
pca.fit(data)

print(pca.explained_variance_)
print(pca.explained_variance_ratio_)

# Transformed data
print(pca.transform(data))

# Problem 2
from time import time
import numpy as np
import matplotlib.pyplot as plt

from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

np.random.seed(42)

digits = load_digits()
data = scale(digits.data)

n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels...

Recorded output of the Problem 2 cell:

n_digits: 10,  n_samples 1797,  n_features 64

init        time    inertia  homo   compl  v-meas  ARI    AMI    silhouette
k-means++   0.17s   69432    0.602  0.650  0.625   0.465  0.621  0.146
random      0.16s   69694    0.669  0.710  0.689   0.553  0.686  0.147
PCA-based   0.03s   70804    0.671  0.698  0.684   0.561  0.681  0.118