Homework 6
ESE 402/542
Due December 1, 2019 at 11:59pm
Type or scan your answers as a single PDF file and submit on Canvas.
Problem 1. Principal Component Analysis. Consider the following dataset:
x y
0 1
1 1
2 1
2 3
3 2
3 3
4 5
(a) Standardize the data and derive the two principal components in sorted order. What
is the new transformed dataset using the first principal component?
(b) Repeat the previous analysis but this time do not standardize the original data. Is
Principal Component Analysis scale invariant?
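One way to sanity-check part (a): standardize each column, then eigendecompose the sample covariance of the standardized data; the eigenvectors sorted by decreasing eigenvalue are the principal components. A minimal numpy sketch (the variable names are illustrative):

import numpy as np

# The seven (x, y) points from Problem 1.
X = np.array([[0, 1], [1, 1], [2, 1], [2, 3], [3, 2], [3, 3], [4, 5]], dtype=float)

# Standardize: subtract the column means, divide by the column standard deviations.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the sample covariance of the standardized data.
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))

# Sort by decreasing eigenvalue; the columns of `components` are the PCs.
order = np.argsort(eigvals)[::-1]
eigvals, components = eigvals[order], eigvecs[:, order]

# Projection onto the first principal component: the transformed 1-D dataset.
print(eigvals)
print(components)
print(Z @ components[:, 0])

Running the same steps on the raw X (skipping the division by the standard deviations) gives part (b): comparing the two sets of components shows whether PCA is scale invariant.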
Problem 2. k-means is sub-optimal. Recall that in class we defined the k-means problem
as the task of minimizing the k-means objective:
min_{c_1, c_2, ..., c_k} \sum_{i=1}^{n} ||x_i - c(x_i)||_2^2,    (1)
where c(x_i) is the closest center to x_i. In this problem, we aim to show that the k-means
algorithm does not always find the best solution of the above problem. For every number
t > 1, show that there exists an instance of the k-means problem for which the k-means
algorithm (might) find a solution whose objective value is at least t×OPT, where OPT is
the minimum k-means objective. In other words, you should find a set of points x_1, ..., x_n
for which the k-means algorithm may (if initialized badly) output a set of centers whose
objective value is at least a factor t of the optimal value of problem (1).
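To see what such an instance can look like, here is a hedged sketch (the point set and the width d are illustrative choices, not part of the assignment): four points at the corners of a 1-by-d rectangle with k = 2. The optimal centers pair the two left points and the two right points, for a total objective of 1; an initialization that places both centers on the vertical midline makes Lloyd's algorithm pair top with bottom instead, and it stays at that fixed point with objective d^2. Choosing d >= sqrt(t) then gives a ratio of at least t.

import numpy as np
from sklearn.cluster import KMeans

d = 10.0  # rectangle width; the bad/optimal objective ratio is d**2
pts = np.array([[0.0, 0.0], [0.0, 1.0], [d, 0.0], [d, 1.0]])

# Good initialization: one center per vertical pair (left pair, right pair).
opt = KMeans(n_clusters=2, init=np.array([[0.0, 0.5], [d, 0.5]]), n_init=1).fit(pts)

# Bad initialization on the vertical midline: Lloyd's algorithm pairs the two
# bottom points and the two top points and never leaves this local optimum.
bad = KMeans(n_clusters=2, init=np.array([[d / 2, 0.0], [d / 2, 1.0]]), n_init=1).fit(pts)

print(opt.inertia_, bad.inertia_, bad.inertia_ / opt.inertia_)  # 1.0, 100.0, 100.0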
Problem 3. Polynomial Regression. Load the dataset poly_data.csv. The first column is a
vector of predictors x and the second column is a vector of responses y. Suppose we believe
it was generated by some polynomial of the predictors with Gaussian error, and we would
like to recover the true coefficients of the underlying process. A polynomial regression can be
estimated by including all powers of x as predictors in the model. For example, to estimate
a quadratic regression, we include the predictors x and x^2 as well as the intercept.
(a) Pick a set of polynomial models. Compute the k-fold cross-validation error with respect
to mean squared error for each of these models. Report the value of k that you use
and plot the cross-validation error as a function of polynomial power.
(b) Choose a model from your initial set, and re-run on the entire data set. Report the
coefficients and make a scatter plot of x and y with your fitted polynomial. Justify
your selection.
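A minimal sketch of part (a) with scikit-learn, assuming poly_data.csv has two comma-separated columns and no header row, and taking k = 5 (both assumptions; adjust to the actual file and your chosen k):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumed layout: columns (x, y), comma-separated, no header row.
data = np.loadtxt("poly_data.csv", delimiter=",")
x, y = data[:, [0]], data[:, 1]

degrees = range(1, 11)
cv_mse = []
for deg in degrees:
    model = make_pipeline(PolynomialFeatures(degree=deg, include_bias=False),
                          LinearRegression())
    # cross_val_score returns negated MSE; flip the sign and average over folds.
    scores = cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse.append(-scores.mean())

plt.plot(list(degrees), cv_mse, marker="o")
plt.xlabel("polynomial degree")
plt.ylabel("5-fold CV mean squared error")
plt.show()

For part (b), refit the pipeline for the chosen degree on the full dataset and read the coefficients off its LinearRegression step.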
Problem 4. Extra Credit. Load the Labeled Faces in the Wild dataset from sklearn. You
can load this data as follows:
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=60)
For this exercise, we will use PCA on image data, in particular pictures of faces, to extract features.
(a) Perform PCA on the dataset to find the first 150 components. Since this is a large
dataset, you should use randomized PCA instead, which can also be found on sklearn.
Show the eigenfaces associated with the first 1 through 25 principal components.
(b) Using the first 150 components you found, reconstruct a few faces of your choice and
compare them with the original input images.
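A hedged sketch of both parts; in current scikit-learn the randomized solver is selected with PCA(svd_solver="randomized") rather than the older standalone RandomizedPCA class, and the plotting choices below are illustrative:

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA

faces = fetch_lfw_people(min_faces_per_person=60)
h, w = faces.images.shape[1:]

pca = PCA(n_components=150, svd_solver="randomized", random_state=0)
proj = pca.fit_transform(faces.data)

# (a) Eigenfaces: each principal component reshaped back to image dimensions.
fig, axes = plt.subplots(5, 5, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(pca.components_[i].reshape(h, w), cmap="gray")
    ax.axis("off")
plt.show()

# (b) Reconstructions from 150 components vs. the original images.
recon = pca.inverse_transform(proj)
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i in range(5):
    axes[0, i].imshow(faces.data[i].reshape(h, w), cmap="gray")
    axes[0, i].axis("off")
    axes[1, i].imshow(recon[i].reshape(h, w), cmap="gray")
    axes[1, i].axis("off")
plt.show()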
Solution
# Problem 1
data = [
    [0, 1],
    [1, 1],
    [2, 1],
    [2, 3],
    [3, 2],
    [3, 3],
    [4, 5],
]

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardizing the data as per requirement
scaled_data = StandardScaler().fit_transform(data)

pca = PCA(n_components=2)
pca.fit(scaled_data)

# Variance along each principal component, in decreasing order
print(pca.explained_variance_)

# Fraction of the total variance explained by each component
print(pca.explained_variance_ratio_)

# Transformed data; the PCA was fit on the standardized data, so the
# standardized data is what gets transformed
print(pca.transform(scaled_data))

# Applying PCA on the non-standardized data for part (b)
pca = PCA(n_components=2)
pca.fit(data)

print(pca.explained_variance_)
print(pca.explained_variance_ratio_)

# Transformed data
print(pca.transform(data))

# Problem 2
from time import time
import numpy as np
import matplotlib.pyplot as plt

from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

np.random.seed(42)

digits = load_digits()
data = scale(digits.data)

n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels...

Recorded output of the Problem 2 cell:

n_digits: 10,  n_samples 1797,  n_features 64

init        time    inertia  homo   compl  v-meas  ARI    AMI    silhouette
k-means++   0.17s   69432    0.602  0.650  0.625   0.465  0.621  0.146
random      0.16s   69694    0.669  0.710  0.689   0.553  0.686  0.147
PCA-based   0.03s   70804    0.671  0.698  0.684   0.561  0.681  0.118