Python assignment. Look at the zip folder, that has both the Python assignment and data set for it.

Question

Vidhi · Accepted Answer

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Module A _(55 pts)_ Optimizing for predicted separation
",
    "__Data.__ For this module we'll be working with the basic baseball player heights and weights from the lecture notes. These are packaged in the following location:
",
    "
",
    "- `"./data/baseball_heightweight.csv"`
",
    "
",
    "__Overview.__ The purpose of this module is to provide experience with optimization and to view/investigate a different optimizable functions (from error). In the __Chapter 5.__ lecture notes we investigated the sum of squared errors as a function of (linear) model parameters. This made the goal of optimization (executed through gradient descent) into a 'fitting' procedure, i.e., describing a pattern that data follow.
",
    "
",
    "Another way we can use a line is as a 'separator': supposing some data $x_1, \cdots, x_n$ have labels $y_1, \cdots, y_n$ falling into two classes (coded as $\pm 1$), which set of line-defining $w$ and $b$ parameters separate the points the best. Check out the picture  below for an intution (note: their $b$ is our $-b$).
",
    "
",
    "Note: even though we're talking about 'lines', and actually working with true, 2-dimensional lines throughout this module, it's important to think/code throughout this module with the expectation that each data point $x_i$ will have $m$ (potentially greater than 2) dimensions. When $m>3$, a 'line' technically refers to a plane, but after we leave the comfort of 3-dimensionality our 'lines' are technically _hyperplanes_, for which we have little experiential intuition. Hence, we'll adopt the terminology 'line' throughout, since this provides the strongest grounding of intuition."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![SVM Margin](./images/SVM_margin.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__A1.__ _(7 pts_) To get things started: 
",
    "
",
    "1. load the baseball player data with pandas into a dataframe called `bball`;
",
    "2. filter `bball` to contain only those rows that correspond to shortstops and catchers;
",
    "3. create an `.array()` called `y_bball` of labels, containing a `1` for each catcher and a `-1` for each shortstop;
",
    "4. filter the remaining rows of `bball` to only the `Height` and `Weight` columns; and
",
    "5. standardize the columns of `bball` and exhibit the first 10 rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "ename": "FileNotFoundError",
     "evalue": "File b'./data/baseball_heightweight.csv' does not exist",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mFileNotFoundError\u001b[0m                         Traceback (most recent call last)",
      "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m
\u001b[1;32m      4\u001b[0m \u001b[0marr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m      5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[0;32m----> 6\u001b[0;31m \u001b[0mbball\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m"./data/baseball_heightweight.csv"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0m\u001b[1;32m      7\u001b[0m \u001b[0mbball\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mbball\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbball\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Position'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'Shortstop'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mbball\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Position'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'Catcher'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m      8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
",
      "\u001b[0;32mC:\Users\user\PycharmProjects\Lastname-Assn7\venv\lib\site-packages\pandas\io\parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m
\u001b[1;32m    676\u001b[0m                     skip_blank_lines=skip_blank_lines)
\u001b[1;32m    677\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[0;32m--> 678\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0m\u001b[1;32m    679\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m    680\u001b[0m     \u001b[0mparser_f\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
",
      "\u001b[0;32mC:\Users\user\PycharmProjects\Lastname-Assn7\venv\lib\site-packages\pandas\io\parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m
\u001b[1;32m    438\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m    439\u001b[0m     \u001b[0;31m# Create the parser.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0;32m--> 440\u001b[0;31m     \u001b[0mparser\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTextFileReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0m\u001b[1;32m    441\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m    442\u001b[0m     \u001b[0;32mif\u001b[0m \u001b[0mchunksize\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0miterator\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
",
      "\u001b[0;32mC:\Users\user\PycharmProjects\Lastname-Assn7\venv\lib\site-packages\pandas\io\parsers.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, f, engine, **kwds)\u001b[0m
\u001b[1;32m    785\u001b[0m             \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'has_index_names'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'has_index_names'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m    786\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[0;32m--> 787\u001b[0;31m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0m\u001b[1;32m    788\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m    789\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
",
      "\u001b[0;32mC:\Users\user\PycharmProjects\Lastname-Assn7\venv\lib\site-packages\pandas\io\parsers.py\u001b[0m in \u001b[0;36m_make_engine\u001b[0;34m(self, engine)\u001b[0m
\u001b[1;32m   1012\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mengine\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'c'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m   1013\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mengine\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'c'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0;32m-> 1014\u001b[0;31m             \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCParserWrapper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0m\u001b[1;32m   1015\u001b[0m         \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m   1016\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mengine\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'python'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
",
      "\u001b[0;32mC:\Users\user\PycharmProjects\Lastname-Assn7\venv\lib\site-packages\pandas\io\parsers.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, src, **kwds)\u001b[0m
\u001b[1;32m   1706\u001b[0m         \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'usecols'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0musecols\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m   1707\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[0;32m-> 1708\u001b[0;31m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparsers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTextReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0m\u001b[1;32m   1709\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m   1710\u001b[0m         \u001b[0mpassed_names\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnames\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
",
      "\u001b[0;32mpandas\_libs\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.__cinit__\u001b[0;34m()\u001b[0m
",
      "\u001b[0;32mpandas\_libs\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._setup_parser_source\u001b[0;34m()\u001b[0m
",
      "\u001b[0;31mFileNotFoundError\u001b[0m: File b'./data/baseball_heightweight.csv' does not exist"
     ],
     "output_type": "error"
    }
   ],
   "source": [
    "import pandas as pd 
",
    "import numpy as np
",
    "
",
    "arr = []
",
    "
",
    "bball = pd.read_csv("./data/baseball_heightweight.csv") 
",
    "bball = bball.loc[(bball['Position'] == 'Shortstop') | (bball['Position'] == 'Catcher')]
",
    "
",
    "for x in bball['Position']:
",
    "    if x == 'Catcher':
",
    "       arr.append(1)
",
    "    else:
",
    "       arr.append(-1)
",
    "
",
    "bball = bball[['Height','Weight']]
",
    "
",
    "## pre-processing: standardize the heights and weights
",
    "bball_height = bball["Height"] - np.mean(bball["Height"])
",
    "bball["Height"] = bball_height / np.std(bball["Height"])
",
    "
",
    "bball_weight = bball["Weight"] - np.mean(bball["Weight"])
",
    "bball["Weight"] = bball_weight / np.std(bball["Weight"])
",
    "
",
    "print(bball.head(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__A2.__ _(5 pts)_ Since our goal with this problem will be to draw a line that separates the catchers from the shortstops, write and execute a function called 
",
    "
",
    "- `plot_data_boundary(data, y)` 
",
    "
",
    "that creates a scatter plot of the standardized heights and weights where the points are color coded with shortstops as black circles and catchers as red squares. In the response box below, disucss how well you think a line can separate these data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_Response._ 
",
    "
",
    "A line wouldn't be necessary as group separation is evident as is (via color segmentation)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png":

Python assignment. Look at the zip folder, that has both the Python assignment and data set for it.

Solution

Answer To This Question Is Available To Download

Related Questions & Answers

Submit New Assignment