
Question

{ "cells": [  {   "cell_type": "markdown",   "metadata": {},   "source": [    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).
",    "
",    "Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE". (After you have done that, you can delete the 'raise NotImplementedError()' line, and then run your code to check that it works).
",    "
",    "Also, enter your NAME in the next cell.
"   ]  },  {   "cell_type": "code",   "execution_count": 1,   "metadata": {},   "outputs": [],   "source": [    "NAME = """   ]  },  {   "cell_type": "markdown",   "metadata": {},   "source": [    "---"   ]  },  {   "cell_type": "markdown",   "metadata": {    "deletable": false,    "editable": false,    "nbgrader": {     "checksum": "01648f2c7a2b86d733aea6aeb05487e8",     "grade": false,     "grade_id": "jupyter",     "locked": true,     "schema_version": 1,     "solution": false    }   },   "source": [    "# ICT706 SouthBank 2020 Semester 1 Task 2
",    "
",    "This assignment will be done completely inside this Jupyter notebook.
",    "
",    "### Background
",    "A medium-size company has given you one year of data about the online purchases that their customers have made.  They want you to analyse the data using statistical and machine learning techniques and produce:
",    "* a prediction algorithm for predicting how much money each customer is likely to spend in a year;
",    "* a classification algorithm for predicting which customers will be 'big spenders';
",    "* some recommendations on what marketing strategy they should use to attract more 'big spender' customers.
",    "
",    "### Instructions
",    "Follow all the instructions in this notebook to complete these tasks.  Note that some cells contain 'assert' statements - these will automatically mark your work so that you can check that you have done the preceding steps correctly.  (If they give errors, then go back and correct your previous work until you fix those errors.  Once those 'assert' cells execute without errors, you know that you have achieved the marks for that step.) 
",    "
",    "When you have finished, this notebook is the only file that you will need to submit to Blackboard.
",    "
",    "Note: If you want some space to try out some Python code of your own, feel free to add extra cells into this notebook.  Just make sure that, before you submit your notebook, those extra cells either execute without error or have been deleted.
",    "
",    "### Overview
",    "You have five sections to complete in this Notebook (total = 100 marks):
",    "* Part A: Load and Clean Data (20 points)
",    "* Part B: Data Exploration (30 points)
",    "* Part C: Predicting Spending Levels (20 points)
",    "* Part D: Predicting Big Spenders (20 points)
",    "* Part E: Business Recommendations (10 points)"   ]  },  {   "cell_type": "code",   "execution_count": 3,   "metadata": {    "deletable": false,    "nbgrader": {     "checksum": "8e8e40c5312c2594db509b8e4c9f731d",     "grade": false,     "grade_id": "imports",     "locked": false,     "schema_version": 1,     "solution": true    }   },   "outputs": [],   "source": [    "# add all your imports here.
",    "# YOUR CODE HERE
",    "raise NotImplementedError()"   ]  },  {   "cell_type": "markdown",   "metadata": {    "deletable": false,    "editable": false,    "nbgrader": {     "checksum": "bbcbe1db68acf76763db3e19b29162d9",     "grade": false,     "grade_id": "cell-56b1c85226f679a1",     "locked": true,     "schema_version": 1,     "solution": false    }   },   "source": [    "---
",    "# Part A: Load and Clean Data (20 points)
",    "
",    "Save your CSV data file into the same folder as this notebook.
",    "
",    "Write Python code to load your dataset into a Pandas DataFrame called 'sales'."   ]  },  {   "cell_type": "code",   "execution_count": 1,   "metadata": {    "deletable": false,    "nbgrader": {     "checksum": "f1866306460b18ba8285f9073b7870bb",     "grade": false,     "grade_id": "read_sales",     "locked": false,     "schema_version": 1,     "solution": true    }   },   "outputs": [],   "source": [    "# YOUR CODE HERE
",    "raise NotImplementedError()"   ]  },  {   "cell_type": "markdown",   "metadata": {    "deletable": false,    "editable": false,    "nbgrader": {     "checksum": "465472c32b09a4d2bd97325a77ab7dae",     "grade": false,     "grade_id": "cell-08fd91c8f6a3f1ab",     "locked": true,     "schema_version": 1,     "solution": false    }   },   "source": [    "After you have loaded the data correctly, you should have 10,000 rows. 
",    "Run the following cells and tests to check that you have done this correctly."   ]  },  {   "cell_type": "code",   "execution_count": 2,   "metadata": {    "deletable": false,    "editable": false,    "nbgrader": {     "checksum": "30e709e48a4861e49bfcd0f34e07af3b",     "grade": false,     "grade_id": "cell-802dd990ff7bb39a",     "locked": true,     "schema_version": 1,     "solution": false    }   },   "outputs": [],   "source": [    "sales.head()"   ]  },  {   "cell_type": "code",   "execution_count": null,   "metadata": {    "deletable": false,    "editable": false,    "nbgrader": {     "checksum": "18fa2c34a21ec341e571e978461c2d74",     "grade": true,     "grade_id": "data_loaded",     "locked": false,     "points": 5,     "schema_version": 1,     "solution": false    }   },   "outputs": [],   "source": [    """"Check that 'sales' has the right shape and number of rows (5 points)."""
",    "assert len(sales.columns) == 10
",    "assert sales.columns[0] == "CustNum"
",    "assert sales.shape == (10000, 10)"   ]  },  {   "cell_type": "markdown",   "metadata": {    "deletable": false,    "editable": false,    "nbgrader": {     "checksum": "290dd5079318da97a499eb5f4e56e8c0",     "grade": false,     "grade_id": "cell-cbd5370682d8937a",     "locked": true,     "schema_version": 1,     "solution": false    }   },   "source": [    "## Cleaning the Data
",    "
",    "Some of the columns are strings with dollar signs.  But we need to convert them to numbers (float) so that we can do calculations on them.  The next cell shows what will go wrong if we try doing calculations *before* converting them to floats!"   ]  },  {   "cell_type": "code",   "execution_count": null,   "metadata": {    "deletable": false,    "editable": false,    "nbgrader": {     "checksum": "f37fc394053f62b401f14b6d76a4c800",     "grade": false,     "grade_id": "cell-c0f6f29476bf6fc8",     "locked": true,     "schema_version": 1,     "solution": false    }   },   "outputs": [],   "source": [    "s2 = sales["Spend"] * 4
",    "s2.head()"   ]  },  {   "cell_type": "code",   "execution_count": null,   "metadata": {    "deletable": false,    "nbgrader": {     "checksum": "656807b5b835cf1f7968d34cea95417c",     "grade": false,

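The question breaks off at the `s2 = sales["Spend"] * 4` cell. That cell repeats each string instead of multiplying it, because `*` on a Python `str` means repetition. Below is a minimal sketch of the fix, assuming the amounts look like the `$1615.00` strings shown by `sales.head()`. The three-row DataFrame is made-up sample data, not the real CSV, and `remove_dollar` follows the helper the notebook asks you to write. The last line checks for the new column with the `in` operator, which works on any pandas `Index`.

```python
import pandas as pd


def remove_dollar(s):
    """Remove dollar signs and spaces from s and return it as a float."""
    return float(s.replace("$", "").replace(" ", ""))


# Made-up sample rows in the same "$dddd.dd" format as the Spend column.
sales = pd.DataFrame({"Spend": ["$1615.00", "$1927.20", " $42.3 "]})

# On a plain str, * means repetition: this prints $1615.00$1615.00
print("$1615.00" * 2)

# Convert every row to float; numeric arithmetic then behaves as expected.
sales["SpendValue"] = sales["Spend"].apply(remove_dollar)
print(sales["SpendValue"].dtype)            # float64
print((sales["SpendValue"] * 4).tolist())   # [6460.0, 7708.8, 169.2]

# Column-membership test that works on any pandas Index.
print("SpendValue" in sales.columns)        # True
```

`apply` calls the Python function once per row. This approach is simple and fast enough for a 10,000-row file.
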
Sandeep Kumar · Accepted Answer

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).
",
    "
",
    "Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE". (After you have done that, you can delete the 'raise NotImplementedError()' line, and then run your code to check that it works).
",
    "
",
    "Also, enter your NAME in the next cell.
"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
   ],
   "source": [
    "NAME = "Ashma Dhakal""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "01648f2c7a2b86d733aea6aeb05487e8",
     "grade": false,
     "grade_id": "jupyter",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "# ICT706 SouthBank 2020 Semester 1 Task 2
",
    "
",
    "This assignment will be done completely inside this Jupyter notebook.
",
    "
",
    "### Background
",
    "A medium-size company has given you one year of data about the online purchases that their customers have made.  They want you to analyse the data using statistical and machine learning techniques and produce:
",
    "* a prediction algorithm for predicting how much money each customer is likely to spend in a year;
",
    "* a classification algorithm for predicting which customers will be 'big spenders';
",
    "* some recommendations on what marketing strategy they should use to attract more 'big spender' customers.
",
    "
",
    "### Instructions
",
    "Follow all the instructions in this notebook to complete these tasks.  Note that some cells contain 'assert' statements - these will automatically mark your work so that you can check that you have done the preceding steps correctly.  (If they give errors, then go back and correct your previous work until you fix those errors.  Once those 'assert' cells execute without errors, you know that you have achieved the marks for that step.) 
",
    "
",
    "When you have finished, this notebook is the only file that you will need to submit to Blackboard.
",
    "
",
    "Note: If you want some space to try out some Python code of your own, feel free to add extra cells into this notebook.  Just make sure that, before you submit your notebook, those extra cells either execute without error or have been deleted.
",
    "
",
    "### Overview
",
    "You have five sections to complete in this Notebook (total = 100 marks):
",
    "* Part A: Load and Clean Data (20 points)
",
    "* Part B: Data Exploration (30 points)
",
    "* Part C: Predicting Spending Levels (20 points)
",
    "* Part D: Predicting Big Spenders (20 points)
",
    "* Part E: Business Recommendations (10 points)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "nbgrader": {
     "checksum": "8e8e40c5312c2594db509b8e4c9f731d",
     "grade": false,
     "grade_id": "imports",
     "locked": false,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
   ],
   "source": [
    "# add all your imports here.
",
    "# YOUR CODE HERE
",
    "import pandas as pd
",
    "import numpy as np
",
    "from sklearn.linear_model import LinearRegression
",
    "from sklearn.preprocessing import LabelEncoder
",
    "from sklearn.ensemble import RandomForestClassifier
",
    "from sklearn.svm import SVC
",
    "from sklearn import model_selection
",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "bbcbe1db68acf76763db3e19b29162d9",
     "grade": false,
     "grade_id": "cell-56b1c85226f679a1",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "---
",
    "# Part A: Load and Clean Data (20 points)
",
    "
",
    "Save your CSV data file into the same folder as this notebook.
",
    "
",
    "Write Python code to load your dataset into a Pandas DataFrame called 'sales'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "nbgrader": {
     "checksum": "f1866306460b18ba8285f9073b7870bb",
     "grade": false,
     "grade_id": "read_sales",
     "locked": false,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
   ],
   "source": [
    "# YOUR CODE HERE
",
    "sales = pd.read_csv("greenhatsales.csv")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "465472c32b09a4d2bd97325a77ab7dae",
     "grade": false,
     "grade_id": "cell-08fd91c8f6a3f1ab",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "After you have loaded the data correctly, you should have 10,000 rows. 
",
    "Run the following cells and tests to check that you have done this correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "30e709e48a4861e49bfcd0f34e07af3b",
     "grade": false,
     "grade_id": "cell-802dd990ff7bb39a",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "   CustNum             Name     Sex  Age State  Income  Clicks LastSpend  \
",
       "0        0   Brandon Bender    male   67   NSW  120000     709  $2488.59   
",
       "1        1  Andre Mccormick    male   38   VIC  140000     630  $4295.34   
",
       "2        2     Ashley Smith  female   47   NSW   50000     554  $1986.09   
",
       "3        3        Ann Riley  female   33   NSW  100000     309  $1532.64   
",
       "4        4   Timothy Chavez    male   49   NSW  140000     520  $2082.08   
",
       "
",
       "   Purchases     Spend  
",
       "0          8  $1615.00  
",
       "1         14  $1927.20  
",
       "2          8  $1660.80  
",
       "3         10  $3041.10  
",
       "4          8  $1764.40  "
      ]
     },
     "execution_count": 4,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sales.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "18fa2c34a21ec341e571e978461c2d74",
     "grade": true,
     "grade_id": "data_loaded",
     "locked": false,
     "points": 5,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
   ],
   "source": [
    """"Check that 'sales' has the right shape and number of rows (5 points)."""
",
    "assert len(sales.columns) == 10
",
    "assert sales.columns[0] == "CustNum"
",
    "assert sales.shape == (10000, 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "290dd5079318da97a499eb5f4e56e8c0",
     "grade": false,
     "grade_id": "cell-cbd5370682d8937a",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Cleaning the Data
",
    "
",
    "Some of the columns are strings with dollar signs.  But we need to convert them to numbers (float) so that we can do calculations on them.  The next cell shows what will go wrong if we try doing calculations *before* converting them to floats!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "f37fc394053f62b401f14b6d76a4c800",
     "grade": false,
     "grade_id": "cell-c0f6f29476bf6fc8",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    $1615.00$1615.00$1615.00$1615.00
",
       "1    $1927.20$1927.20$1927.20$1927.20
",
       "2    $1660.80$1660.80$1660.80$1660.80
",
       "3    $3041.10$3041.10$3041.10$3041.10
",
       "4    $1764.40$1764.40$1764.40$1764.40
",
       "Name: Spend, dtype: object"
      ]
     },
     "execution_count": 6,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2 = sales["Spend"] * 4
",
    "s2.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "nbgrader": {
     "checksum": "656807b5b835cf1f7968d34cea95417c",
     "grade": false,
     "grade_id": "remove_dollars",
     "locked": false,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
   ],
   "source": [
    "# Complete the following remove_dollar function 
",
    "# so that it removes any dollar signs and spaces
",
    "# and then returns the string as a number (float).
",
    "def remove_dollar(s):
",
    "    """Removes dollar signs and spaces from s.
",
    "    Returns it as a float.
",
    "    """
",
    "    # YOUR CODE HERE
",
    "    s = float(s.replace('$', '').replace(' ', ''))
",
    "    return s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "5f43a2833f4cb63651fe509dadb38345",
     "grade": true,
     "grade_id": "test_remove_dollars",
     "locked": false,
     "points": 5,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
   ],
   "source": [
    """"Check that remove_dollar() removes dollars and spaces properly (5 points)."""
",
    "assert remove_dollar("12") == 12.0
",
    "assert remove_dollar("$123") == 123.0
",
    "assert remove_dollar("  $1234") == 1234.0
",
    "assert remove_dollar(" $42.3 ") == 42.3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "0f12e4450e96bddc301b3dd171c141f7",
     "grade": false,
     "grade_id": "cell-2674b20169c63acf",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Clean up the Spend columns
",
    "
",
    "Apply your remove_dollar function to the "Spend" column (every row), and put the cleaned-up float values into a new column of your 'sales' DataFrame called **"SpendValue"**.
",
    "
",
    "Then do the same for the "LastSpend" column and put the float values into a new column called **"LastSpendValue"**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "nbgrader": {
     "checksum": "2c3bc0d06aef53b2152af1d9de89ac5f",
     "grade": false,
     "grade_id": "clean_spends",
     "locked": false,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
   ],
   "source": [
    "# YOUR CODE HERE
",
    "sales['SpendValue'] = sales['Spend'].apply(remove_dollar)
",
    "sales['LastSpendValue'] = sales['LastSpend'].apply(remove_dollar)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "   CustNum             Name     Sex  Age State  Income  Clicks LastSpend  \
",
       "0        0   Brandon Bender    male   67   NSW  120000     709  $2488.59   
",
       "1        1  Andre Mccormick    male   38   VIC  140000     630  $4295.34   
",
       "2        2     Ashley Smith  female   47   NSW   50000     554  $1986.09   
",
       "3        3        Ann Riley  female   33   NSW  100000     309  $1532.64   
",
       "4        4   Timothy Chavez    male   49   NSW  140000     520  $2082.08   
",
       "
",
       "   Purchases     Spend  SpendValue  LastSpendValue  
",
       "0          8  $1615.00      1615.0         2488.59  
",
       "1         14  $1927.20      1927.2         4295.34  
",
       "2          8  $1660.80      1660.8         1986.09  
",
       "3         10  $3041.10      3041.1         1532.64  
",
       "4          8  $1764.40      1764.4         2082.08  "
      ]
     },
     "execution_count": 13,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sales.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "0a2f7f9048af24c8c44cc2c73e6ea5e9",
     "grade": false,
     "grade_id": "cell-a2b9fa129543cf1f",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CustNum             int64
",
       "Name               object
",
       "Sex                object
",
       "Age                 int64
",
       "State              object
",
       "Income              int64
",
       "Clicks              int64
",
       "LastSpend          object
",
       "Purchases           int64
",
       "Spend              object
",
       "SpendValue        float64
",
       "LastSpendValue    float64
",
       "dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sales.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "63d529e18e524b173fd0bcc1425a4120",
     "grade": true,
     "grade_id": "test_clean_spends",
     "locked": true,
     "points": 5,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'Index' object has no attribute 'contains'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m
\u001b[1;32m      1\u001b[0m \u001b[0;31m# check the new SpendValue columns (5 points)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontains\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m"SpendValue"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0m\u001b[1;32m      3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontains\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m"LastSpendValue"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m      4\u001b[0m \u001b[0;31m# check that they are floats\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m      5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m"SpendValue"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m"float64"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
",
      "\u001b[0;31mAttributeError\u001b[0m: 'Index' object has no attribute 'contains'"
     ]
    }
   ],
   "source": [
    "# check the new SpendValue columns (5 points)
",
    "assert sales.columns.contains("SpendValue")
",
    "assert sales.columns.contains("LastSpendValue")
",
    "# check that they are floats
",
    "assert sales["SpendValue"].dtype == "float64"
",
    "assert sales["LastSpendValue"].dtype == "float64"
",
    "# check that SpendValue is greater than zero and LastSpendValue is not negative.
",
    "assert (sales["SpendValue"] > 0.0).all()
",
    "assert (sales["LastSpendValue"] >= 0.0).all()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
   ],
   "source": [
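    "## The assert above fails because Index.contains() does not exist in recent pandas versions (hence the AttributeError). A working check is: assert 'SpendValue' in sales.columns  (sales.columns.str.contains('SpendValue') returns one boolean per column name, so it cannot be used directly in an assert)"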
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "00496c84c110bdfaa8d1ab0f6a9bfdbc",
     "grade": false,
     "grade_id": "cell-9317487e7c923ef9",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Make Sex and State numeric
",
    "
",
    "To use the Sex and State columns as input features for the machine learning algorithms in Scikit-Learn, they must be numeric.
",
    "
",
    "Use the **LabelEncoder** object from the sklearn.preprocessing package to convert the 'Sex' column into an integer column called **"SexValue"**.  
",
    "
",
    "Also convert the "State" column into an integer column called **"StateValue"**. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "nbgrader": {
     "checksum": "7dd8627d56479eb8657776ba1ea3e8a8",
     "grade": false,
     "grade_id": "sexvalue_statevalue",
     "locked": false,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
   ],
   "source": [
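    "# One encoder per column, so each mapping can be inverted later with inverse_transform()
",
    "sex_encoder = LabelEncoder()
",
    "state_encoder = LabelEncoder()
",
    "sales["SexValue"] = sex_encoder.fit_transform(sales['Sex'])
",
    "sales["StateValue"] = state_encoder.fit_transform(sales['State'])"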
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "0ae13631cb08d1c0cbff3e5bf3de78f3",
     "grade": false,
     "grade_id": "cell-7282fcbc237bd3c9",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "              Name     Sex  SexValue
",
       "0   Brandon Bender    male         1
",
       "1  Andre Mccormick    male         1
",
       "2     Ashley Smith  female         0
",
       "3        Ann Riley  female         0
",
       "4   Timothy Chavez    male         1"
      ]
     },
     "execution_count": 19,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check that Sex has been mapped to ints properly
",
    "cols = ["Name", "Sex", "SexValue"]
",
    "sales[cols].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "e49114bef2381621dd5b70443e207dfe",
     "grade": false,
     "grade_id": "cell-c7ca24f1daefbe4c",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "               Name State  StateValue
",
       "0    Brandon Bender   NSW           1
",
       "1   Andre Mccormick   VIC           6
",
       "2      Ashley Smith   NSW           1
",
       "3         Ann Riley   NSW           1
",
       "4    Timothy Chavez   NSW           1
",
       "5      John Bennett   VIC           6
",
       "6       Teresa Wise   QLD           3
",
       "7     Andrew Nelson   QLD           3
",
       "8       Jon Aguilar   NSW           1
",
       "9  Priscilla Briggs   NSW           1"
      ]
     },
     "execution_count": 20,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check that State has been mapped to ints properly
",
    "cols = ["Name", "State", "StateValue"]
",
    "sales[cols].head(10)"
   ]
  },
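  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check the codes beyond eyeballing `head()`, a fitted encoder can report its full label-to-code mapping and decode integers back to labels with `inverse_transform`. This sketch uses an assumed short list of state codes in place of `sales['State']`; `state_enc` is a hypothetical name for whichever encoder was fitted on that column.

```python
from sklearn.preprocessing import LabelEncoder

# Assumed sample of raw labels, standing in for sales['State'].
states = ['NSW', 'VIC', 'NSW', 'QLD', 'VIC', 'WA']

state_enc = LabelEncoder()
codes = state_enc.fit_transform(states)

# Build a label -> code lookup table from the fitted encoder.
mapping = {label: code for code, label in enumerate(state_enc.classes_.tolist())}
print(mapping)  # {'NSW': 0, 'QLD': 1, 'VIC': 2, 'WA': 3}

# inverse_transform turns codes back into the original labels.
decoded = state_enc.inverse_transform(codes).tolist()
print(decoded == states)  # True
```

Because the codes depend on which labels are present, NSW is 0 in this small sample but 1 in the full data, where ACT sorts ahead of it.
"
   ]
  },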
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "9568081227ace8e87156458f4c6b4715",
     "grade": true,
     "grade_id": "test_sexvalue_statevalue",
     "locked": true,
     "points": 5,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'Index' object has no attribute 'contains'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m
\u001b[1;32m      1\u001b[0m \u001b[0;31m# test the new SexValue and StateValue columns (5 points)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontains\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m"SexValue"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[0m\u001b[1;32m      3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontains\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m"StateValue"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m      4\u001b[0m \u001b[0;31m# check that they are integer\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
\u001b[1;32m      5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msales\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m"SexValue"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstartswith\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m"int"\u001b[0m\u001b[0;34m)\u001b[0m   \u001b[0;31m# "int32" or "int64"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m
",
      "\u001b[0;31mAttributeError\u001b[0m: 'Index' object has no attribute 'contains'"
     ]
    }
   ],
   "source": [
    "# test the new SexValue and StateValue columns (5 points)
",
    "# 'in' replaces Index.contains, which was removed in pandas 1.0
",
    "assert "SexValue" in sales.columns
",
    "assert "StateValue" in sales.columns
",
    "# check that they are integer
",
    "assert str(sales["SexValue"].dtype).startswith("int")   # "int32" or "int64"
",
    "assert str(sales["StateValue"].dtype).startswith("int") # "int32" or "int64"
",
    "# check the range of the encoded values
",
    "assert sales["SexValue"].max() == 1    # 0 and 1 only
",
    "assert sales["StateValue"].max() == 7  # 8 states and territories, coded 0 to 7"
   ]
  },
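  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The traceback above comes from `Index.contains`, which pandas deprecated in 0.25 and removed in 1.0. The plain `in` operator on `DataFrame.columns` performs the same membership test on both older and newer pandas versions. A minimal sketch on a tiny stand-in DataFrame (`demo` and its values are made up for illustration):

```python
import pandas as pd

# Stand-in for the encoded sales DataFrame (values assumed for illustration).
demo = pd.DataFrame({'SexValue': [1, 1, 0, 0, 1],
                     'StateValue': [1, 6, 1, 1, 7]})

# 'in' checks the column labels; this replaces Index.contains().
assert 'SexValue' in demo.columns
assert 'StateValue' in demo.columns
# The remaining checks from the test cell work unchanged.
assert str(demo['SexValue'].dtype).startswith('int')
assert str(demo['StateValue'].dtype).startswith('int')
assert demo['SexValue'].max() == 1
assert demo['StateValue'].max() == 7
print('all checks passed')
```
"
   ]
  },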
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "0d29c2c288046210e1115c9171223383",
     "grade": false,
     "grade_id": "cell-251af098637b4c0a",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "   CustNum  SexValue  Age  StateValue  Income  Clicks  Purchases  SpendValue
",
       "0        0         1   67           1  120000     709          8      1615.0
",
       "1        1         1   38           6  140000     630         14      1927.2
",
       "2        2         0   47           1   50000     554          8      1660.8
",
       "3        3         0   33           1  100000     309         10      3041.1
",
       "4        4         1   49           1  140000     520          8      1764.4"
      ]
     },
     "execution_count": 22,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Finally, let us view just the numeric columns.
",
    "numcols = ["CustNum", "SexValue", "Age", "StateValue",
",
    "           "Income", "Clicks", "Purchases", "SpendValue"]
",
    "sales[numcols].head()"
   ]
  },
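  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of typing the numeric column names by hand, `DataFrame.select_dtypes` can pick them out by dtype. A small sketch with made-up rows that reuse some of the notebook's column names; on the real data it would also return `LastSpendValue` (and any other numeric column), so the hand-written list above remains the way to get exactly those eight columns.

```python
import pandas as pd

# Made-up rows using some of the notebook's column names.
demo = pd.DataFrame({
    'CustNum': [0, 1],
    'Name': ['A. Customer', 'B. Customer'],
    'Sex': ['male', 'female'],
    'SexValue': [1, 0],
    'Age': [67, 38],
    'Spend': ['$1615.00', '$1927.20'],
    'SpendValue': [1615.0, 1927.2],
})

# Keep only int/float columns; the string columns (Name, Sex, Spend) are dropped.
numeric = demo.select_dtypes(include='number')
print(numeric.columns.tolist())  # ['CustNum', 'SexValue', 'Age', 'SpendValue']
```
"
   ]
  },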
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "   CustNum             Name     Sex  Age State  Income  Clicks LastSpend  \
",
       "0        0   Brandon Bender    male   67   NSW  120000     709  $2488.59   
",
       "1        1  Andre Mccormick    male   38   VIC  140000     630  $4295.34   
",
       "2        2     Ashley Smith  female   47   NSW   50000     554  $1986.09   
",
       "3        3        Ann Riley  female   33   NSW  100000     309  $1532.64   
",
       "4        4   Timothy Chavez    male   49   NSW  140000     520  $2082.08   
",
       "
",
       "   Purchases     Spend  SpendValue  LastSpendValue  SexValue  StateValue  
",
       "0          8  $1615.00      1615.0         2488.59         1           1  
",
       "1         14  $1927.20      1927.2         4295.34         1           6  
",
       "2          8  $1660.80      1660.8         1986.09         0           1  
",
       "3         10  $3041.10      3041.1         1532.64         0           1  
",
       "4          8  $1764.40      1764.4         2082.08         1           1  "
      ]
     },
     "execution_count": 23,
     "metadata": {
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sales.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "1f77126ed4a81a595f68f8d426ce81c1",
     "grade": false,
     "grade_id": "cell-e459377e95816230",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "---
",
    "
",
    "# Part B Data Exploration (30 points)
",
    "
",
    "In this section, you will explore the data statistically and visually to get a feel for what kinds of data you have and how much customers are spending on your web site.
",
    "
",
    "## B.1 Data Inspection
",
    "
",
    "Start by using the Pandas **describe()** function to analyse all the numeric columns of your 'sales' DataFrame.  Spend some time looking at this and making sure that you understand the average (mean) and range (min and max) of each column."
   ]
  },
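  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of `describe()` on a few made-up rows (not the real sales figures). By default it summarises only the numeric columns, and individual statistics such as the mean and the range can be pulled out with `.loc` using the row labels `mean`, `min` and `max`.

```python
import pandas as pd

# Made-up sample, not the assignment data.
demo = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Age': [20, 30, 40, 50],
    'SpendValue': [1000.0, 2000.0, 3000.0, 4000.0],
})

summary = demo.describe()   # the string column Name is skipped by default
print(summary)

# Pick out the average and range of each numeric column.
print(summary.loc[['mean', 'min', 'max']])
```
"
   ]
  },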
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false,
    "deletable": false,
    "nbgrader": {
     "checksum": "751cce9df93b4a6f2f432502c85ecda9",
     "grade": false,
     "grade_id": "cell-d8bed57ee90f2626",
     "locked": false,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "
",
       "
",
       "    .dataframe tbody tr th:only-of-type {
",
       "        vertical-align: middle;
",
       "    }
",
       "
",
       "    .dataframe tbody tr th {
",
       "        vertical-align: top;
",
       "    }
",
       "
",
       "    .dataframe thead th {
",
       "        text-align: right;
",
       "    }
",
       "
",
       "
",
       "  
",
       "    
",
       "      
",
       "      CustNum
",
       "      Age
",
       "      Income
",
       "      Clicks
",
       "      Purchases
",
       "      SpendValue
",
       "      LastSpendValue
",
       "      SexValue
",
       "      StateValue
",
       "    
",
       "  
",
       "  
",
       "    
",
       "      count
",
       "      10000.00000
",
       "      10000.000000
",
       "      10000.000000
",
       "      10000.00000
",
       "      10000.00000
",
       "      10000.000000
",
