small.csv
X1,X2,Y
S,-0.1,19.19
S,2.53,22.74
S,4.86,23.91
M,0.26,7.07
M,2.55,7.93
M,4.87,8.93
L,0.08,20.63
L,2.62,23.46
L,5.09,25.75
part2.csv
Month,Year,sales
January,2012,
February,2012,
March,2012,
April,2012,
May,2012,
June,2012,
July,2012,
August,2012,
September,2012,1.71
October,2012,1.9
November,2012,2.74
December,2012,4.2
January,2013,1.45
February,2013,1.8
March,2013,2.03
April,2013,1.99
May,2013,2.32
June,2013,2.2
July,2013,2.13
August,2013,2.43
September,2013,1.9
October,2013,2.13
November,2013,2.56
December,2013,4.16
January,2014,2.31
February,2014,1.89
March,2014,2.02
April,2014,2.23
May,2014,2.39
June,2014,2.14
July,2014,2.27
August,2014,2.21
September,2014,1.89
October,2014,2.29
November,2014,2.83
December,2014,4.04
January,2015,2.31
February,2015,1.99
March,2015,2.42
April,2015,2.45
May,2015,2.57
June,2015,2.42
July,2015,2.4
August,2015,2.5
September,2015,2.09
October,2015,2.54
November,2015,2.97
December,2015,4.35
January,2016,2.56
February,2016,2.28
March,2016,2.69
April,2016,2.48
May,2016,2.73
June,2016,2.37
July,2016,2.31
August,2016,2.23
September,2016,
October,2016,
November,2016,
December,2016,
categoricals.pptx
CATEGORICAL VARIABLES
-ENCODING-
We will use data visualization
To understand the results of regression models with a categorical variable
And to show the model performance
EXAMPLES
Example 1
Consider the following dataset
Numerical Categorical Predictors – EXAMPLE
        X1        X2        Y
        S        -0.10        19.19
        S        2.53        22.74
        S        4.86        23.91
        M        0.26        7.07
        M        2.55        7.93
        M        4.87        8.93
        L        0.08        20.63
        L        2.62        23.46
        L        5.09        25.75
Consider the following dataset with LABEL ENCODING (S = 0, M = 1, L = 2)
        X1        X2        Y
        0        -0.10        19.19
        0        2.53        22.74
        0        4.86        23.91
        1        0.26        7.07
        1        2.55        7.93
        1        4.87        8.93
        2        0.08        20.63
        2        2.62        23.46
        2        5.09        25.75
X1 and X2 in the model as continuous variables
R2 is close to 0.05, so the fitted equation explains a negligible share of the variation in the response
The Adjusted R-squared is negative and equal to XXXXXXXXXX
Both predictors X1 and X2 seem not to be useful for predicting Y.
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) XXXXXXXXXX XXXXXXXXXX037 *
x XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
x XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX

Residual standard error: 8.505 on 6 degrees of freedom
Multiple R-squared: XXXXXXXXXX, Adjusted R-squared: XXXXXXXXXX
F-statistic: XXXXXXXXXX on 2 and 6 DF, p-value: 0.8504
The fitted plane is
Y = b0 + b1 X1 + 0.7769 X2
where b0 is the intercept and b1 is the coefficient of X1
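The plane can be recovered from the label-encoded table with ordinary least squares. The following is a minimal sketch using NumPy rather than the R output shown in the slides; the names `b0`, `b1`, `b2` and `r2_label` are assumptions introduced here for illustration.

```python
import numpy as np

# Label-encoded data from the slides (S = 0, M = 1, L = 2).
x1 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=float)
x2 = np.array([-0.10, 2.53, 4.86, 0.26, 2.55, 4.87, 0.08, 2.62, 5.09])
y = np.array([19.19, 22.74, 23.91, 7.07, 7.93, 8.93, 20.63, 23.46, 25.75])

# Design matrix with an intercept column: Y = b0 + b1*X1 + b2*X2.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

# R-squared = 1 - SSE / SST.
residuals = y - X @ coef
r2_label = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

print(f"Y = {b0:.4f} + {b1:.4f} X1 + {b2:.4f} X2")
print(f"R-squared = {r2_label:.5f}")
```

The X2 coefficient and the R-squared printed by this sketch can be compared with the 0.7769 and 0.05259 reported in the slides.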
Replace X1 with binary variables X11 and X12 (ONE-HOT ENCODING)
        X11        X12        X2        Y
        0        0        -0.10        19.19
        0        0        2.53        22.74
        0        0        4.86        23.91
        1        0        0.26        7.07
        1        0        2.55        7.93
        1        0        4.87        8.93
        0        1        0.08        20.63
        0        1        2.62        23.46
        0        1        5.09        25.75
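The one-hot table can be produced as follows. This is a small sketch in plain Python: per the table, X11 flags level M and X12 flags level L, so S is the baseline level with both indicators at 0. The variable names are illustrative.

```python
# One-hot encoding of X1 with S as the baseline level.
x1 = ["S", "S", "S", "M", "M", "M", "L", "L", "L"]

# X11 = 1 for level M, X12 = 1 for level L; S gets X11 = 0 and X12 = 0.
x11 = [1 if level == "M" else 0 for level in x1]
x12 = [1 if level == "L" else 0 for level in x1]

for level, a, b in zip(x1, x11, x12):
    print(level, a, b)
```

Only two indicators are needed for three levels; a third indicator for S would be collinear with the intercept.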
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) XXXXXXXXXX XXXXXXXXXX90e-07 ***
x XXXXXXXXXX XXXXXXXXXX4.54e-06 ***
x XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
x XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX **
Residual standard error: XXXXXXXXXX on 5 degrees of freedom
Multiple R-squared: XXXXXXXXXX, Adjusted R-squared: XXXXXXXXXX
F-statistic: 225 on 3 and 5 DF, p-value: 9.416e-06
The fitted equations for each level are
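The per-level equations follow from the one-hot model Y = b0 + b11 X11 + b12 X12 + b2 X2: each level gets its own intercept and all levels share the slope on X2. A minimal NumPy sketch (not the R code behind the slides; the coefficient names are assumptions made here) is:

```python
import numpy as np

x1 = ["S", "S", "S", "M", "M", "M", "L", "L", "L"]
x2 = np.array([-0.10, 2.53, 4.86, 0.26, 2.55, 4.87, 0.08, 2.62, 5.09])
y = np.array([19.19, 22.74, 23.91, 7.07, 7.93, 8.93, 20.63, 23.46, 25.75])

# Indicator columns: X11 flags level M, X12 flags level L (S is the baseline).
x11 = np.array([level == "M" for level in x1], dtype=float)
x12 = np.array([level == "L" for level in x1], dtype=float)

# Fit Y = b0 + b11*X11 + b12*X12 + b2*X2 by least squares.
X = np.column_stack([np.ones(len(y)), x11, x12, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b11, b12, b2 = coef

residuals = y - X @ coef
r2_onehot = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

# One line per level: a level-specific intercept and a common slope.
intercepts = {"S": b0, "M": b0 + b11, "L": b0 + b12}
for level, a in intercepts.items():
    print(f"{level}: Y = {a:.4f} + {b2:.4f} X2")
print(f"R-squared = {r2_onehot:.4f}")
```

The R-squared printed here can be checked against the 0.9926 reported for one-hot encoding in the slides.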
Which encoding is better?
                      LABEL ENCODING    ONE-HOT ENCODING
R-squared             0.05259           0.9926
Adjusted R-squared    XXXXXXXXXX        0.9882
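Adjusted R-squared penalizes R-squared for the number of predictors, using 1 - (1 - R2)(n - 1)/(n - p - 1) with n observations and p predictors. As a hedged sketch, the helper `adjusted_r2` below (a name introduced here, not from the slides) applies that formula to the R-squared values in the table, with n = 9 and p = 2 for label encoding and p = 3 for one-hot encoding, matching the degrees of freedom in the regression output.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# R-squared values as reported in the comparison table.
adj_label = adjusted_r2(0.05259, n=9, p=2)
adj_onehot = adjusted_r2(0.9926, n=9, p=3)

print(f"label encoding:   {adj_label:.4f}")
print(f"one-hot encoding: {adj_onehot:.4f}")
```

When R-squared is small relative to p/(n - 1), the penalty pushes the adjusted value below zero, which is what the slides describe for the label-encoded model.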
Why are the models different?
Label encoding prediction equation
Y = b0 + b1 X1 + 0.7769 X2 (b0: intercept, b1: coefficient of X1)
One-hot encoding prediction equations
Label encoding results in a regression plane
One-hot encoding results in a set of regression lines (one for each category of the categorical variable)
If the observations lie close to a plane, then both label encoding and one-hot encoding give good results
With a large number of variables in the model, it is not possible to have a display like this
We may rely on R2 or cross-validation error to choose the best model
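One way to compare the two encodings by cross-validation error is leave-one-out cross-validation: refit the model with each observation held out and measure the error on the held-out point. The sketch below is an illustration under that assumption (the slides do not say which cross-validation scheme is meant); `loocv_rmse` and `designs` are hypothetical names.

```python
import numpy as np

x1 = ["S", "S", "S", "M", "M", "M", "L", "L", "L"]
x2 = np.array([-0.10, 2.53, 4.86, 0.26, 2.55, 4.87, 0.08, 2.62, 5.09])
y = np.array([19.19, 22.74, 23.91, 7.07, 7.93, 8.93, 20.63, 23.46, 25.75])

ones = np.ones(len(y))
label = np.array([{"S": 0, "M": 1, "L": 2}[v] for v in x1], dtype=float)
is_m = np.array([v == "M" for v in x1], dtype=float)
is_l = np.array([v == "L" for v in x1], dtype=float)

# One design matrix per encoding, each with an intercept column.
designs = {
    "label": np.column_stack([ones, label, x2]),
    "one-hot": np.column_stack([ones, is_m, is_l, x2]),
}

def loocv_rmse(X, y):
    """Root mean squared leave-one-out prediction error of a least-squares fit."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out observation i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors.append(y[i] - X[i] @ coef)      # error on the held-out point
    return float(np.sqrt(np.mean(np.square(errors))))

cv_rmse = {name: loocv_rmse(X, y) for name, X in designs.items()}
for name, value in cv_rmse.items():
    print(f"{name}: LOOCV RMSE = {value:.3f}")
```

The encoding with the smaller held-out error is preferred; unlike R2, this criterion does not automatically improve when more columns are added.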
Example 2
Forecasting
Consider the following demand data
EXAMPLE 2
Wide format to long format
Model 1
Linear Regression
Use linear regression to predict sales using time as the predictor
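Model 1 can be sketched as a straight-line fit of sales against a time index. The values below are the non-missing rows of part2.csv (September 2012 to August 2016); numbering them 1 to 48 as `period` is an assumption made here, since the slides do not define how Period is constructed.

```python
import numpy as np

# Non-missing sales from part2.csv, September 2012 through August 2016.
sales = np.array([
    1.71, 1.90, 2.74, 4.20,
    1.45, 1.80, 2.03, 1.99, 2.32, 2.20, 2.13, 2.43, 1.90, 2.13, 2.56, 4.16,
    2.31, 1.89, 2.02, 2.23, 2.39, 2.14, 2.27, 2.21, 1.89, 2.29, 2.83, 4.04,
    2.31, 1.99, 2.42, 2.45, 2.57, 2.42, 2.40, 2.50, 2.09, 2.54, 2.97, 4.35,
    2.56, 2.28, 2.69, 2.48, 2.73, 2.37, 2.31, 2.23,
])

# Assumed time index: 1 for the first observed month, 2 for the next, and so on.
period = np.arange(1, len(sales) + 1)

# Least-squares line: sales = intercept + slope * period.
slope, intercept = np.polyfit(period, sales, deg=1)
fitted = intercept + slope * period
r2_time = 1.0 - np.sum((sales - fitted) ** 2) / np.sum((sales - sales.mean()) ** 2)

print(f"sales = {intercept:.4f} + {slope:.4f} * period")
print(f"R-squared = {r2_time:.4f}")
```

A single trend line captures the slow growth in sales but cannot represent the December peaks, which is the motivation for adding Month to the model.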
EXAMPLE 2 – LINEAR REGRESSION MODEL 1
        sales vs Period
Model 2
Categorical variable
and label encoding
predict sales using Year and Month
EXAMPLE 2 – MODEL 2
label encode Month
model sales using Year and Period (both numeric)
     sales vs Year and Period
Model 3
Categorical variable
and one-hot encoding
predict sales using Year and Month
one-hot encode Month
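Models 2 and 3 differ only in how Month enters the regression. The sketch below fits both on the non-missing rows of part2.csv with NumPy: Month as a single numeric code 1 to 12 (label encoding) versus eleven indicator columns with January as the baseline (one-hot encoding). The helper `r_squared`, the choice of January as baseline, and measuring Year as years since 2012 are assumptions made here for illustration, not details from the slides.

```python
import numpy as np

# Non-missing sales from part2.csv, September 2012 through August 2016.
sales = np.array([
    1.71, 1.90, 2.74, 4.20,
    1.45, 1.80, 2.03, 1.99, 2.32, 2.20, 2.13, 2.43, 1.90, 2.13, 2.56, 4.16,
    2.31, 1.89, 2.02, 2.23, 2.39, 2.14, 2.27, 2.21, 1.89, 2.29, 2.83, 4.04,
    2.31, 1.99, 2.42, 2.45, 2.57, 2.42, 2.40, 2.50, 2.09, 2.54, 2.97, 4.35,
    2.56, 2.28, 2.69, 2.48, 2.73, 2.37, 2.31, 2.23,
])

k = np.arange(len(sales)) + 8      # months elapsed since January 2012
years_since_2012 = k // 12         # 0 for 2012, ..., 4 for 2016
month = k % 12                     # 0 = January, ..., 11 = December

def r_squared(X, y):
    """R-squared of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

ones = np.ones(len(sales))

# Model 2: Month label-encoded as one numeric column (1..12).
X_label = np.column_stack([ones, years_since_2012, month + 1])

# Model 3: Month one-hot encoded, February..December indicators (January baseline).
dummies = (month[:, None] == np.arange(1, 12)).astype(float)
X_onehot = np.column_stack([ones, years_since_2012, dummies])

r2 = {"label": r_squared(X_label, sales), "one-hot": r_squared(X_onehot, sales)}
for name, value in r2.items():
    print(f"{name}: R-squared = {value:.4f}")
```

With label encoding the model can only express a steady rise or fall across the months of the year, whereas the indicator columns let each month, December in particular, have its own level.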
EXAMPLE 2 – MODEL 3
    sales vs Year and Month
Day 6.docx
Day 6: Assignment 
Submit Assignment
· Submitting a file upload
· File Types pdf, doc, and docx
Instructions
Write and run the code developed in class to predict the total sales from file sales.csv
and submit the jupyter notebook in pdf format.
Submission
Click on the blue button in the top right corner to submit your assignment.
  
Click Next (below) to progress through the course.
Rubric
Assignment Rubric
Criteria: Coding (60.0 pts)
        60.0 to >30.0 pts: Full Marks. No coding errors
        30.0 to >20.0 pts: No Marks. More than three coding errors
        20.0 to >0 pts: Partial Marks. One to three coding errors
Criteria: Format & Editing
        40.0 to >20.0 pts: Full Marks. Code is clear and easy to follow. Plots display effective visualization.
Answered Same Day May 26, 2021

Solution

Kshitij answered on May 28 2021
132 Votes
day-6archive-qu3zg0my-gk010hyi/.ipynb_checkpoints/Day6-checkpoint.ipyn
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# importing li
aires \n",
"import pandas as pd\n",
"import numpy as np\n",
"from matplotlib import pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example 1"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"