ICT112 Week 4 Lab
ICT707 Big Data Assignment
Big Data Assignment Marking Criteria
The Big Data Assignment is comprised of two parts:
· The first part is to create the algorithms in the tasks, namely Decision Tree, Gradient Boosted Tree and Linear regression, and then to apply them to the bike sharing dataset provided. Try to produce the output given in the task sections (also given in the Big-Data Assignment.docx provided on Blackboard).
· The second part is to use the algorithms created in the first part and apply them to another dataset chosen from Kaggle (other than the bike sharing dataset provided).

Rubric

Criterion                                 Bike sharing [provided]   Student selected dataset [from Kaggle.com]
Decision Tree
  Decision Tree                           5                         5
  Decision Tree Categorical features      5                         5
  Decision Tree Log                       5                         5
  Decision Tree Max Bins                  5                         5
  Decision Tree Max Depth                 5                         5
Gradient Boosted Tree
  Gradient Boosted Tree                   5                         5
  Gradient boost tree iterations          5                         5
  Gradient boost tree Max Bins            5                         5
Linear regression
  Linear regression                       5                         5
  Linear regression Cross Validation
    Intercept                             5                         5
    Iterations                            5                         5
    Step size                             5                         5
    L1 Regularization                     5                         5
    L2 Regularization                     5                         5
  Linear regression Log                   5                         5
Subtotal                                  75                        75
Total mark                                150
What needs to be submitted for marking:
For the Decision tree section a .py or .ipynb file for each of the following:
· Decision Tree
· Decision Tree Categorical features
· Decision Tree Log
· Decision Tree Max Bins
· Decision Tree Max Depth
For the Gradient boost tree section a .py or .ipynb file for each of the following:
· Gradient boost tree
· Gradient boost tree iterations
· Gradient boost tree Max Bins
For the Linear regression section a .py or .ipynb file for each of the following:
· Linear regression
· Linear regression Cross Validation
· Intercept
· Iterations
· Step size
· L1 Regularization
· L2 Regularization
· Linear regression Log
Each of the files submitted will be tested with the following datasets:
· bike sharing [which is provided on Blackboard]
· A dataset of the student's choice downloaded from Kaggle.com
[Hint] Write each algorithm so that it can take in a dataset name. For example:
raw_data = sc.textFile("/home/spark/data/hour.csv")
In this manner both datasets can be run with the same files.
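The following is a minimal sketch of that hint. It assumes each task runs as a standalone script via spark-submit, with the dataset path passed as a command-line argument instead of hard-coded in sc.textFile. The argument names (data_path, --label-col) and the helper names are hypothetical and do not come from the brief.

```python
import argparse


def parse_args(argv=None):
    """Parse the dataset path (and an optional target column) from the command line."""
    parser = argparse.ArgumentParser(description="Run one regression task on a CSV dataset.")
    # Hypothetical interface: a positional path to a header-less CSV file.
    parser.add_argument("data_path", help="e.g. /home/spark/data/hour.csv")
    # Hypothetical option: index of the target column; -1 means the last column (cnt for bike sharing).
    parser.add_argument("--label-col", type=int, default=-1, help="index of the target column")
    return parser.parse_args(argv)


def load_records(sc, path):
    """Read the file with SparkContext.textFile and split each line on commas."""
    return sc.textFile(path).map(lambda line: line.split(","))
```

Each task file could then be run twice, for example `spark-submit decision_tree.py /home/spark/data/hour.csv` and again with the Kaggle file (the script name is hypothetical). The plain comma split assumes there is no header row and no quoted commas. Kaggle CSVs usually start with a header, so that row would need stripping first.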
Assignment
1. Utilising Python 3, build the following regression models:
· Decision Tree
· Gradient Boosted Tree
· Linear regression
2. Select a dataset (other than the example dataset given in section 3) and apply the Decision Tree and Linear regression models created above. Choose a dataset from Kaggle: https://www.kaggle.com/datasets
3. Build the following in relation to the gradient boost tree and the dataset chosen in step 2:
a) Gradient boost tree iterations (see Big-Data Assignment.docx section 6.1)
b) Gradient boost tree Max Bins (see Big-Data Assignment.docx section 7.2)
4. Build the following in relation to the decision tree and the dataset chosen in step 2:
a) Decision Tree Categorical features
b) Decision Tree Log (see Big-Data Assignment.docx section 5.4)
c) Decision Tree Max Bins (see Big-Data Assignment.docx section 7.2)
d) Decision Tree Max Depth (see Big-Data Assignment.docx section 7.1)
5. Build the following in relation to the linear regression and the dataset chosen in step 2:
a) Linear regression Cross Validation
i. Intercept (see Big-Data Assignment.docx section 6.5)
ii. Iterations (see Big-Data Assignment.docx section 6.1)
iii. Step size (see Big-Data Assignment.docx section 6.2)
iv. L1 Regularization (see Big-Data Assignment.docx section 6.4)
v. L2 Regularization (see Big-Data Assignment.docx section 6.3)
b) Linear regression Log (see Big-Data Assignment.docx section 5.4)
6. Follow the provided example of the Bike sharing data set and the guidelines in the sections that follow this section to develop the requirements given in steps 1, 3, 4 and 5.
Task 1
Task 1 is comprised of developing:
1. Decision Tree
a) Decision Tree Categorical features
b) Decision Tree Log (see Big-Data Assignment.docx section 5.4)
c) Decision Tree Max Bins (see Big-Data Assignment.docx section 7.2)
d) Decision Tree Max Depth (see Big-Data Assignment.docx section 7.1)
The output for this task and all its sub-tasks is based on the Bike sharing data set as input. Use the Bike sharing data set as input to test that the Decision Tree task and sub-tasks (i.e. steps 1 and 4 of the assignment) are working and producing the correct output before applying them to your selected data set.
Decision Tree
Output 1:
Feature vector length for categorical features: 57
Feature vector length for numerical features: 4
Total feature vector length: 61
Decision Tree feature vector: [1.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.81,0.0]
Decision Tree feature vector length: 12
Decision Tree predictions: [(16.0, XXXXXXXXXX), (40.0, XXXXXXXXXX), (32.0, XXXXXXXXXX), (13.0, XXXXXXXXXX), (1.0, XXXXXXXXXX)]
Decision Tree depth: 5
Decision Tree number of nodes: 63
Decision Tree - Mean Squared Error: XXXXXXXXXX
Decision Tree - Mean Absolute Error: XXXXXXXXXX
Decision Tree - Root Mean Squared Log Error: 0.6251
Output 2:
Decision Tree feature vector: [1.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.81,0.0]
Decision Tree feature vector length: 12
Decision Tree predictions: [(16.0, XXXXXXXXXX), (40.0, XXXXXXXXXX), (32.0, XXXXXXXXXX), (13.0, XXXXXXXXXX), (1.0, XXXXXXXXXX)]
Decision Tree depth: 5
Decision Tree number of nodes: 63
Decision Tree - Mean Squared Error: XXXXXXXXXX
Decision Tree - Mean Absolute Error: XXXXXXXXXX
Decision Tree - Root Mean Squared Log Error: 0.6251
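The decision tree uses the raw values directly rather than one-hot vectors. Judging from the printed 12-value vector, it takes the fields from index 2 (season) through index 13 (windspeed) of each split record. The sketch below uses the RDD-based pyspark.mllib API, which is the same API the solution notebook uses. The function names are hypothetical. The Spark imports sit inside the training function so the feature helpers can be tried without Spark. As in the expected output, predictions are made on the same records the tree was trained on.

```python
def extract_features_dt(record):
    """Raw values from season (index 2) through windspeed (index 13) as floats."""
    return [float(field) for field in record[2:14]]


def extract_label(record):
    """The target (cnt) is the last column."""
    return float(record[-1])


def train_decision_tree(records, max_depth=5, max_bins=32, categorical_info=None):
    """Train an MLlib regression tree on an RDD of split CSV records.

    Returns the model and an RDD of (true, predicted) pairs.
    """
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import DecisionTree

    data_dt = records.map(lambda r: LabeledPoint(extract_label(r), extract_features_dt(r)))
    model = DecisionTree.trainRegressor(
        data_dt,
        categoricalFeaturesInfo=categorical_info or {},  # {} = every feature treated as continuous
        impurity="variance",
        maxDepth=max_depth,  # MLlib's default depth is 5
        maxBins=max_bins,    # MLlib's default is 32
    )
    # Tree models wrap a JVM object, so predict on an RDD of features instead of
    # calling predict() inside map(), then zip the predictions back with the labels.
    preds = model.predict(data_dt.map(lambda p: p.features))
    true_vs_predicted = data_dt.map(lambda p: p.label).zip(preds)
    return model, true_vs_predicted
```

`model.depth()` and `model.numNodes()` give the "Decision Tree depth" and "number of nodes" lines. `true_vs_predicted.take(5)` gives the first five (label, prediction) pairs.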
Categorical features
Output:
Mapping of first categorical feature column: {'1': 0, '4': 1, '2': 2, '3': 3}
Categorical feature size mapping {0: 5, 1: 3, 2: 13, 3: 25, 4: 3, 5: 8, 6: 3, 7: 5}
Decision Tree Categorical Features - Mean Squared Error: XXXXXXXXXX
Decision Tree Categorical Features - Mean Absolute Error: XXXXXXXXXX
Decision Tree Categorical Features - Root Mean Squared Log Error: 0.6192
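For the categorical variant, MLlib's decision tree takes a categoricalFeaturesInfo dictionary. It maps each feature's position in the vector to its number of categories, and values of a feature with k categories are expected to lie in 0 to k-1. Each entry in the size mapping printed above is one larger than the number of distinct values in that column, for example 5 for the four seasons. This is consistent with counting distinct values and adding one, so that raw codes starting at 1 (season 1 to 4) still fit. Below is a pure-Python sketch of both steps. The helper names are hypothetical; on Spark, the distinct values would come from `distinct().zipWithIndex().collectAsMap()`.

```python
def build_mapping(values):
    """Map each distinct value to an index, in order of first appearance.

    Spark's distinct() gives no fixed order, which is why the printed mapping
    for the first column is {'1': 0, '4': 1, '2': 2, '3': 3}.
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def categorical_features_info(mappings):
    """Feature position in the tree vector -> number of categories.

    mappings[i] belongs to raw column i + 2, so season is feature 0.
    The +1 leaves room for codes that start at 1 rather than 0.
    """
    return {i: len(mapping) + 1 for i, mapping in enumerate(mappings)}
```

The result is passed as `DecisionTree.trainRegressor(data_dt, categoricalFeaturesInfo=cat_info)`. MLlib also requires maxBins to be at least the largest category count (25 for the hour column here), which the default of 32 satisfies.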
Decision Tree Log
Output:
Decision Tree Log - Mean Squared Error: XXXXXXXXXX
Decision Tree Log - Mean Absolute Error: XXXXXXXXXX
Decision Tree Log - Root Mean Squared Log Error: 0.6406
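"Decision Tree Log" refers to training on the logarithm of the target and mapping predictions back with the exponential before computing the errors. The sketch below makes two assumptions: data_dt is an RDD of LabeledPoint with raw counts as labels, and the function names are hypothetical. Bike-sharing counts are at least 1, so log is defined. For a Kaggle target that can be 0, log(x + 1) with exp(p) - 1 would be a safer pair.

```python
import math


def to_log_target(label):
    """Train on log(count); bike-sharing counts are at least 1, so the log is defined."""
    return math.log(label)


def from_log_target(prediction):
    """Map a log-scale prediction back to the original count scale."""
    return math.exp(prediction)


def decision_tree_log(data_dt, **tree_params):
    """Return an RDD of (true count, predicted count) pairs.

    tree_params are passed through to trainRegressor, e.g. maxDepth=5, maxBins=32.
    """
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import DecisionTree

    data_log = data_dt.map(lambda lp: LabeledPoint(to_log_target(lp.label), lp.features))
    model = DecisionTree.trainRegressor(data_log, categoricalFeaturesInfo={}, **tree_params)
    # Predict on an RDD of features (tree models cannot be called inside map()).
    preds_log = model.predict(data_log.map(lambda lp: lp.features))
    return data_dt.map(lambda lp: lp.label).zip(preds_log.map(from_log_target))
```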
Decision Tree Max Bins
Output:
Decision Tree Max Depth
Output:
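The Max Bins and Max Depth outputs are blank in the brief, so the expected format is not shown. One plausible approach is to retrain the tree for a list of values of one parameter while holding the others fixed, then record the test-set RMSLE for each value. The candidate lists and helper names below are assumptions, and train and test are assumed to be RDDs of LabeledPoint.

```python
import math


def sweep(param_values, train_and_score):
    """Return [(value, score)] for each candidate value of a single parameter."""
    return [(value, train_and_score(value)) for value in param_values]


def tree_rmsle(train, test, max_depth=5, max_bins=32):
    """Train an MLlib regression tree on `train` and return RMSLE on `test`."""
    from pyspark.mllib.tree import DecisionTree

    model = DecisionTree.trainRegressor(
        train, categoricalFeaturesInfo={}, impurity="variance",
        maxDepth=max_depth, maxBins=max_bins)
    preds = model.predict(test.map(lambda p: p.features))
    pairs = test.map(lambda p: p.label).zip(preds)
    # Variance-tree predictions are leaf means of the labels, so they are
    # non-negative whenever the labels are, and log1p stays defined.
    return math.sqrt(pairs.map(lambda tp: (math.log1p(tp[1]) - math.log1p(tp[0])) ** 2).mean())


# Illustrative candidate values (assumptions, not taken from the brief).
MAX_DEPTHS = [1, 2, 3, 4, 5, 10, 20]
MAX_BINS = [2, 4, 8, 16, 32, 64, 100]
```

Typical use would be `sweep(MAX_DEPTHS, lambda d: tree_rmsle(train, test, max_depth=d))`, and the same with `max_bins=b`. If categorical features are declared, MLlib rejects maxBins values smaller than the largest category count.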
Task 2
Task 2 is comprised of developing:
1. Gradient boost tree
a) Gradient boost tree iterations (see Big-Data Assignment.docx section 6.1)
b) Gradient boost tree Max Bins (see Big-Data Assignment.docx section 7.2)
c) Gradient boost tree Max Depth (see Big-Data Assignment.docx section 7.1)
Gradient Boosted Tree
Output:
GradientBoosted Trees predictions: [(16.0, XXXXXXXXXX), (40.0, XXXXXXXXXX), (32.0, XXXXXXXXXX), (13.0, XXXXXXXXXX), (1.0, XXXXXXXXXX)]
Gradient Boosted Trees - Mean Squared Error = XXXXXXXXXX
Gradient Boosted Trees - Mean Absolute Error = XXXXXXXXXX
Gradient Boosted Trees - Mean Root Mean Squared Log Error = XXXXXXXXXX
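All three error figures reported for each model (MSE, MAE and RMSLE) can be computed from (true, predicted) pairs. The sketch below is plain Python over a list, so it runs anywhere. RMSLE is taken as the square root of the mean of (log(pred + 1) - log(true + 1))². Clipping negative predictions at zero is an added assumption that keeps the logarithm defined, since linear and boosted models can in principle predict below zero.

```python
import math


def squared_error(actual, pred):
    return (pred - actual) ** 2


def abs_error(actual, pred):
    return abs(pred - actual)


def squared_log_error(actual, pred):
    # log1p(x) == log(x + 1); negative predictions are clipped at 0 (an assumption).
    return (math.log1p(max(pred, 0.0)) - math.log1p(actual)) ** 2


def regression_metrics(true_vs_predicted):
    """Return (MSE, MAE, RMSLE) for an iterable of (true, predicted) pairs."""
    pairs = list(true_vs_predicted)
    n = len(pairs)
    mse = sum(squared_error(t, p) for t, p in pairs) / n
    mae = sum(abs_error(t, p) for t, p in pairs) / n
    rmsle = math.sqrt(sum(squared_log_error(t, p) for t, p in pairs) / n)
    return mse, mae, rmsle
```

On Spark, the same per-pair functions can be applied with RDD.map and averaged with RDD.mean(), which avoids collecting every pair to the driver. For example: `mse = true_vs_predicted.map(lambda tp: squared_error(tp[0], tp[1])).mean()`.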
Gradient boost tree iterations
Output:
Gradient boost tree Max Bins
Output:
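The sketch below shows gradient-boosted regression trees with pyspark.mllib's GradientBoostedTrees.trainRegressor, exposing numIterations and maxBins so the iterations and Max Bins tasks can loop over candidate values. It also includes a helper that reports the best setting. The candidate lists, function names and the train/test RDDs of LabeledPoint are assumptions. The defaults shown (100 iterations, depth 3, 32 bins) follow MLlib's own defaults.

```python
import math

# Illustrative candidate values (assumptions, not taken from the brief).
NUM_ITERATIONS = [1, 5, 10, 20, 50]
MAX_BINS = [8, 16, 32, 64]


def gbt_true_vs_predicted(train, test, num_iterations=100, max_bins=32, max_depth=3):
    """Train MLlib gradient-boosted regression trees; return (true, predicted) pairs for `test`."""
    from pyspark.mllib.tree import GradientBoostedTrees

    model = GradientBoostedTrees.trainRegressor(
        train,
        categoricalFeaturesInfo={},    # treat all features as continuous
        loss="leastSquaresError",
        numIterations=num_iterations,  # number of trees in the ensemble
        maxDepth=max_depth,
        maxBins=max_bins,
    )
    # The ensemble also wraps a JVM model, so predict on an RDD of features and zip.
    preds = model.predict(test.map(lambda p: p.features))
    return test.map(lambda p: p.label).zip(preds)


def best_setting(results):
    """Return the (value, error) pair with the lowest error, ignoring NaN errors."""
    valid = [(value, err) for value, err in results if not math.isnan(err)]
    return min(valid, key=lambda pair: pair[1])
```

For each n in NUM_ITERATIONS (or each value in MAX_BINS), compute an error such as RMSLE from the returned pairs, collect the (value, error) tuples, and pass them to best_setting.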
Task 3
Task 3 is comprised of developing:
1. Linear regression model
a) Linear regression Cross Validation
i. Intercept (see Big-Data Assignment.docx section 6.5)
ii. Iterations (see Big-Data Assignment.docx section 6.1)
iii. Step size (see Big-Data Assignment.docx section 6.2)
iv. L1 Regularization (see Big-Data Assignment.docx section 6.4)
v. L2 Regularization (see Big-Data Assignment.docx section 6.3)
b) Linear regression Log (see Big-Data Assignment.docx section 5.4)
Linear regression model
Output:
Mapping of first categorical feature column: {'1': 0, '4': 1, '2': 2, '3': 3}
Feature vector length for categorical features: 57
Feature vector length for numerical features: 4
Total feature vector length: 61
Linear Model feature vector:
[1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.24,0.2879,0.81,0.0]
Linear Model feature vector length: 61
Gradient Boosted Trees - Mean Root Mean Squared Log Error = XXXXXXXXXX
Output 2:
Linear Model predictions: [(16.0, XXXXXXXXXX), (40.0, XXXXXXXXXX), (32.0, XXXXXXXXXX), (13.0, XXXXXXXXXX), (1.0, XXXXXXXXXX)]
Linear Regression - Mean Squared Error: XXXXXXXXXX
Linear Regression - Mean Absolute Error: XXXXXXXXXX
Linear Regression - Root Mean Squared Log Error: 1.4284
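The linear model cannot use raw category codes, so each categorical column is one-hot encoded into its own block of the 57-slot vector, offset by the sizes of the earlier blocks. The four numerical columns are then appended, giving 61 values. In the vector printed above, the last four categorical slots (positions 53 to 56, the weathersit block) are all zero even though the first record has weathersit = 1. That pattern is what you get when only record[2:9] is encoded, i.e. seven of the eight columns. The plain-Python sketch below encodes every column that has a mapping. For the first bike-sharing record it sets eight slots, so it will not reproduce that printed vector exactly. The function name and parameters are hypothetical.

```python
def one_hot_features(record, mappings, cat_start=2, num_cols=(10, 14)):
    """Binary-encode the categorical columns and append the numerical ones.

    mappings[i] maps the raw string in column cat_start + i to an index,
    so eight mappings encode columns 2..9 (season .. weathersit).
    """
    cat_len = sum(len(m) for m in mappings)
    vector = [0.0] * cat_len
    offset = 0
    for i, mapping in enumerate(mappings):
        value = record[cat_start + i]
        vector[offset + mapping[value]] = 1.0
        offset += len(mapping)  # the next column's block starts after this one
    # temp, atemp, hum, windspeed for the bike-sharing layout
    numeric = [float(x) for x in record[num_cols[0]:num_cols[1]]]
    return vector + numeric
```

The returned list can be wrapped directly as `LabeledPoint(label, one_hot_features(r, mappings))`.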
Linear regression Cross Validation
Output:
Training data size: 13869
Test data size: 3510
Total data size: 17379
Train + Test size : 17379
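The sizes above correspond to holding out roughly 20% of the records as a test set (3510 / 17379 ≈ 0.20). The sketch below uses RDD.randomSplit for the split. The 80/20 weights, the seed and the helper names are assumptions, and because the split is random the counts will usually differ slightly from those shown.

```python
def split_train_test(data, test_fraction=0.2, seed=42):
    """Randomly split an RDD into (train, test); weights and seed are illustrative."""
    train, test = data.randomSplit([1.0 - test_fraction, test_fraction], seed=seed)
    return train, test


def split_report(train_size, test_size, total_size):
    """Format the size check printed for the cross-validation task."""
    return "\n".join([
        "Training data size: %d" % train_size,
        "Test data size: %d" % test_size,
        "Total data size: %d" % total_size,
        "Train + Test size : %d" % (train_size + test_size),
    ])
```

Typical use would be `train, test = split_train_test(data)` followed by `print(split_report(train.count(), test.count(), data.count()))`. Caching train and test is worthwhile because the parameter experiments reuse them many times.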
Intercept
Output:
Iterations
Output:
Step size
Output:
L1 Regularization
Output:
L2 Regularization
Output:
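The five cross-validation sub-tasks (intercept, iterations, step size, L1 and L2 regularization) each vary one argument of LinearRegressionWithSGD.train, which accepts iterations, step, regParam, regType ("l1", "l2" or None) and intercept. The sketch below varies one parameter at a time around base settings. Those base settings mirror the notebook call further down (10 iterations, step 0.1, no intercept), and the candidate lists are assumptions. As the notebook's warning notes, this API is deprecated since Spark 2.0 in favour of ml.regression.LinearRegression.

```python
import math

# Base settings follow the notebook's train() call; candidate lists are illustrative.
BASE_PARAMS = {"iterations": 10, "step": 0.1, "regParam": 0.0, "regType": None, "intercept": False}
CANDIDATES = {
    "intercept": [False, True],
    "iterations": [1, 5, 10, 20, 50, 100],
    "step": [0.01, 0.025, 0.05, 0.1, 1.0],
    "regParam": [0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0],
}


def vary_one(name, values, base=BASE_PARAMS):
    """Yield parameter dicts that change only `name`, keeping the others at base values."""
    for value in values:
        params = dict(base)  # copy so the base settings are never mutated
        params[name] = value
        yield params


def linear_rmsle(train, test, params):
    """Train LinearRegressionWithSGD with `params` and return RMSLE on `test`."""
    from pyspark.mllib.regression import LinearRegressionWithSGD

    model = LinearRegressionWithSGD.train(train, **params)
    # Linear models are plain Python objects, so predict() works inside map().
    # Predictions can be negative, so they are clipped at 0 before log1p (an assumption).
    sq_log_err = test.map(
        lambda p: (math.log1p(max(model.predict(p.features), 0.0)) - math.log1p(p.label)) ** 2)
    return math.sqrt(sq_log_err.mean())
```

For L1 or L2, pass a base with the regularization type set, e.g. `vary_one("regParam", CANDIDATES["regParam"], base=dict(BASE_PARAMS, regType="l2"))`. Large step sizes can make SGD diverge, which shows up as a NaN error.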
Linear regression Log
Output:
Linear Regression Log - Mean Squared Error: XXXXXXXXXX
Linear Regression Log - Mean Absolute Error: XXXXXXXXXX
Linear Regression Log - Root Mean Squared Log Error: 1.5411

Regression Models
Regression models are concerned with target variables that can take any real value. The underlying principle is to find a model that maps input features to predicted target variables. Regression is also a form of supervised learning.
Regression models can be used to predict just about any variable of interest. A few examples include the following:
· Predicting stock returns and other economic variables
· Predicting loss amounts for loan defaults (this can be combined with a classification model that predicts the probability of default, while the regression model predicts the amount in the case of a default)
· Recommendations (the Alternating Least Squares factorization model from Chapter 5, Building a Recommendation Engine with Spark, uses linear regression in each iteration)
· Predicting customer lifetime value (CLTV) in a retail, mobile, or other business, based on user behavior and spending patterns
In the different sections of this chapter, we will do the following:
· Introduce the various types of regression models available in ML
· Explore feature extraction and target variable transformation for regression models
· Train a number of regression models using ML
· See how to make predictions using the trained model
· Investigate the impact on performance of various parameter settings for regression using cross-validation
Types of regression models
The core idea of linear models (or generalized linear models) is that we model the predicted outcome of interest (often called the target or dependent variable) as a function of a simple linear predictor applied to the input variables (also referred to as
Answered Same Day Sep 22, 2020 ICT707 University of the Sunshine Coast

Solution

Akash answered on Oct 08 2020
assign2/.ipynb_checkpoints/bike-checkpoint.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"path = \"/Users/priya/Desktop/Bike-Sharing-Dataset/bike.csv\"\n",
"data_df = sc.textFile(path)\n",
"data_count= data_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['1', '2011-01-01', '1', '0', '1', '0', '0', '6', '0', '1', '0.24', '0.2879', '0.81', '0', '3', '13', '16']\n"
]
}
],
"source": [
"data_rec = data_df.map(lambda x: x.split(\",\"))\n",
"first = data_rec.first()\n",
"print (first)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17379\n"
]
}
],
"source": [
"print (data_count)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# We now have 17379 hourly records; the header row was already removed with a Unix command"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The header was removed with: sed 1d hour.csv > new_hour.csv. We will ignore the record ID and raw date columns. \n",
"We will also ignore the casual and registered count target variables and focus on the \n",
"overall count variable, cnt (which is the sum of the other two counts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The command below caches the data so that it can be reused"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PythonRDD[6] at RDD at PythonRDD.scala:48"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_rec.cache()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we convert each categorical variable into binary vector form. \n",
"Let's define a function that will extract this mapping from our dataset for a given column:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def get_mapping(rdd, idx):\n",
" return rdd.map(lambda fields: fields[idx]).distinct().zipWithIndex().collectAsMap()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function above maps each record to the value in the given column, keeps the distinct values, and uses the zipWithIndex \n",
"transformation to build a key-value RDD,\n",
"where the key is the variable value and the value is its index.\n",
"We can test our function on the third variable column (index 2):\n",
"here data_rec is the RDD and 2 is the index of the third variable.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mapping of feature catagorical columns: {'1': 0, '4': 1, '2': 2, '3': 3}\n"
]
}
],
"source": [
"print(\"mapping of feature catagorical columns: %s\" %get_mapping(data_rec,2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply the function above to each categorical column, \n",
"i.e. the columns at indices 2 to 9:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"mappings = [get_mapping(data_rec, i) for i in range(2,10)]\n",
"catagorical_len = sum(map(len, mappings))\n",
"num_len = len(data_rec.first()[10:14])  # temp, atemp, hum, windspeed\n",
"total_length = num_len + catagorical_len"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now have the mappings for each variable, \n",
"and we can see how many values in total we need \n",
"for our binary vector representation:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature vector length for categorical features: 57\n",
"Feature vector length for numerical features: 4\n",
"Total feature vector length: 61\n"
]
}
],
"source": [
"print (\"Feature vector length for categorical features: %d\" % catagorical_len)\n",
"print (\"Feature vector length for numerical features: %d\" % num_len)\n",
"print (\"Total feature vector length: %d\" % total_length)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating the feature vector for the linear model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we use the mappings to convert the categorical variables into binary-encoded features.\n",
"We import numpy for linear algebra utilities and the MLlib LabeledPoint class to wrap our feature vectors and target variables."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.regression import LabeledPoint\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def extract_feat(record):\n",
" catagorical_vector = np.zeros(catagorical_len)\n",
" j = 0\n",
" steps = 0\n",
" for fields in record[2:9]:\n",
" mapp = mappings[j]\n",
" idx = mapp[fields]\n",
" catagorical_vector[idx + steps] = 1\n",
" j = j + 1\n",
" steps = steps + len(mapp)\n",
" number_vector = np.array([float(field) for field in record[10:14]])\n",
" return np.concatenate((catagorical_vector, number_vector))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def ex_label(record):\n",
" return float(record[-1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the extract_feat function, we loop through each column in the row of data. \n",
"For each variable in turn, we look up its binary encoding \n",
"in the mappings we created previously.\n",
"The steps variable ensures that the nonzero feature index in the full feature vector is correct\n",
"(and is somewhat more efficient than, say, creating many smaller binary vectors and \n",
" concatenating them). The numeric vector is created directly by first converting the data \n",
"to floating point numbers and wrapping these in a numpy array. The resulting two vectors \n",
"are then concatenated. The ex_label function simply converts the last column variable \n",
"(the count) into a float. With our utility functions defined, we can proceed with extracting \n",
"feature vectors and labels from our data records:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"data = data_rec.map(lambda r: LabeledPoint(ex_label(r), extract_feat(r)))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Raw data: ['1', '0', '1', '0', '0', '6', '0', '1', '0.24', '0.2879', '0.81', '0', '3', '13', '16']\n",
"Label: 16.0\n",
"Linear Model feature vector:\n",
"[1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.24,0.2879,0.81,0.0]\n",
"Linear Model feature vector length: 61\n"
]
}
],
"source": [
"first_point = data.first()\n",
"print (\"Raw data: \" + str(first[2:]))\n",
"print (\"Label: \" + str(first_point.label))\n",
"print (\"Linear Model feature vector:\\n\" + str(first_point.features))\n",
"print (\"Linear Model feature vector length: \" + str(len(first_point.features)))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.regression import LinearRegressionWithSGD"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
}
],
"source": [
"linear_model = LinearRegressionWithSGD.train(data, iterations=10,step=0.1, intercept=False)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Model predictions: [(16.0, 117.89250386724846), (40.0, 116.2249612319211), (32.0, 116.02369145779235), (13.0, 115.67088016754433), (1.0, 115.56315650834317)]\n"
]
}
],
"source": [
"true_vs_predicted = data.map(lambda p: (p.label, linear_model.predict(p.features)))\n",
"print (\"Linear Model predictions: \" + str(true_vs_predicted.take(5)))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Model - Mean Squared Error: 30679.4539\n"
]
}
],
"source": [
"li=[]\n",
"for i in true_vs_predicted.collect():\n",
" true,pred=i[0],i[1]\n",
" val=(pred - true)**2\n",
" li.append(val)\n",
"lenth=len(li)\n",
"su=sum(li)\n",
"mean=su/lenth\n",
"print (\"Linear Model - Mean Squared Error: %2.4f\" % mean)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"targets = data_rec.map(lambda r: float(r[-1])).collect()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"import pylab"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/anaconda3/lib/python3.6/site-packages/IPython/core/magics/pylab.py:160: UserWarning: pylab import has clobbered these variables: ['mean', 'pylab']\n",
"`%matplotlib` prevents importing * from pylab and numpy\n",
" \"\\n`%matplotlib` prevents importing * from pylab and numpy\"\n"
]
}
],
"source": [
"%pylab inline"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[embedded PNG plot omitted]\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"hist(targets, bins=45, color='lightblue', density=True)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"\n",
"fig.set_size_inches(17, 11)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA6sAAAJCCAYAAAAm3lF7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X+s3fdd3/HXG5uUUSgUYk1enBAXjEX4oYZd0k0VZaJp6woUV1Er3IkpTJWyTs0o6qaRDpRqQZVKkfjxRxiNWqPCKF5pjWQhs6yjLRtiob5pC51T7uqY0tzZWw3p6Dogwel7f9wTdHpzHR/XJz4fn/t4SFbO93s+32/eV0dV+vT3e763ujsAAAAwkq9Y9AAAAACwmVgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABjOzkUPsNm1117bN95446LHAAAA4Fnw0EMP/Vl377rYuuFi9cY
8zq6uqixwAAAOBZUFV/Oss6twEDAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcGaK1ao6UFVrVXWqqu5+hnWvrqquqpWpfW+eHLdWVa+Yx9AAAAAst50XW1BVO5Lcl+RlSdaTnKiqY9398KZ1X5vkR5P8wdS+m5IcSvLtSf5ekv9cVd/a3U/O70cAAABg2cxyZfWWJKe6+3R3P5HkSJKDW6z7qSRvT/LXU/sOJjnS3Y93958kOTU5HwAAAFzQLLF6XZJHp7bXJ/v+VlXdnOT67v6tSz0WAAAANpslVmuLff23b1Z9RZKfS/IvL/XYqXPcWVWrVbV67ty5GUYCAABgmc0Sq+tJrp/a3pPkzNT21yb5jiQfrqpPJ/kHSY5NHrJ0sWOTJN19f3evdPfKrl27Lu0nAAAAYOnMEqsnkuyrqr1VdU02Hph07Kk3u/svuvva7r6xu29M8mCS27p7dbLuUFU9p6r2JtmX5CNz/ykAAABYKhd9GnB3n6+qu5I8kGRHksPdfbKq7k2y2t3HnuHYk1X13iQPJzmf5A2eBAwAAMDFVPfTvkK6UCsrK726u
oMQAAAHgWVNVD3b1ysXWz3AYMAAAAV9RFbwMGAIBldHTt7FzPd/v+3XM9H2x3rqwCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMPZuegBAABgFkfXzi56BOAKcmUVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhrNz0QMAAMAyOLp2du7nvH3/7rmfE64WrqwCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxnplitqgNVtVZVp6rq7i3ef31VfaKqPl5Vv1dVN03231hVfzXZ
Gq+qV5/wAAAAAsn50XW1BVO5Lcl+RlSdaTnKiqY9398NSy93T3L03W35bkZ5McmLz3SHe/cL5jAwAAsMxmubJ6S5JT3X26u59IciTJwekF3f35qc3nJun5jQgAAMB2M0usXpfk0ant9cm+L1FVb6iqR5K8PcmPTr21t6o+VlW/W1Xfu9W/oKrurKrVqlo9d+7cJYwPAADAMpolVmuLfU+7ctrd93X3Nyf58SQ/Odl9NskN3X1zkjcleU9VPW+LY+/v7pXuXtm1a9fs0wMAALCUZonV9STXT23vSXLmGdYfSfKqJOnux7v7zyevH0rySJJv/fJGBQAAYLuYJVZPJNlXVXur6pokh5Icm15QVfumNn8gyacm+3dNHtCUqnpBkn1JTs9jcAAAAJbXRZ8G3N3nq+quJA8k2ZHkcHefrKp7k6x297Ekd1XVrUn+JsnnktwxOfwlSe6tqvNJnkzy+u5+7Nn4QQAAAFgeF43VJOnu40mOb9p3z9TrN17guPcnef/lDAgAAMD2M8ttwAAAAHBFiVUAAACGI1YBAAAYjlgFAABgODM9YAkAALjyjq6dnev5bt+/e67ng2eTK6sAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADMfTgAEAmLt5P8UW2H5cWQUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDh7Fz0AAAAwJVxdO3sXM93+/7dcz0fTHNlFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGM1OsVtWBqlqrqlNVdfcW77++qj5RVR+vqt+rqpum3nvz5Li1qnrFPIcHAABgOV00VqtqR5L7krwyyU1JXjsdoxPv6e7v7O4XJnl7kp+dHHtTkkNJvj3JgSS/ODkfAAAAXNDOGdbckuRUd59Okqo6kuRgkoefWtDdn59a/9wkPXl9MMmR7n48yZ9U1anJ+f7bHGYHAGBOjq6dXfQIAF9illi9LsmjU9vrSV60eVFVvSHJm5Jck+T7p459cNOx131ZkwIAALBtzPKd1dpiXz9tR/d93f3NSX48yU9eyrFVdWdVrVbV6rlz52YYCQAAgGU2S6yuJ7l+antPkjPPsP5IklddyrHdfX93r3T3yq5du2YYCQAAgGU2S6yeSLKvqvZW1TXZeGDSsekFVbVvavMHknxq8vpYkkNV9Zyq2ptkX5KPXP7YAAAALLOLfme1u89X1V1JHkiyI8nh7j5ZVfcmWe3uY0nuqqpbk/xNks8luWNy7Mmqem82HsZ0PskbuvvJZ+lnAQAAYEnM8oCldPfxJMc37btn6vUbn+HYtyZ565c7IAAAANvPLLcBAwAAwBUlVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDh7Fz0AAAAXLqja2cXPQLAs8qVVQAAAIYjVgEAABiOWAUAAGA4YhUAAI
DhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhzBSrVXWgqtaq6lRV3b3F+2+qqoer6o+q6neq6pum3nuyqj4++XNsnsMDAACwnHZebEFV7UhyX5KXJVlPcqKqjnX3w1PLPpZkpbv/sqr+eZK3J/mhyXt/1d0vnPPcAAAALLFZrqzekuRUd5/u7ieSHElycHpBd3+ou/9ysvlgkj3zHRMAAIDtZJZYvS7Jo1Pb65N9F/K6JL89tf1VVbVaVQ9W1au2OqCq7pysWT137twMIwEAALDMLnobcJLaYl9vubDqh5OsJPm+qd03dPeZqnpBkg9W1Se6+5EvOVn3/UnuT5KVlZUtzw0AAMD2McuV1fUk109t70lyZvOiqro1yU8kua27H39qf3efmfzzdJIPJ7n5MuYFAABgG5glVk8k2VdVe6vqmiSHknzJU32r6uYk78hGqH52av/zq+o5k9fXJnlxkukHMwEAAMDTXPQ24O4+X1V3JXkgyY4kh7v7ZFXdm2S1u48l+ZkkX5PkN6oqST7T3bcl+bYk76iqL2YjjN+26SnCAAAA8DSzfGc13X08yfFN++6Zen3rBY77/STfeTkDAgAAsP3MchswAAAAXFEzXVkFAADY7Oja2bmf8
9u+d+Tq5OrqwCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAw9m56AEAALaDo2tnFz0CwFXFlVUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIazc9EDAAAAPOXo2tm5nu/2
vnej6uHFdWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYzU6xW1YGqWquqU1V19xbvv6mqHq6qP6qq36mqb5p6746q+tTkzx3zHB4AAIDldNFYraodSe5L8sokNyV5bVXdtGnZx5KsdPd3JXlfkrdPjv2GJG9J8qIktyR5S1U9f37jAwAAsIxmubJ6S5JT3X26u59IciTJwekF3f2h7v7LyeaDSfZMXr8iyQe6+7Hu/lySDyQ5MJ/RAQAAWFazxOp1SR6d2l6f7LuQ1yX57S/zWAAAAMjOGdbUFvt6y4VVP5xkJcn3XcqxVXVnkjuT5IY
phhJAAAAJbZLFdW15NcP7W9J8mZzYuq6tYkP5Hktu5+/FKO7e77u3ulu1d27do16+wAAAAsqVli9USSfVW1t6quSXIoybHpBVV1c5J3ZCNUPzv11gNJXl5Vz588WOnlk30AAABwQRe9Dbi7z1fVXdmIzB1JDnf3yaq6N8lqdx9L8jNJvibJb1RVknymu2
7seq6qeyEbxJcm93P/as/CQAAAAsjVm+s5ruPp7k+KZ990y9vvUZjj2c5PCXOyAAAADbzyy3AQMAAMAVJVYBAAAYjlgFAABgOGIVAACA4YhVAAAAhjPT04ABALaTo2tnFz0CwLbnyioAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMPZuegBAAAu19G1s4seAYA5c2UVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOHsnGVRVR1I8gtJdiR5Z3e
dP7L0ny80m+K8mh7n7f1HtPJvnEZPMz3X3bPAYHAK5eR9fOLnoEAAZ30Vitqh1J7kvysiTrSU5U1bHufnhq2WeS/EiSf7XFKf6qu184h1kBAADYJma5snpLklPdfTpJqupIkoNJ/jZWu/vTk/e++CzMCAAAwDYzy3dWr0vy6NT2+mTfrL6qqlar6sGqetVWC6rqzsma1XPnzl3CqQEAAFhGs8Rq
GvL+HfcUN3ryT5x0l+vqq++Wkn676/u1e6e2XXrl2XcGoAAACW0Syxup7k+qntPUnOzPov6O4zk3+eTvLhJDdfwnwAAABsQ7PE6okk+6pqb1Vdk+RQkmOznLyqnl9Vz5m8vjbJizP1XVcAAADYykVjtbvPJ7kryQNJPpnkvd19sqrura
kqSqvqeq1pO8Jsk7qurk5PBvS7JaVX+Y5ENJ3
pKcIAAADwNDP9ntXuPp7k+KZ990y9PpGN24M3H/f7Sb7zMmcEAABgm5nlNmAAAAC4osQqAAAAwxGrAAAADEesAgAAMJyZHrAEAABwNTq6dnau57t9/+65no8Lc2UVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4exc9AAAwNiOrp1d9AgAbEOurAIAADAcsQoAAMBwxCoAAADD8Z1VAFiweX8n9Pb9u+d6PgBYBFdWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYzs5FDwAAV5uja2cXPcIzGn0+AJiFK6sAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMJydix4AAJ5NR9fOLnoEAODL4MoqAAAAw3FlFYChuBIKACSurAIAADAgV1aBbW3eV/Fu3797rucDANiuZrqyWlUHqmqtqk5V1d1bvP+SqvpoVZ2vqldveu+OqvrU5M8d8xocAACA5XXRWK2qHUnuS/LKJDcleW1V3bRp2WeS/EiS92w69huSvCXJi5LckuQtVfX8yx8bAACAZTbLldVbkpzq7tPd/USSI0kOTi/o7k939x8l+eKmY1+R5APd/Vh3fy7JB5IcmMPcAAAALLFZvrN6XZJHp7bXs3GldBZbHXvdjMcCXHV8BxYAYD5mubJaW+zrGc8/07FVdWdVrVbV6rlz52Y8NQAAAMtqllhdT3L91PaeJGdmPP9Mx3b3/d290t0ru3btmvHUAAAALKtZYvVEkn1VtbeqrklyKMmxGc
QJKXV9XzJw9WevlkHwAAAFzQRWO1u88nuSsbkfnJJO/t7pNVdW9V3ZYkVfU9VbWe5DVJ3lFVJyfHPpbkp7IRvCeS3DvZBwAAABc0ywOW0t3HkxzftO+eqdcnsnGL71bHHk5y+DJmBAAAYJuZ5TZgAAAAuKJmurIKABcy71/XAwCQuLIKAADAgMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADD2bnoAYBxHF07O9fz3b5/91zPBwDA9uHKKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAw9m56AGA5XV07excz3f7/t1zPR8AAOMSqwADE/wAwHblNmAAAACG48oqXKXmfcUNAICLc9fTlePKKgAAAMMRqwAAAAzHbcBwAW7xAACAxXFlFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACG41fXsBB+LQwAAPBMXFkFAABgOGIVAACA4YhVAAAAhuM7q8BVY97fdQYAYFyurAIAADAcsQoAAMBwxCoAAADDEasAAAAMxwOWALYRD6kCAK4WrqwCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMZ+csi6rqQJJfSLIjyTu7+22b3n9Okl9J8veT/HmSH+ruT1fVjUk+mWRtsvTB7n79fEZfnKNrZ+d6vtv3757r+QAAAK52F43VqtqR5L4kL0uynuREVR3r7oenlr0uyee6+1uq6lCSn07yQ5P3HunuF855bgAAAJbYLLcB35LkVHef7u4nkhxJcnDTmoNJ3j15
4kL62qmt+YAAAAbCezxOp1SR6d2l6f7NtyTXefT/IXSb5x8t7eqvpYVf1uVX3vVv+CqrqzqlaravXcuXOX9AMAAACwfGaJ1a2ukPaMa84muaG7b07ypiTvqarnPW1h9/3dvdLdK7t27ZphJAAAAJbZLLG6nuT6qe09Sc5caE1V7UzydUke6+7Hu/vPk6S7H0rySJJvvdyhAQAAWG6zxOqJJPuqam9VXZPkUJJjm9YcS3LH5PWrk3ywu7uqdk0e0JSqekGSfUlOz2d0AAAAltVFnwbc3eer6q4kD2TjV9cc7u6TVXVvktXuPpbkXUl+tapOJXksG0GbJC9Jcm9VnU/yZJLXd/djz8YPAgAAwPKY6fesdvfxJMc37btn6vVfJ3nNFse9P8n7L3NGAAAAtplZbgMGAACAK0qsAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADGfnogcAAADYro6unZ37OW/fv3vu51wEV1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDgzxWpVHaiqtao6VVV3
H+c6rqP0ze/4OqunHqvTdP9q9V1SvmNzoAAADL6qKxWlU7ktyX5JVJbkry2qq6adOy1yX5XHd/S5KfS/LTk2NvSnIoybcnOZDkFyfnAwAAgAua5crqLUlOdffp7n4iyZEkBzetOZjk3ZPX70vy0qqqyf4j3f14d/9JklOT8wEAAMAFzRKr1yV5dGp7fbJvyzXdfT7JXyT5xhmPBQAAgC+xc4Y1tcW+nnHNLMemqu5Mcudk8wtVtTbDXIt0bZI/W/QQzJ3PdTn5XJeTz3U5+VyXj890Oflcl9OV/Fy/aZZFs8TqepLrp7b3JDlzgTXrVbUzydcleWzGY9Pd9ye5f5aBR1BVq929sug5mC+f63LyuS4nn+ty8rkuH5/pcvK5LqcRP9dZbgM+kWRfVe2tqmuy8cCkY5vWHEtyx+T1q5N8sLt7sv/Q5GnBe5PsS/KR+YwOAADAs
oldXuPl9VdyV5IMmOJIe7+2RV3ZtktbuPJXlXkl+tqlPZuKJ6aHLsyap6b5KHk5xP8obufvJZ+lkAAABYErPcBpzuPp7k+KZ990y9/uskr7nAsW9N8tbLmHFEV80ty1wSn+ty8rkuJ5
cvK5Lh+f6XLyuS6n4T7X2rhbFwAAAMYxy3dWAQAA4IoSq5eoqg5U1VpVnaqquxc9D5evqg5X1Wer6r8vehbmo6qur6oPVdUnq+pkVb1x0TNx+arqq6rqI1X1h5PP9d8ueibmp6p2VNXHquq3Fj0L81FVn66qT1TVx6tqddHzMB9V9fVV9b6q+uPJf2f/4aJn4vJU1f7J/06f+vP5qvqxRc+VuA34klTVjiT/I8nLsvFreU4keW13P7zQwbgsVfWSJF9I8ivd/R2LnofLV1W7k+zu7o9W1dcmeSjJq/xv9epWVZXkud39har6yiS/l+SN3f3ggkdjDqrqTUlWkjyvu39w0fNw+arq00lWutvv41wiVfXuJP+1u985+U0hX93d/2fRczEfk975n0le1N1/uuh5XFm9NLckOdXdp7v7iSRHkhxc8Excpu7+L9l4ijVLorvPdvdHJ6
5JPJrlusVNxuXrDFyabXzn5429cl0BV7UnyA0neuehZgAurqucleUk2fhNIuvsJobp0XprkkRFCNRGrl+q6JI9Oba/H/wGGoVXVjUluTvIHi52EeZjcKvrxJJ9N8oHu9rkuh59P8q+TfHHRgzBXneQ/VdVDVXXnoodhLl6Q5FySX57ctv/Oqnruoodirg4l+fVFD/EUsXppaot9/lYfBlVVX5Pk/Ul+rLs/v+h5uHzd/WR3vzDJniS3VJVb969yVfWDST7b3Q8tehbm7sXd/d1JXpnkDZOv3XB125nku5P8u+6+Ocn/S+IZLkticlv3bUl+Y9GzPEWsXpr1JNdPbe9JcmZBswDPYPKdxvcn+bXuP
oeZivyW1nH05yYMGjcPlenOS2yfcbjyT5/qr694sdiXno7jOTf342yW9m4+tUXN3Wk6xP3dXyvmzEK8vhlUk+2t3/e9GDPEWsXpoTSfZV1d7J3zwcSnJswTMBm0wexPOuJJ/s7p9d9DzMR1Xtqqqvn7z+O0luTfLHi52Ky9Xdb+7uPd19Yzb+u
B7v7hBY/FZaqq504ecJfJbaIvT+Kp+1e57v5fSR6tqv2TXS9N4uGFy+O1GegW4GTjUj4z6u7zVXVXkgeS7EhyuLtPLngsLlNV/XqSf5Tk2qpaT/KW7n7XYqfiMr04yT9J8onJ9xuT5N909/EFzsTl253k3ZMnFX5Fkvd2t19zAmP6u0l+c+PvDrMzyXu6+z8udiTm5F8k+bXJhZvTSf7pgudhDqoF1kOPAAAATElEQVTqq7PxG0/+2aJnmeZX1wAAADActwEDAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAzn/wPfSjq58IpfOgAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"log_targets = data_rec.map(lambda r: np.log(float(r[-1]))).collect()\n",
"\n",
"hist(log_targets, bins=40, color='lightblue', density=True)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"\n",
"fig.set_size_inches(16, 10)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"data_log = data.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
}
],
"source": [
"model_log = LinearRegressionWithSGD.train(data_log, iterations=10, step=0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because we have transformed the target variable, the model's predictions are on the log scale,\n",
"as are the target values of the transformed dataset. To evaluate the model on the original scale,\n",
"we must first transform both the predicted and the true values back by taking their exponent\n",
"with NumPy's `exp` function, as shown in the following code:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"true_vs_predicted_log = data_log.map(lambda p: (np.exp(p.label), np.exp(model_log.predict(p.features))))"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17379\n",
"log - Mean Squared Error: 50685.5559\n",
"log - Mean Absolute Error: 155.2955\n",
"Root Mean Squared Log Error: 1.5411\n"
]
}
],
"source": [
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_log.collect():\n",
" real,predict=i[0],i[1]\n",
" value=(predict - real)**2\n",
" value1=np.abs(predict - real)\n",
" value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
" nn.append(value)\n",
" ab.append(value1)\n",
" s_log.append(value2)\n",
"value_len=len(nn)\n",
"print( value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)\n",
"\n"
]
},
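{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above can also be written with vectorised NumPy operations. The next cell is a sketch that is not part of the original notebook. `regression_metrics` is a hypothetical helper, and the three example pairs are the log-model predictions printed later in this notebook, rounded to two decimals. In the notebook you would pass the collected pairs instead, e.g. `pairs = np.array(true_vs_predicted_log.collect())` followed by `regression_metrics(pairs[:, 0], pairs[:, 1])`. `np.log1p(x)` computes `log(1 + x)`, matching the `np.log(x + 1)` terms used above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def regression_metrics(actual, predicted):\n",
"    # Return (MSE, MAE, RMSLE) for two equal-length sequences of values.\n",
"    actual = np.asarray(actual, dtype=float)\n",
"    predicted = np.asarray(predicted, dtype=float)\n",
"    mse = np.mean((predicted - actual) ** 2)\n",
"    mae = np.mean(np.abs(predicted - actual))\n",
"    rmsle = np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))\n",
"    return mse, mae, rmsle\n",
"\n",
"\n",
"mse, mae, rmsle = regression_metrics([16.0, 40.0, 32.0], [28.08, 26.96, 26.65])\n",
"print('Mean Squared Error: %2.4f' % mse)\n",
"print('Mean Absolute Error: %2.4f' % mae)\n",
"print('Root Mean Squared Log Error: %2.4f' % rmsle)"
]
},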
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Non log-transformed predictions:\n",
"[(16.0, 117.89250386724846), (40.0, 116.2249612319211), (32.0, 116.02369145779235)]\n",
"Log-transformed predictions:\n",
"[(15.999999999999998, 28.080291845456212), (40.0, 26.959480191001784), (32.0, 26.65472562945802)]\n"
]
}
],
"source": [
"print (\"Non log-transformed predictions:\\n\" + str(true_vs_predicted.take(3)))\n",
"\n",
"print (\"Log-transformed predictions:\\n\" + str(true_vs_predicted_log.take(3)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tuning model parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
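"To tune the model parameters we need separate training and test sets. One relatively easy way to create them is to take a random sample of, say, 20 percent of the data as the test set and define the training set as the elements of the original RDD that are not in the test set RDD. PySpark's `randomSplit` method does both steps at once, randomly assigning each record to one of the two RDDs according to the given weights."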
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Splitting the data into training and test sets (an 80/20 hold-out split)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"train, test = data.randomSplit([0.8, 0.2], seed=12345)"
]
},
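{
"cell_type": "markdown",
"metadata": {},
"source": [
"The split described above samples about 20 percent of the records for the test set and keeps everything else for training. The next cell illustrates that idea in plain Python on 17,379 dummy records, the dataset size reported below. `train_test_split` is a hypothetical helper, not a PySpark API. As with `randomSplit`, each record is assigned independently, so the test set is only approximately 20 percent. The two parts are nevertheless disjoint and always add up to the total."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"\n",
"def train_test_split(records, test_fraction=0.2, seed=12345):\n",
"    # Hypothetical helper: draw each record into the test set with\n",
"    # probability test_fraction and keep the remainder for training.\n",
"    rng = random.Random(seed)\n",
"    test_idx = {i for i in range(len(records)) if rng.random() < test_fraction}\n",
"    test = [r for i, r in enumerate(records) if i in test_idx]\n",
"    train = [r for i, r in enumerate(records) if i not in test_idx]\n",
"    return train, test\n",
"\n",
"\n",
"train_demo, test_demo = train_test_split(list(range(17379)))\n",
"print('Training data size: %d' % len(train_demo))\n",
"print('Test data size: %d' % len(test_demo))\n",
"print('Train + Test size : %d' % (len(train_demo) + len(test_demo)))"
]
},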
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"train_size=train.count()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"test_size=test.count()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training data size: 13834\n"
]
}
],
"source": [
"print (\"Training data size: %d\" % train_size)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test data size: 3545\n"
]
}
],
"source": [
"print (\"Test data size: %d\" % test_size)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train + Test size : 17379\n"
]
}
],
"source": [
"print (\"Train + Test size : %d\" % (train_size + test_size))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can confirm that we now have two distinct datasets that add up to the original dataset in total:\n",
"\n",
"Training data size: 13834\n",
"\n",
"Test data size: 3545\n",
"\n",
"Total data size: 17379\n",
"\n",
"Train + Test size : 17379\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The impact of parameter settings for linear models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have prepared our training and test sets, we are ready to investigate the impact of different parameter settings on model performance. We will first carry out this evaluation for the linear model. We will create a convenience function that trains the model on the training set with a given set of parameters and evaluates the relevant performance metric on the test set.\n",
"\n",
"We will use the RMSLE evaluation metric, as it is the one used in the Kaggle competition with this dataset, and this allows us to compare our model results against the competition leaderboard to see how we perform.\n",
"\n",
"The evaluation function is defined here:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"def squared_log_error(pred, actual):\n",
" return (np.log(pred + 1) - np.log(actual + 1))**2"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"def evaluate(train, test, iterations, step, regParam, regType, intercept):\n",
"\n",
" model = LinearRegressionWithSGD.train(train, iterations, step, regParam=regParam, regType=regType, intercept=intercept)\n",
"\n",
" tp = test.map(lambda p: (p.label, model.predict(p.features)))\n",
" \n",
" # squared log error for every (actual, predicted) pair in the test set\n",
" errors = [(np.log(pred + 1) - np.log(actual + 1))**2 for actual, pred in tp.collect()]\n",
" rmsle = np.sqrt(np.mean(errors))\n",
" return rmsle"
]
},
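{
"cell_type": "markdown",
"metadata": {},
"source": [
"One caveat about `evaluate`: `LinearRegressionWithSGD` can return negative predictions. For any prediction below -1, the term `np.log(pred + 1)` is NaN (NumPy only emits a RuntimeWarning), and that turns the whole RMSLE into NaN. A common workaround is to clip predictions at zero before taking logs, since the rental counts being predicted cannot be negative. This is an assumption shown for illustration, not part of the original notebook, and `rmsle_clipped` is a hypothetical helper."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def rmsle_clipped(actual, predicted):\n",
"    # Clip negative predictions to 0 so log1p never sees values <= -1.\n",
"    actual = np.asarray(actual, dtype=float)\n",
"    predicted = np.clip(np.asarray(predicted, dtype=float), 0.0, None)\n",
"    return np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))\n",
"\n",
"\n",
"# A prediction of -5.0 would make the unclipped metric NaN.\n",
"print(rmsle_clipped([16.0, 40.0], [-5.0, 40.0]))"
]
},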
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Iterations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we saw when evaluating our classification models, we generally expect that a model trained with SGD will achieve better performance as the number of iterations increases, although the increase in performance will slow down as the number of iterations goes above some minimum number. Note that here, we will set the step size to 0.01 to better illustrate the impact at higher iteration numbers:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 5, 10, 20, 50, 100]\n",
"[2.9204455616016656, 2.0695085222669265, 1.79815897170536, 1.594156705081269, 1.43308397524522, 1.3878383528812235]\n"
]
}
],
"source": [
"params = [1, 5, 10, 20, 50, 100]\n",
"\n",
"metrics = [evaluate(train, test, param, 0.01, 0.0, 'l2', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we will use the matplotlib library to plot a graph of the RMSLE metric against the number of iterations. We will use a log scale for the x axis to make the output easier to visualize:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEPCAYAAAC5sYRSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl4VOXdxvHvb7KybwmQsBiQRRCIQFgUqYhWqYJL3SqKBQVq5VWstrVv32ptaxf3jbqwiQu4VKkKVq0LuLMEJAiCAoqCQQiyL9mf9485YIhZJmGSM5ncn+vKxWTOM+fcSU5uTs7MPMecc4iISHQJ+B1ARETCT+UuIhKFVO4iIlFI5S4iEoVU7iIiUUjlLiIShSotdzNLNLMlZpZlZqvN7E9ljEkws2fNbL2ZLTaztJoIKyIioQnlyD0PGO6cSwdOAEaY2eBSY64CdjrnugD3AreHN6aIiFRFpeXugvZ5n8Z5H6Xf+XQu8Lh3+3ngNDOzsKUUEZEqCemcu5nFmNkKYBvwhnNucakh7YBNAM65QmA30CqcQUVEJHQhlbtzrsg5dwLQHhhoZr1KDSnrKP0H8xqY2UQzy/Q+JlY9roiIhMKqOreMmf0R2O+cu6vEfa8DtzrnPjKzWOBbINlVsPKkpCSXlpZWvdQiIvXUsmXLtjvnkisbF1vZADNLBgqcc7vMrAFwOj98wvRl4OfAR8CFwNsVFTtAWloamZmZlW1eRERKMLOvQhlXabkDKcDjZhZD8DTOc865+Wb2ZyDTOfcyMAN40szWAzuAn1Uzt4iIhEGl5e6cWwn0LeP+W0rczgUuCm80ERGpLr1DVUQkCqncRUSikMpdRCQKqdxFRKJQnSv3/MJi5i7fjK79KiJSvjpX7nOXb+aG57K4+7+f+x1FRCRihfI694hyyYAOZG3ezZQF60mIDXDtaV39jiQiEnHqXLmbGX89rxf5hcXc/cbnxMcG+MUpx/odS0QkotS5cgcIBIw7LuxDflExf391LQmxAcYO6eR3LBGRiFEnyx0gJmDcc3E6+YVF3DrvU+JjYxg9qKPfsUREIkKde0K1pLiYAA9e2o/hx7Xm/178hOeXbfY7kohIRKjT5Q4QHxvgocv6cXKXJH77fBYvZ2X7HUlExHd1vtwBEuNimDomgwFpLfnVsyt4bdUWvyOJiPgqKsodoEF8DDPHDuCEDs259umPeWvNVr8jiYj4JmrKHaBRQiyPjRtAj5Sm/PKp5bz7eY7fkUREfBFV5Q7QNDGOJ64cyLGtGzPxyUw+2vCd35FERGpd1JU7QPOG8Tx11UA6tGjIVY8vZdlXO/yOJCJSqyotdzPrYGYLzGyNma02s8lljGlmZvPMLMsbM65m4oauVeMEZk8YRNumiYyduZSsTbv8jiQiUmtCOXIvBG50zvUABgOTzKxnqTGTgE+dc+nAMOBuM4sPa9JqaN0kkdkTBtG8URxXzFzC6uzdfkcSEakVlZa7c26Lc265d3svsAZoV3oY0MTMDGhM8CLZhWHOWi0pzRowZ/xgGsXHMGbGEj7futfvSCIiNa5K59zNLI3gxbIXl1o0BegBZAOfAJOdc8VhyBcWHVo2ZM6EwcTFGKOnLWZDzj6/I4mI1KiQy93MGgMvANc75/aUWnwmsAJIBU4ApphZ0zLWMdHMMs0sMyendl+mmJbUiNnjBwOO0dMW8dV3+2t1+yIitSmkcjezOILFPts5N7eMIeOAuS5oPfAlcFzpQc65qc65DOdcRnJy8tHkrpYurRsze/xg8guLGT1tMZt3Hqj1DCIitSGUV8sYMANY45y7p5xhXwOneePbAN2BL8IVMpy6t23Ck1cNYm9uAZdNX8y3u3P9jiQiEnahHLkPAcYAw81shfdxlpldbWZXe2P+ApxkZp8AbwE3Oee211Dmo9arXTMev3Ig3+3LZ/T0ReTszfM7kohIWJlfF5rOyMhwmZmZvmz7kK
Ubd3DFjCV0bNmQpycOpmUj31+9KSJSITNb5pzLqGxcVL5DNVQD0loy4+cZbPxuP2NmLGb3gQK/I4mIhEW9LneAk7ok8eiY/qzbuo8rHlvC3lwVvIjUffW+3AGGdW/NPy
x+pvdjPusaXsz4uI91+JiFSbyt3z455teODSviz/eifjH8/kYH6R35FERKpN5V7CWb1TuPeSE1j05XdMfDKT3AIVvIjUTSr3Us49oR23X9CH99Zt53/mLCe/MGJmURARCZnKvQwXZ3TgtvN68eaabUx+5mMKi1TwIlK3qNzLcfngY7h5ZE9eXfUtN/4ri6Jif94PICJSHbF+B4hkV53cifzCYm5
S3xMQFuv6APgYD5HUtEpFIq90r8ctix5BUWcd+b64iPDXDbeb0ITrcjIhK5VO4hmHxaV/IKi3l44QbiYwPcMrKnCl5EIprKPQRmxm/P7E5eQTEzP/iShNgYbhrRXQUvIhFL5R4iM+PmkT3ILyrikXc2kBAb4Fc/7uZ3LBGRMqncq8DM+PM5vcgrKOb+t4Ln4Ced2sXvWCIiP6Byr6JAwPjHBX0oKCrmztc/IyE2wPihnf2OJSJyBJV7NcQEjLsuSie/qJjbXllDQmyAMSem+R1LROQwlXs1xcYEuP9nfckvXM7NL60mITaGiwd08DuWiAgQ2jVUO5jZAjNbY2arzWxyOeOGeZfgW21m74Q/auSJiwnwz8v6ckq3ZG6au5IXP/7G70giIkBo0w8UAjc653oAg4FJZtaz5AAzaw48BJzjnDseuCjsSSNUQmwMj47pz+BO
jhuRW8snKL35FERCovd+fcFufccu/2XmAN0K7UsNHAXOfc1964beEOGskS42KYMTaD/se0YPIzH/PGp1v9jiQi9VyVJg4zszSgL7C41KJuQAszW2hmy8zsivDEqzsaxscyc+wAjm/XjEmzl7Pws3r1/5uIRJiQy93MGgMvANc75/aUWhwL9AfOBs4EbjazH7zDx8wmmlmmmWXm5OQcRezI1CQxjifGDaRrm8b84sllfLB+u9+RRKSeCqnczSyOYLHPds7NLWPIZuA159x+59x24F0gvfQg59xU51yGcy4jOTn5aHJHrGYN43jyqkGktWrE+MczWfLlDr8jiUg9FMqrZQyYAaxxzt1TzrCXgKFmFmtmDYFBBM/N10stG8Xz1PhBpDRPZNxjS1j+9U6/I4lIPRPKkfsQYAww3Hup4wozO8vMrjazqwGcc2uA14CVwBJgunNuVY2lrgOSmyQwZ/xgkpok8POZS1j1zW6/I4lIPWLO+XOFoYyMDJeZmenLtmvTN7sOcvEjH7E/v5BnJg7muLZN/Y4kInWYmS1zzmVUNk6X2ath7Zo34OkJg0mMjeGyaYtZv22v35FEpB5QudeCjq0aMmfCIMyM0dMWs3H7fr8jiUiUU7nXks7JjZkzYRCFxY7R0xaxaccBvyOJSBRTudeibm2a8NRVg9ifX8To6YvI3nXQ70giEqVU7rWsZ2pTnrhyILv2F3DZ9MVs25PrdyQRiUIqdx+kd2jOrCsHsHVPLqOnL2b7vjy/I4lIlFG5+6T/MS2ZOXYAm3ce4PLpi9l1IN/vSCISRVTuPhrcuRXTrsjgi+37GTNjCXtyC/yOJCJRQuXus6Fdk3nk8n6s/XYPY2cuYV9eod+RRCQKqNwjwPDj2vDgpf3I2rybK2ct5WB+kd+RRKSOU7lHiBG92nLfJSeQuXEHE57IJLdABS8i1adyjyCj0lO588J0PtiwnV8+tYy8QhW8iFSPyj3CXNC/PX89rzcLPsvh2jkfU1BU7HckEamDVO4RaPSgjtw6qif
XQr1z+7gkIVvIhUUazfAaRsY4d0Ir+omL/9Zy0JMQHuuiidQMD8jiUidYTKPYJN/NGx5BUUc/cbnxMfG+Bv5/dWwYtISFTuEe7a07qSV1jMlAXrSYgNcOs5xxO88qGISPlCuYZqBzNbYGZrzGy1mU2uYOwAMysyswvDG7N+u/GMbkwY2onHP/qKv/1nDX5dPUtE6o5QjtwLgRudc8vNrAmwzMzecM59WnKQmcUAtwOv10DOes3M+P1ZPcgvLGbae1+SGBfDjWd09zuWiESwSsvdObcF2OLd3mtma4B2wKelhl4LvAAMCHdICRb8H0cdT35RMQ++vZ74mADXntbV71giEqGqdM7dzNKAvsDiUve3A84HhqNyrzGBgPHX83of8STrL0451u9YIhKBQi53M2tM8Mj8eufcnlKL7wNucs4VVfRkn5lNBCYCdOzYsepphUDAuOPCPuQXFfP3V9eSEBtg7JBOfscSkQgTUrmbWRzBYp/tnJtbxpAM4Bmv2JOAs8ys0Dn3YslBzrmpwFSAjIwMPStYTbExAe695ATyC4u5dd6nxMfGMHqQ
MUke+F8moZA2YAa5xz95Q1xjnXyTmX5pxLA54Hrild7BJecTEBHhzdl1O7J/N/L37C88s2+x1JRCJIKNMPDAHGAMPNbIX3cZaZXW1mV9dwPqlAQmwMD1/enyHHJvHb57N4OSvb70giEiFCebXM+0DI75pxzo09mkBSNYlxMUy9oj9jH1vKr55dQXxMgBG92vodS0R8ponDokDD+Fhmjh1An
NuPbp5by9dqvfkUTEZyr3KNE4IZZZ4wZyXNumXP3Uct5bl+N3JBHxkco9ijRrEMeTVw2kc1IjJjyRyaIvvvM7koj4ROUeZZo3jGf2+EF0aNGQK2ctZdlXO/yOJCI+ULlHoVaNE5g9fhBtmiYyduZSsjbt8juSiNQylXuUat00kTkTBtG8URxXzFzC6uzdfkcSkVqkco9iKc0aMGf8YBrFxzBmxhI+37rX70giUktU7lGuQ8uGzJ4wmNiAMXraYr7I2ed3JBGpBSr3eqBTUiPmTBiEc47R0xbz9XcH/I4kIjVM5V5PdGndhKfGDyK3sIhLpy3im10H/Y4kIjVI5V6P9EhpylNXDWJPbgGjpy3i2925fkcSkRqicq9nerVrxhNXDmT73jxGT19Ezt48vyOJSA1QuddDfTu24LFxA9myK5fLpy9mx/58vyOJSJip3OupgZ1aMuPnGWz8bj9jZixm94ECvyOJSBip3Ouxk7ok8eiY/qzbuo8rHlvC3lwVvEi0ULnXc8O6t2bK6L6s/mY34x5byv68Qr8jiUgYhHKZvQ5mtsDM1pjZajObXMaYy8xspffxoZml10xcqQlnHN+W+3/Wl+Vf72T845nkFhT5HUlEjlIoR+6FwI3OuR7AYGCSmfUsNeZL4BTnXB/gL3gXwZa64+w+Kdx9cTqLvvyOiU8uI69QBS9Sl1Va7s65Lc655d7tvcAaoF2pMR8653Z6ny4C2oc7qNS88/u25x8/7c27n+cwafZy8guL/Y4kItVUpXPuZpYG9AUWVzDsKuDV6kcSP10yoCN/Ofd43lyzjcnPfExhkQpepC4KudzNrDHwAnC9c25POWNOJVjuN5WzfKKZZZpZZk6OLgMXqcacmMYfzu7Bq6u+5cZ/ZVFU7PyOJCJVFBvKIDOLI1jss51zc8sZ0weYDvzEOVfm9d2cc1PxzsdnZGSoMSLY+KGdyS8q5o7XPiM+JsDtF/QhEDC/Y4lIiCotdzMzYAawxjl3TzljOgJzgTHOuc/DG1H8cs2wLuQVFHP/W+uIjw1w23m9CO4OIhLpQjlyHwKMAT4xsxXefb8HOgI45x4BbgFaAQ95v/yFzrmM8MeV2nb96V3JKyzmkXc2EB8b4JaRPVXwInVApeXunHsfqPC32Tk3HhgfrlASOcyMm0Z0J6+wiMc+2EhCbAw3jeiugheJcCGdc5f6zcy4ZWRP8r0j+MS4ANef3s3vWCJSAZW7hMTM+Mu5vcgrLOa+N4Pn4K8Z1sXvWCJSDpW7hCwQMG6/oA/5hd+/imb80M5+xxKRMqjcpUpiAsY9F6dTUFTMba+sISE2wJgT0/yOJSKlaFZIqbLYmAD3/6wvpx3XmptfWs1zSzf5HUlESlG5S7XExwb452X9GNo1iZvmruTFj7/xO5KIlKByl2pLjIth6pgMBndqxQ3PreCVlVv8jiQiHpW7HJUG8TFM/3kG/Tq2YPIzH/PGp1v9jiQiqNwlDBolxPLYuAEc364Zk2YvZ+Fn2/yOJFLvqdwlLJokxvHEuIF0ad2YXzy5jA/X
c7kki9pnKXsGnWMI6nxg/imFYNuerxTJZu3OF3JJF6S+UuYdWyUTyzxw8mpXki4x5bypIvVfAiflC5S9glN0lgzvjBJDdJYPS0RUx/7wuc0/T9IrVJ5S41om2zRF6cNIThx7XmtlfWcPVTy9iTW+B3LJF6Q+UuNaZZgzgeHdOf/zurB2+u2caoB99ndfZuv2OJ1Asqd6lRZsaEH3XmmYmDyS0o4vyHPuTZpV
NI1IDVO5S60YkNaSV64byoC0Ftz0wif8+l8rOZhf5HcskahVabmbWQczW2Bma8xstZlNLmOMmdkDZ
ezFaaWb+aiSt1WVLjBJ64chDXDe/C3I83c/5DH/BFzj6/Y4lEpVCO3AuBG51zPYDBwCQz61lqzE+Art7HRODhsKaUqBETMG44ozuPjR3A1j25nDPlA81JI1IDKi1359wW59xy7/ZeYA3QrtSwc4EnXNAioLmZpYQ9rUSNYd1b88p1Q+napjGT5izn1pdXk19Y7HcskahRpXPuZpYG9AUWl1rUDig5qfdmfvgfgMgRUps34NmJJ3LlkE7M+nAjFz/6Ed/sOuh3LJGoEHK5m1lj4AXgeufcntKLy3jID14OYWYTzSzTzDJzcnKqllSiUnxsgFtG9eShy/qxfts+zn7gPU08JhIGIZW7mcURLPbZzrm5ZQzZDHQo8Xl7ILv0IOfcVOdchnMuIzk5uTp5JUqd1TuFl/9nCG2bJjJu1lLu/u9nFBXr5ZIi1RXKq2UMmAGscc7dU86wl4ErvFfNDAZ2O+f0LJlUSefkxvz7miFc2K89D769njEzFpOzN8/vWCJ1UihH7kOAMcBwM1vhfZxlZleb2dXemP8AXwDrgWnANTUTV6Jdg/gY7rwonTsu6MOyr3Zy9gPvafIxkWowv94pmJGR4TIzM33ZttQNn2bv4ZrZy9i08yC/PbM7E3/UmeAfkiL1l5ktc85lVDZO71CViNUztSkvX3syZ/Rsw99fXcvEJ5ex+6AmHxMJhcpdIlrTxDgeuqwft4zsyYK12xj54Hus+kaTj4lURuUuEc/MuPLkTjz7ixMpLHL89OEPmbNYk4+JVETlLnVG/2Na8Mp1QxnUqSW
cn3PBcFgfyC/2OJRKRVO5Sp7RsFM+scQP51endeHHFN5z3zw9Yv02Tj4mUpnKXOicmYEw+vStPXDmQ7fvyOWfK+7yc9YP3zInUayp3qbOGdk3mletOpkdKU657+mNueWkVeYWaI14EVO5Sx6U0a8AzEwczYWgnnvjoKy5+5CM27TjgdywR36ncpc6Liwnwf2f35JHL+/NFzn5GPvg+b63Z6ncsEV+p3CVqjOjVlvnXnUy75g246vFM7nhtLYVFmiNe6ieVu0SVY1o1Yu41J3HpwA48tHADl01fzLa9uX7HEql1KneJOolxMfz9p324+6J0sjbv4uwH3mfRF9/5HUukVqncJWpd0L89L04aQpOEWEZPW8RDC9dTrDnipZ5QuUtUO65tcPKxn/RO4Y7XPmPCE5nsOpDvdyyRGqdyl6jXOCGWKZf25U/nHM+763I4+4H3Wbl5l9+xRGqUyl3qBTPj5yel8dwvTgTgwoc/4slFX2nyMYlaKnepV/p2bMH8a0/mpC6tuPnFVUx+ZgX78zT5mESfUK6hOtPMtpnZqnKWNzOzeWaWZWarzWxc+GOKhE+LRvHM/PkAfnNmd+avzObcf37Auq17/Y4lElahHLnPAkZUsHwS8KlzLh0YBtxtZvFHH02k5gQCxqRTu/DU+EHsOpDPOVM+4N8f
Y7lkjYVFruzrl3gYquUOyAJha8uGVjb6z+zpU64aRjk3jluqH0bteMXz2bxe
[base64 PNG data garbled in extraction and omitted: plot of RMSLE against the number of iterations, log-scaled x axis]\n",
8YL/yHjMW+BfBmVbfJrjvvFXie3to3DPAQW99dx7alrcske/3t4+BU0usey7wGrAOuKPE92MW5eyL+qhCn/gdQB8lfhjBX8w+wPPeL8UKjizmvwGXe7ebeyXRyPtFmVJiPbcCy4AG3ucl13E7cF+JsS2A/sAbJe5rXka2w9sAUr2CSCY4s+jbwHneMgdcXM7XtwLo7N2+CfiDd7tliTFPAqO82wuBh0ose6zEdiYCd3u3Z3FkuV
3b4GmO7dngL8r3d7hJezrHJ3JbZ/R4mMh7dx6GdV4nu7C0gBEgj+pfUnb9nkQ99r7/GvEfxruSvBCaQSva/j0DYSgEygk7fe/UCnMjJW9P1fCGSU8Zg0vi/cwz/LEParzYd+Pt62mnq3k4D1BGc5PLzuMrZ1I/CYd/s4L3eit+4vgGbe518RnFCr0n1RH6F96LRMhHHOrST4y3Ep8J9Si88AfmdmKwj+EicCHctZ1cvOuYNl3H86Ja704pzbSfCXrLOZPWhmI4A9lcQcACx0zuW44FwZs4EfecuKgBfKedxzwMXe7UuAZ73bp3rn8z8BhgPHl3jMsyVuT+f7qZ3HESz7ssz1/l1G8HsJcDLBI0ycc68BO8t5bD5w6DmOko+vyFLn3BbnXB6wgeBfTBA88iz5+Oecc8XOuXUEv+fHEfyZXuH9TBcDrQiWPwT/0vmyjO1V9P2vjor2qzecc4cuOGHA38xsJfAmwXnI21Sy7pMJ/oeNc24twRLv5i17yzm32zmXC3wKHEPV90UpR12bz72+eBm4i+DRW6sS9xtwgXPus5KDzWxQGevYX866jeDR6WHOuZ1mlg6cCUwiWMBXVpCvrDmpD8l1zpV3qcNngX+Z2dzgZt06M0sEHiJ4tLnJzG4lWC4/+Dqccx+YWZqZnQLEOOeOuDZlCXnev0V8v49XlLmkAucdMpZ6fCHec1RmZgRPb5TeHkBxic+LOfJ3rPRETs7Lda1z7vWSC8xsGBX/DMOpov2qZIbLCP610N85V2BmGznyZ1XeustT8vtWBMRWY1+UcujIPTLNBP7snPuk1P2vA9d65YKZ9fXu3ws0CXHd/yU4Kx3eOlqYWRIQcM69ANwM9KtkHYuBU8wsyXuy71KC550r5JzbQPCX+Ga+PyI/VA7bzawxUNkTcU8AT1P+UXt53sf7q8G7unyLKj5+I8FTBhB8riOuio8HuMjMAmZ2LNCZ4Lnx14Ffmlmcl62bmTWqZD3V+v6XUHp/KW+/Kq0ZsM0r9lMJHmmXtb6S3iX4nwJm1o3gXwSflTOWauyLUg6VewRyzm12zt1fxqK/ECyVld4V1f/i3b8A6GlmK8yssiun3wa0MLNVZpYFnErwz+uF3p/ls4AKLyXnnNvijVlA8InH5c65l0L76ngWuJzgKRqcc7uAaQRPYbxI8Em9iswmWMxPh7i9Q/4EnGFmywlenHgLwVIK1TSChboEKH1EG6rPCJbwq8DV3umI6QRPSSz3fqaPUslf1Ef5/Ycf7i/l7VelzQYyzCyTYGGv9fJ8B3zg7VN3lnrMQ0CMd8rtWWCsd/qqPFXaF6V8mvJX6hTvteLnOufGVPFxCUCRc67QzE4EHnbOnVAjIUUigM65S51hZg8SPOquzhtwOgLPmVmA4JOmE8KZTSTS6MhdRCQK6Zy7iEgUUrmLiEQhlbuISBRSuYuIRCGVu4hIFFK5i4hEof8HpfozdFs5+LIAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying number of iterations')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will perform a similar analysis for step size in the following code:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"params = [0.01, 0.025, 0.05, 0.1, 1.0]"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
}
],
"source": [
"metrics = [evaluate(train, test, 10, param, 0.0, 'l2', False) for param in params]"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.01, 0.025, 0.05, 0.1, 1.0]\n",
"[1.79815897170536, 1.432660677663247, 1.3921046531899715, 1.463373357714063, nan]\n"
]
}
],
"source": [
"print(params)\n",
"print(metrics)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These results show why we avoided the default step size when training the linear model originally. The default is 1.0, which in this case produces a nan value for the RMSLE metric. This typically means that SGD has failed to converge: when the step size is relatively large, it is easy for the optimization algorithm to overshoot good solutions, and the weights can grow without bound, producing predictions for which the logarithm in the RMSLE is undefined (note the RuntimeWarning: invalid value encountered in log above).\n",
"\n",
"We can also see that for low step sizes and a relatively low number of iterations (we used 10 here), the model performs worse: a step size of 0.01 gives an RMSLE of about 1.80, compared with about 1.39 at a step size of 0.05. However, in the preceding Iterations section, we saw that for a lower step-size setting, a higher number of iterations will generally converge to a better solution."
]
},
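{
"cell_type": "markdown",
"metadata": {},
"source": [
"The overshooting effect is easy to reproduce outside Spark. The following cell is a minimal sketch, not part of the original analysis: it runs plain batch gradient descent (not MLlib's SGD) on a hypothetical one-feature least-squares problem with synthetic data, not the bike sharing dataset, and prints the fitted weight for several step sizes. With this data the true weight is close to 3. A very small step stays well short of 3 after 50 iterations, while a step that is too large makes the weight oscillate with growing magnitude, which is the kind of divergence blamed above for the nan RMSLE. The helper name batch_gd and all data are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical synthetic data: y = 3x + small noise, x uniform on [0, 1].\n",
"rng = np.random.default_rng(42)\n",
"X = rng.uniform(0, 1, size=(200, 1))\n",
"y = 3 * X[:, 0] + rng.normal(0, 0.1, size=200)\n",
"\n",
"def batch_gd(step, iterations=50):\n",
"    # Plain batch gradient descent on the mean squared error; returns the final weight.\n",
"    w = np.zeros(1)\n",
"    for _ in range(iterations):\n",
"        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE\n",
"        w = w - step * grad\n",
"    return w[0]\n",
"\n",
"# Small steps converge slowly; for this data, steps above roughly 3 diverge.\n",
"for step in [0.01, 0.1, 1.0, 5.0]:\n",
"    print(step, batch_gd(step))"
]
},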
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Selecting the best parameter settings can be an intensive process that involves training a model on many combinations of parameter settings and selecting the best outcome. Each instance of model training involves a number of iterations, so this process can be very expensive and time-consuming when performed on very large datasets.\n",
"\n",
"The output is plotted here, again using a log scale for the step-size axis:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[base64 PNG data garbled in extraction and omitted: plot of RMSLE against step size, log-scaled x axis]\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying values of step size')\n",
"pyplot.xscale('log')"
]
},
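{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter-selection process described above amounts to a grid search. As a rough sketch, not something the original analysis runs, the cell below loops over every combination of a few hypothetical candidate values for the number of iterations, the step size and the L2 regularization parameter, reusing the evaluate helper defined earlier in this notebook. Its argument order (train, test, iterations, step size, regularization parameter, regularization type, intercept flag) is inferred from the calls in this section, and it is assumed to return the test-set RMSLE. Runs whose RMSLE is nan are dropped before the best combination is chosen. Even this small grid means 36 training runs (3 iteration counts times 4 step sizes times 3 regularization values), which shows why the search becomes expensive on large datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import itertools\n",
"import math\n",
"\n",
"# Hypothetical candidate values; every combination costs one full training run.\n",
"iteration_grid = [10, 20, 50]\n",
"step_grid = [0.01, 0.025, 0.05, 0.1]\n",
"reg_grid = [0.0, 0.1, 1.0]\n",
"\n",
"results = []\n",
"for iterations, step, reg in itertools.product(iteration_grid, step_grid, reg_grid):\n",
"    # evaluate() is the notebook helper used above, assumed to return the test RMSLE.\n",
"    rmsle = evaluate(train, test, iterations, step, reg, 'l2', False)\n",
"    results.append(((iterations, step, reg), rmsle))\n",
"\n",
"# Drop diverged runs (nan RMSLE), then keep the lowest error.\n",
"valid = [r for r in results if not math.isnan(r[1])]\n",
"best_params, best_rmsle = min(valid, key=lambda r: r[1])\n",
"print('Best (iterations, step, regParam):', best_params, 'RMSLE:', best_rmsle)"
]
},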
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L2 regularization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Regularization has the effect of penalizing model complexity through an additional loss term that is a function of the model weight vector. L2 regularization penalizes the L2 norm of the weight vector, while L1 regularization penalizes the L1 norm."
]
},
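{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the two penalty terms concrete, the cell below computes them for a hypothetical weight vector. It is an illustration only: the weights and the reg_param value are made up, and the scaling follows the convention we understand MLlib's SquaredL2Updater and L1Updater to use (half the squared L2 norm times the regularization parameter for L2, the L1 norm times the parameter for L1); other libraries may scale these terms differently. Larger values of the regularization parameter, like the param values passed to evaluate below, weight this penalty more heavily against the squared error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical weight vector and regularization parameter (illustration only).\n",
"w = np.array([0.5, -2.0, 0.0, 3.0])\n",
"reg_param = 0.1\n",
"\n",
"l2_norm = np.linalg.norm(w, 2)  # sqrt(0.25 + 4 + 0 + 9) = sqrt(13.25)\n",
"l1_norm = np.linalg.norm(w, 1)  # 0.5 + 2 + 0 + 3 = 5.5\n",
"\n",
"# L2 penalty: (reg_param / 2) * ||w||_2^2; L1 penalty: reg_param * ||w||_1.\n",
"l2_penalty = 0.5 * reg_param * l2_norm ** 2\n",
"l1_penalty = reg_param * l1_norm\n",
"print('L2 penalty:', l2_penalty, 'L1 penalty:', l1_penalty)"
]
},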
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0]\n",
"[1.463373357714063, 1.4627638795194882, 1.457389998406437, 1.414347928269498, 1.4006915016046428, 1.5458042588519074, 1.8520326400407603]\n"
]
}
],
"source": [
"params = [0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, param, 'l2', False) for param in params]\n",
"\n",
"print(params)\n",
"\n",
"print(metrics)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[base64 PNG data garbled in extraction and omitted: plot of RMSLE against the L2 regularization parameter] 
w4WuI9jo/TRc3tcRbJgGgOfM7Eoz6wAuAnbNNf6sae4B3mhmnzSza9w9lTnQzNoINqw/Dl/6AmeW64KY2S8Cv0GwMc0m8+91EbAJ+FH4GT4K9JhZO9Di7g+F4315AaX8NzOb2kNeC1yQZf7Z6v+fwCl3n1pnc60r2eRaV8cINkYwc12paGpzX1rfAv43wR53R8
BrzL3Z/NHNnMsrWljswxbWNWP+setMtuBa4HfpcggG/NUV+2vqynnPa5222/CvyTmd0bzN
08waCPZyt7n7ITP7U4KN1ss+h7vvMrP1ZvZ6giaPGfe0zDAa/kxzZl3NVXM2BnzB3bPdcOGrBMvoGeCf3d3DO3nNNf5U/T8zsysImlkBrP1WAAAC2klEQVQ+bmY/dPePnWVdeTOzLQQbxDd70M94Npl/LwOedPerZk1neY7ZTDDzBIuG2SOY2bUEOxVXuftJM7s/Y7w51xczuw54D2EY57GuZJ1MjmHjHu62M3NdqWjac19a24GPufueWa
APhwGCSY2eXh60NAS57T/iFBb3aE01gentFQ5e7fAP4YeOU80/gJ8Hoz67TgXpI3E7TT5uTuPyf4J/pjzuyRT/1zHjOzZoL201zuAXYy9177XB4k/NZgZm8iaI7K5V8J7sq1MnzPCjNbFw67l+AA582c+Ry5xid8bQ1w0t3/kWDjPWM5h3vyL2WcjfQB8liu2ZjZuWGdH3D3n+X5tmeBLjO7KpxGrZltDL/dDZnZleF478t4z37gMjOrMrO1BLefm60NeCkM9ouBK7OMM7v+dQRB/t6Mb6C51pW5/gcWtK5WMm3hlpC7J4H/k2XQnwOfBhJhwO8nuD/pv3Omuebj80z+fwF/a8Gd3NMEBx9/DtxtZlMb7Zy3B3P35y24hdi/E+wZfdfd78vnsxGE4aeA88JpDZrZ5wiaLPYTHCzL5UvhZ9iZ5/ym/Bmw08xuIvjnfp4gELJy96fM7KPAD8PlMk7wreZA+E3nKeBSd390vvEzJrsZ+JSZTYbDfyfLrH8duNPMlhE0l+V7J6VEOF0Imr9aCb71fTbcF5hw9225JuDuY2b2buAzYRNRDcH69iRB087nzGyEoElwqklpF8GB7qkDltnuP/x94LfNLEGwAXkkj89zS1j/P4f1H3b3t+RYV3YQLLdTwPQ3j0WuqxVJXf5KJMLw+RV3/8BZvq8eSLv7RLhn+nfuftmSFBlDZtbs7sPh8z8kOIvmv0dcliwB7blL0ZnZ3wBvJmizPlvnAl8L96rHgN8qZG0V4JfDPeAagm8jt0RbjiwV7bmLiMSQDqiKiMSQwl1EJIYU7iIiMaRwFxGJIYW7iEgMKdxFRGLo/wMrlHM1Wt/pRQAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying levels of L2 regularization')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L1 regularization"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]\n",
"[1.463373357714063, 1.4633409680931317, 1.4630506454349392, 1.4603658739928238, 1.4355688529629576, 1.7677660966171576, 4.800777158151935]\n"
]
}
],
"source": [
"params = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, param, 'l1', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[base64 PNG data omitted: plot of metrics for varying levels of L1 regularization]\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying levels of L1 regularization')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using L1 regularization can encourage sparse weight vectors. Does this hold true in this case? We can find out by examining the number of entries in the weight vector that are zero, with increasing levels of regularization:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"L1 (1.0) number of zero weights: 4\n",
"L1 (10.0) number of zero weights: 33\n",
"L1 (100.0) number of zero weights: 58\n"
]
}
],
"source": [
"# Train models with increasing L1 regularization strength (10 iterations, step size 0.1)\n",
"model_l1 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=1.0, regType='l1', intercept=False)\n",
"\n",
"model_l1_10 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=10.0, regType='l1', intercept=False)\n",
"\n",
"model_l1_100 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=100.0, regType='l1', intercept=False)\n",
"\n",
"# Count the coefficients that are exactly zero in each weight vector\n",
"print (\"L1 (1.0) number of zero weights: \" + str(sum(model_l1.weights.array == 0)))\n",
"\n",
"print (\"L1 (10.0) number of zero weights: \" + str(sum(model_l1_10.weights.array == 0)))\n",
"\n",
"print (\"L1 (100.0) number of zero weights: \" + str(sum(model_l1_100.weights.array == 0)))"
]
},
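{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why L1 regularization produces exact zeros while L2 does not, the sketch below applies a single regularization step to a small, made-up weight vector. It assumes the update rules described in the MLlib documentation for `L1Updater` (soft-thresholding each weight by `regParam * stepSize`) and `SquaredL2Updater` (scaling every weight by `1 - stepSize * regParam`). It ignores the gradient term and MLlib's decaying step size, so it illustrates the mechanism rather than reproducing `LinearRegressionWithSGD` exactly. The helper names `l1_step` and `l2_step` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def l1_step(weights, reg_param, step_size):\n",
"    # Soft-thresholding: shrink each weight towards 0 by reg_param * step_size;\n",
"    # weights whose magnitude is below that amount become exactly 0\n",
"    shrinkage = reg_param * step_size\n",
"    return np.sign(weights) * np.maximum(0.0, np.abs(weights) - shrinkage)\n",
"\n",
"def l2_step(weights, reg_param, step_size):\n",
"    # Multiplicative shrinkage: all weights are scaled by the same factor,\n",
"    # so non-zero weights get smaller but do not become exactly 0\n",
"    return weights * (1.0 - step_size * reg_param)\n",
"\n",
"w = np.array([0.8, -0.05, 0.3, -1.2, 0.01])  # made-up weights\n",
"for reg_param in [0.1, 1.0, 5.0]:\n",
"    n_zero_l1 = int(np.sum(l1_step(w, reg_param, 0.1) == 0))\n",
"    n_zero_l2 = int(np.sum(l2_step(w, reg_param, 0.1) == 0))\n",
"    print('regParam=%s  zeros with L1: %d  zeros with L2: %d' % (reg_param, n_zero_l1, n_zero_l2))"
]
},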
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intercept"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The final parameter option for the linear model is whether to use an intercept. An intercept is a constant term added to the model's prediction (the dot product of the weight vector and the feature vector), and it effectively accounts for the mean value of the target variable. If the data is already centered or normalized, an intercept is not necessary; however, including one rarely hurts."
]
},
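{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what the intercept does, the sketch below fits ordinary least squares with `numpy.linalg.lstsq` (not MLlib's SGD) to synthetic data whose target has a large mean offset. It fits three models: no intercept, an extra constant column acting as the intercept, and centered data with no intercept. The data are made up. The point is only that, without an intercept, the single weight must also absorb the offset, whereas centering the feature and the target makes the intercept unnecessary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"x = rng.uniform(0.0, 10.0, size=200)\n",
"y = 3.0 * x + 50.0 + rng.normal(0.0, 1.0, size=200)  # synthetic target with offset 50\n",
"\n",
"def fit_rmse(X, y):\n",
"    # Least-squares fit; returns the weights and the training RMSE\n",
"    w, _, _, _ = np.linalg.lstsq(X, y, rcond=None)\n",
"    return w, np.sqrt(np.mean((X @ w - y) ** 2))\n",
"\n",
"w_no, rmse_no = fit_rmse(x.reshape(-1, 1), y)  # no intercept\n",
"w_icpt, rmse_icpt = fit_rmse(np.column_stack([x, np.ones_like(x)]), y)  # constant column = intercept\n",
"w_cent, rmse_cent = fit_rmse((x - x.mean()).reshape(-1, 1), y - y.mean())  # centered, no intercept\n",
"\n",
"print('no intercept:   w=%s rmse=%.3f' % (w_no, rmse_no))\n",
"print('with intercept: w=%s rmse=%.3f' % (w_icpt, rmse_icpt))\n",
"print('centered data:  w=%s rmse=%.3f' % (w_cent, rmse_cent))"
]
},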
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[False, True]\n",
"[1.414347928269498, 1.4431958566566532]\n"
]
}
],
"source": [
"params = [False, True]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, 1.0, 'l2', param) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[base64 PNG data omitted: bar chart of metrics without and with an intercept]\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"bar(params, metrics, color='lightblue')\n",
"pyplot.xlabel('Metrics without and with an intercept')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decision Tree"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we have seen, decision tree models typically work on raw features: categorical features do not need to be converted into a binary vector encoding and can be used directly. Therefore, we will create a separate function to extract the decision tree feature vector. It simply converts all the values to floats and wraps them in a NumPy array:"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"def extract_features_dt(record):\n",
"    # Use the 12 raw columns record[2:14] (season .. windspeed) as float features\n",
"    return np.array(list(map(float, record[2:14])))"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"def extract_label(record):\n",
" return float(record[-1])"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"data_dt = data_rec.map(lambda r: LabeledPoint(extract_label(r),extract_features_dt(r)))\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Decision Tree feature vector: [1.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.81,0.0]\n",
"Decision Tree feature vector length: 12\n"
]
}
],
"source": [
"first_point_dt = data_dt.first()\n",
"print (\"Decision Tree feature vector: \" + str(first_point_dt.features))\n",
"print (\"Decision Tree feature vector length: \" + str(len(first_point_dt.features)))"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.tree import DecisionTree"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Decision Tree predictions: [(16.0, 54.913223140495866), (40.0, 54.913223140495866), (32.0, 53.171052631578945), (13.0, 14.284023668639053), (1.0, 14.284023668639053)]\n",
"Decision Tree depth: 5\n",
"Decision Tree number of nodes: 63\n"
]
}
],
"source": [
"dt_model = DecisionTree.trainRegressor(data_dt, {})\n",
"preds = dt_model.predict(data_dt.map(lambda p: p.features))\n",
"actual = data_dt.map(lambda p: p.label)\n",
"true_vs_predicted_dt = actual.zip(preds)\n",
"print (\"Decision Tree predictions: \" + str(true_vs_predicted_dt.take(5)))\n",
"print (\"Decision Tree depth: \" + str(dt_model.depth()))\n",
"print (\"Decision Tree number of nodes: \" + str(dt_model.numNodes()))"
]
},
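{
"cell_type": "markdown",
"metadata": {},
"source": [
"In `DecisionTree.trainRegressor(data_dt, {})`, the second argument is `categoricalFeaturesInfo`, a dict that maps a feature index to its number of categories. An empty dict means every feature, including season or hour, is treated as continuous and split on thresholds. To declare features as categorical, MLlib expects their values to be encoded as 0, 1, ..., k-1, and `maxBins` must be at least the largest k.\n",
"\n",
"The plain-Python sketch below builds such an encoding and dict. It assumes, as in the bike sharing hour.csv file, that the first eight of the twelve features (season, yr, mnth, hr, holiday, weekday, workingday, weathersit) are categorical. The helper names and the second sample row are hypothetical; the first row is the first record shown above. On the full RDD, the distinct values would be gathered per column (for example with `distinct()`) rather than from a Python list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"CATEGORICAL_IDX = list(range(8))  # assumed: season .. weathersit\n",
"\n",
"def build_category_maps(rows, cat_idx):\n",
"    # Map each distinct raw value of a categorical column to a code 0..k-1\n",
"    maps = {}\n",
"    for j in cat_idx:\n",
"        distinct = sorted({row[j] for row in rows}, key=float)\n",
"        maps[j] = {value: code for code, value in enumerate(distinct)}\n",
"    return maps\n",
"\n",
"def encode_row(row, maps):\n",
"    # Categorical columns -> category code; continuous columns -> float\n",
"    return [float(maps[j][v]) if j in maps else float(v) for j, v in enumerate(row)]\n",
"\n",
"# Raw feature values (record[2:14]); the second row is hypothetical\n",
"rows = [\n",
"    ['1', '0', '1', '0', '0', '6', '0', '1', '0.24', '0.2879', '0.81', '0.0'],\n",
"    ['2', '1', '5', '13', '0', '3', '1', '2', '0.62', '0.6212', '0.41', '0.2239'],\n",
"]\n",
"maps = build_category_maps(rows, CATEGORICAL_IDX)\n",
"cat_features_info = {j: len(m) for j, m in maps.items()}  # index -> number of categories\n",
"encoded = [encode_row(r, maps) for r in rows]\n",
"print(cat_features_info)\n",
"print(encoded[0])"
]
},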
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We compute the same error metrics (MSE, MAE and RMSLE) for the decision tree model, this time from the true_vs_predicted_dt RDD:"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17379\n",
"log - Mean Squared Error: 11611.4860\n",
"log - Mean Absolute Error: 71.1502\n",
"Root Mean Squared Log Error: 0.6251\n"
]
}
],
"source": [
"# Per-record squared, absolute and squared-log errors\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_dt.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"# Averages: MSE, MAE and RMSLE (square root of the mean squared log error)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print (\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)"
]
},
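{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook computes the same three error metrics with near-identical loops for each model. The cell below is a minimal sketch of a reusable, vectorised helper. It assumes the predictions have already been collected to the driver as (actual, predicted) pairs. `regression_metrics` is a hypothetical name that is not part of Spark or of the original notebook. For the decision tree above, you would call it as `regression_metrics(true_vs_predicted_dt.collect())`. The toy pairs in the cell are made up for illustration and are not bike sharing data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def regression_metrics(true_vs_predicted):\n",
"    # Hypothetical helper: returns (MSE, MAE, RMSLE) for (actual, predicted) pairs\n",
"    pairs = np.asarray(list(true_vs_predicted), dtype=float)\n",
"    actual, predicted = pairs[:, 0], pairs[:, 1]\n",
"    mse = np.mean((predicted - actual) ** 2)\n",
"    mae = np.mean(np.abs(predicted - actual))\n",
"    # np.log1p(x) computes log(x + 1), the same transform used in the loop above\n",
"    rmsle = np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))\n",
"    return mse, mae, rmsle\n",
"\n",
"# Made-up pairs for illustration: MSE = 10.0 and MAE = 3.0 here\n",
"mse, mae, rmsle = regression_metrics([(16.0, 18.0), (40.0, 36.0)])\n",
"print('MSE: %2.4f, MAE: %2.4f, RMSLE: %2.4f' % (mse, mae, rmsle))"
]
},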
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Impact of training on log-transformed targets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
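"We repeat the analysis for a decision tree trained on log-transformed targets. The labels are replaced by their natural logarithm, and the predictions are mapped back to the original scale with np.exp before the same metrics are computed:"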
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"data_dt_log = data_dt.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))\n",
"\n",
"dt_model_log = DecisionTree.trainRegressor(data_dt_log,{})\n",
"\n",
"preds_log = dt_model_log.predict(data_dt_log.map(lambda p: p.features))\n",
"\n",
"actual_log = data_dt_log.map(lambda p: p.label)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"new=actual_log.zip(preds_log)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(2.772588722239781, 3.6251613906330347),\n",
" (3.6888794541139363, 3.6251613906330347),\n",
" (3.4657359027997265, 1.985090627799027),\n",
" (2.5649493574615367, 1.985090627799027),\n",
" (0.0, 1.985090627799027)]"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new.take(5)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"true_vs_predicted_dt_log=[]\n",
"for val in new.collect():\n",
" t,p=val[0],val[1]\n",
" x=np.exp(t),np.exp(p)\n",
" true_vs_predicted_dt_log.append(x)"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17379\n",
"log - Mean Squared Error: 14781.5760\n",
"log - Mean Absolute Error: 76.4131\n",
"log - Root Mean Squared Log Error: 0.6406\n",
"Non log-transformed predictions:\n",
"[(16.0, 54.913223140495866), (40.0, 54.913223140495866), (32.0, 53.171052631578945)]\n"
]
}
],
"source": [
"# Same metrics for the model trained on log targets (predictions mapped back with np.exp)\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_dt_log:\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"log - Root Mean Squared Log Error: %2.4f\" % s_log_mean)\n",
"print (\"Non log-transformed predictions:\\n\" + str(true_vs_predicted_dt.take(3)))\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cross-validation for the decision tree (80/20 train/test split)"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"train_dt, test_dt = data_dt.randomSplit([0.8, 0.2], seed=12345)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To measure the impact of parameter settings, the helper below trains a decision tree on the training split and returns its RMSLE on the test split:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"def evaluate_dt(train, test, maxDepth, maxBins):\n",
"    # Train a regression tree with the given depth and bin settings\n",
"    model = DecisionTree.trainRegressor(train, {}, impurity='variance', maxDepth=maxDepth, maxBins=maxBins)\n",
"    preds = model.predict(test.map(lambda p: p.features))\n",
"    actual = test.map(lambda p: p.label)\n",
"    tp = actual.zip(preds)\n",
"    # Squared log error for each (actual, predicted) pair\n",
"    sq_log_errors = []\n",
"    for true_val, pred in tp.collect():\n",
"        sq_log_errors.append((np.log(pred + 1) - np.log(true_val + 1))**2)\n",
"    # Root mean squared log error on the test set\n",
"    rmsle = np.sqrt(sum(sq_log_errors) / len(sq_log_errors))\n",
"    return rmsle"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tree depth"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Deeper trees are more complex, so we would generally expect performance to improve as the maximum depth increases. Limiting tree depth acts as a form of regularization. As with L1 or L2 regularization in linear models, there may be a depth that is optimal with respect to test-set performance, beyond which the tree starts to over-fit.\n",
"\n",
"Here, we increase the maximum tree depth and measure the impact on test-set RMSLE, keeping the number of bins at the default of 32:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 2, 3, 4, 5, 10, 20]\n",
"[1.0009455704281573, 0.9071380409401831, 0.8083991513814845, 0.7316093046671605, 0.6252775817287765, 0.43025139584509925, 0.4467589576168234]\n"
]
},
{
"data": {
"text/plain": [
"<Figure: test-set RMSLE vs. maximum tree depth (maxBins = 32)>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [1, 2, 3, 4, 5, 10, 20]\n",
"\n",
"metrics = [evaluate_dt(train_dt, test_dt, param, 32) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Maximum tree depth')\n",
"pyplot.ylabel('Test RMSLE')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
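{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sweep above returns two parallel lists. The cell below is a short sketch of picking the depth with the lowest test-set RMSLE. The values are copied from the printed output of the previous cell so that the cell runs on its own. In the notebook itself, `min(zip(metrics, params))` can be applied directly to the live lists. With these numbers, depth 10 gives the lowest RMSLE and depth 20 is slightly worse. This is consistent with the idea above that there is an optimal depth."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# RMSLE values copied from the printed output of the tree-depth sweep\n",
"depth_params = [1, 2, 3, 4, 5, 10, 20]\n",
"depth_rmsle = [1.0009455704281573, 0.9071380409401831, 0.8083991513814845,\n",
"               0.7316093046671605, 0.6252775817287765, 0.43025139584509925,\n",
"               0.4467589576168234]\n",
"\n",
"# zip pairs each RMSLE with its depth; min compares the RMSLE first\n",
"best_rmsle, best_depth = min(zip(depth_rmsle, depth_params))\n",
"print('Best maxDepth: %d (test RMSLE %2.4f)' % (best_depth, best_rmsle))"
]
},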
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Maximum bins"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we evaluate the impact of the number of bins on the decision tree, holding the maximum depth at 5. As with tree depth, a larger number of bins allows the model to become more complex and might help performance with larger feature dimensions. Beyond a certain point, more bins are unlikely to help and may even hurt test-set performance through over-fitting:"
]
},
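{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a continuous feature, a tree learner only considers split thresholds at the boundaries between bins, so maxBins caps the number of candidate splits per feature. The cell below illustrates this with equal-frequency (quantile) bins on synthetic data. This is a simplified assumption, not Spark MLlib's exact procedure, which derives its splits from a sample of the training data. `candidate_thresholds` is a hypothetical helper. Once the requested bins outnumber the distinct values of a feature, duplicate cut points collapse, and more bins add few or no new candidate splits."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def candidate_thresholds(values, max_bins):\n",
"    # Interior quantile levels give at most max_bins - 1 cut points\n",
"    levels = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]\n",
"    # np.unique removes duplicate cut points when a feature has few distinct values\n",
"    return np.unique(np.quantile(values, levels))\n",
"\n",
"rng = np.random.default_rng(0)\n",
"# Synthetic stand-in for a normalised feature such as temp (101 possible values)\n",
"temp = np.round(rng.uniform(0.0, 1.0, size=1000), 2)\n",
"\n",
"for max_bins in [2, 4, 8, 16, 32, 64, 100]:\n",
"    print(max_bins, len(candidate_thresholds(temp, max_bins)))"
]
},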
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[1.2692079792473667, 0.8059355903824542, 0.7446332199349833, 0.5969914946964172, 0.6252775817287765, 0.6252775817287765, 0.6252775817287765]\n"
]
},
{
"data": {
"text/plain": [
"<Figure: test-set RMSLE vs. maximum number of bins (maxDepth = 5)>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(train_dt, test_dt, 5, param) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Maximum number of bins')\n",
"pyplot.ylabel('Test RMSLE')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Boosted Trees"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.tree import GradientBoostedTrees, GradientBoostedTreesModel\n"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"data_gbt = data_rec.map(lambda r: LabeledPoint(extract_label(r),extract_features_dt(r)))"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"(trainingData, testData) = data_gbt.randomSplit([0.7, 0.3])"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Gradient Boosted Trees predictions: [(40.0, 18.20990171759985), (2.0, 18.223666887903477), (8.0, 127.79752806968237), (106.0, 120.2269624548493), (37.0, 133.7865565239979)]\n"
]
}
],
"source": [
"model = GradientBoostedTrees.trainRegressor(trainingData,\n",
"                                            categoricalFeaturesInfo={}, numIterations=3)\n",
"preds = model.predict(testData.map(lambda p: p.features))\n",
"actual = testData.map(lambda p: p.label)\n",
"true_vs_predicted_GBT = actual.zip(preds)\n",
"print (\"Gradient Boosted Trees predictions: \" + str(true_vs_predicted_GBT.take(5)))\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5240\n",
"Mean Squared Error: 14147.5213\n",
"Mean Absolute Error: 82.3479\n",
"Root Mean Squared Log Error: 0.7971\n"
]
}
],
"source": [
"# Accumulate squared, absolute and squared-log errors on the GBT test split\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_GBT.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"Mean Squared Error: %2.4f\" % t)\n",
"print(\"Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"def evaluate_dt(trainingData, categoricalFeaturesInfo, loss, numIterations, maxDepth, maxBins):\n",
"    # Gradient Boosted Tree version of evaluate_dt (replaces the Decision Tree helper of the same name).\n",
"    # Trains on trainingData and returns the RMSLE on the global testData split.\n",
"    model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo, loss=loss,\n",
"                                                numIterations=numIterations,\n",
"                                                maxDepth=maxDepth, maxBins=maxBins)\n",
"    preds = model.predict(testData.map(lambda p: p.features))\n",
"    actual = testData.map(lambda p: p.label)\n",
"    tp = actual.zip(preds)\n",
"    # Squared log error per (actual, predicted) pair, then the square root of the mean\n",
"    sq_log_err = tp.map(lambda ap: (np.log(ap[1] + 1) - np.log(ap[0] + 1))**2)\n",
"    return np.sqrt(sq_log_err.mean())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Boosted Tree Iterations"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[0.8199092268881405, 0.8191629830985493, 0.8176741701394266, 0.8147160855323975, 0.8089602867888349, 0.7979663301656902, 0.7864647907071756]\n"
]
},
{
"data": {
"text/plain": [
"[Figure: RMSLE vs. number of boosting iterations, log-scaled x-axis]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"# RMSLE for a varying number of boosting iterations (maxDepth=3, maxBins=32)\n",
"metrics = [evaluate_dt(trainingData, {}, 'leastAbsoluteError', param, 3, 32) for param in params]\n",
"\n",
"print(params)\n",
"\n",
"print(metrics)\n",
"\n",
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Number of iterations')\n",
"pyplot.ylabel('RMSLE')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[1.3129641027539716, 0.8623427672994901, 0.824921416579645, 0.7999850600613403, 0.8169310667181966, 0.8169291195629018, 0.8169291195629018]\n"
]
},
{
"data": {
"text/plain": [
"[Figure: RMSLE vs. maximum bins]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"# RMSLE for different maximum bins (10 iterations, maxDepth=3)\n",
"metrics = [evaluate_dt(trainingData, {}, 'leastAbsoluteError', 10, 3, param) for param in params]\n",
"\n",
"print(params)\n",
"\n",
"print(metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Maximum bins')\n",
"pyplot.ylabel('RMSLE')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
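The three error measures printed in the Gradient Boosted Tree cells (MSE, MAE and RMSLE), and the choice of the best value in a parameter sweep, can be sanity-checked without Spark. The sketch below is a plain-Python re-implementation, assuming labels and predictions are both greater than -1 so that log(x + 1) is defined; `regression_metrics` and `best_param` are hypothetical helper names, not part of the notebook or of `pyspark.mllib`.

```python
import math


def regression_metrics(pairs):
    """Return (MSE, MAE, RMSLE) for an iterable of (actual, predicted) pairs.

    Assumes every actual and predicted value is > -1 so log(x + 1) is defined.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one (actual, predicted) pair")
    # Mean of squared differences
    mse = sum((p - a) ** 2 for a, p in pairs) / n
    # Mean of absolute differences
    mae = sum(abs(p - a) for a, p in pairs) / n
    # Square root of the mean squared difference of log(x + 1) values
    rmsle = math.sqrt(
        sum((math.log(p + 1) - math.log(a + 1)) ** 2 for a, p in pairs) / n
    )
    return mse, mae, rmsle


def best_param(params, metrics):
    """Return the parameter whose metric (e.g. RMSLE) is lowest; ties keep the first."""
    return min(zip(params, metrics), key=lambda pm: pm[1])[0]


# RMSLE values recorded in the maxBins sweep of the Gradient Boosted Tree notebook
params = [2, 4, 8, 16, 32, 64, 100]
rmsle_by_bins = [1.3129641027539716, 0.8623427672994901, 0.824921416579645,
                 0.7999850600613403, 0.8169310667181966, 0.8169291195629018,
                 0.8169291195629018]
print(best_param(params, rmsle_by_bins))  # 16
```

Lower RMSLE is better, so in the recorded maxBins sweep the lowest error is at 16 bins, after which the error levels off.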
assign2/.ipynb_checkpoints/house-checkpoint.ipyn
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"        <div>\n",
"            <p><b>SparkContext</b></p>\n",
"\n",
"            <p><a href=\"http://192.168.2.1:4040\">Spark UI</a></p>\n",
"\n",
"            <dl>\n",
"              <dt>Version</dt>\n",
"                <dd><code>v2.2.0</code></dd>\n",
"              <dt>Master</dt>\n",
"                <dd><code>local[*]</code></dd>\n",
"              <dt>AppName</dt>\n",
"                <dd><code>PySparkShell</code></dd>\n",
"            </dl>\n",
"        </div>\n",
"        "
],
"text/plain": [
"<SparkContext master=local[*] appName=PySparkShell>"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sc"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.regression import LabeledPoint,LinearRegressionWithSGD\n",
"from pyspark.mllib.tree import DecisionTree\n",
"import numpy as np\n",
"import operator\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"\n",
"house_df = sc.textFile(\"/Users/Priya/Desktop/house/trainnoheader.csv\")\n",
"data_count= house_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"records = house_df.map(lambda x: x.split(\",\"))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"type_columns=[2,5,7,8,9,10,11,12,13,14,15,16,21,22,23,24,27,28,29,39,40,41,53,55,65,78,79]\n",
"type_columns_with_NA=[6,25,30,31,32,33,35,42,57,58,60,63,64,72,73,74]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"number_columns=[1,4,17,18,19,20,34,36,37,38,43,44,45,46,47,48,49,50,51,52,54,56,61,62,66,67,68,69,70,71,75,76,77]\n",
"number_columns_with_NA=[3,26,59]\n",
"number_columns_with_many_zeros=[26,34,36,37,38,44,45,62,66,67,68,69,70,71,75]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"saleprice_column=80"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def getMapOfColumn(idx):\n",
" return records.map(lambda fields:fields[idx]).distinct().zipWithIndex().collectAsMap()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def get_type_maps():\n",
" type_maps={}\n",
" for i in type_columns:\n",
" type_maps[i]=getMapOfColumn(i)\n",
" for i in type_columns_with_NA:\n",
" type_maps[i]=getMapOfColumn(i)\n",
" return type_maps"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"type_maps=get_type_maps()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{2: {'RL': 0, 'RH': 1, 'RM': 2, 'C (all)': 3, 'FV': 4}, 5: {'Pave': 0, 'Grvl': 1}, 7: {'Reg': 0, 'IR1': 1, 'IR2': 2, 'IR3': 3}, 8: {'Bnk': 0, 'Low': 1, 'Lvl': 2, 'HLS': 3}, 9: {'NoSeWa': 0, 'AllPub': 1}, 10: {'FR2': 0, 'CulDSac': 1, 'Inside': 2, 'Corner': 3, 'FR3': 4}, 11: {'Gtl': 0, 'Mod': 1, 'Sev': 2}, 12: {'CollgCr': 0, 'Mitchel': 1, 'NWAmes': 2, 'NAmes': 3, 'MeadowV': 4, 'Edwards': 5, 'ClearCr': 6, 'NPkVill': 7, 'Blmngtn': 8, 'SWISU': 9, 'Veenker': 10, 'Crawfor': 11, 'NoRidge': 12, 'Somerst': 13, 'OldTown': 14, 'BrkSide': 15, 'Sawyer': 16, 'NridgHt': 17, 'SawyerW': 18, 'IDOTRR': 19, 'Timber': 20, 'Gilbert': 21, 'StoneBr': 22, 'BrDale': 23, 'Blueste': 24}, 13: {'Norm': 0, 'Feedr': 1, 'PosN': 2, 'Artery': 3, 'RRAe': 4, 'RRNn': 5, 'PosA': 6, 'RRAn': 7, 'RRNe': 8}, 14: {'Norm': 0, 'Artery': 1, 'RRNn': 2, 'Feedr': 3, 'PosN': 4, 'PosA': 5, 'RRAe': 6, 'RRAn': 7}, 15: {'1Fam': 0, 'Duplex': 1, 'TwnhsE': 2, '2fmCon': 3, 'Twnhs': 4}, 16: {'1.5Fin': 0, '1.5Unf': 1, 'SLvl': 2, '2.5Unf': 3, '2.5Fin': 4, '2Story': 5, '1Story': 6, 'SFoyer': 7}, 21: {'Hip': 0, 'Shed': 1, 'Gable': 2, 'Gambrel': 3, 'Mansard': 4, 'Flat': 5}, 22: {'Metal': 0, 'Mem
an': 1, 'Roll': 2, 'CompShg': 3, 'WdShngl': 4, 'WdShake': 5, 'Tar&Grv': 6, 'ClyTile': 7}, 23: {'VinylSd': 0, 'WdShing': 1, 'Plywood': 2, 'BrkComm': 3, 'AsphShn': 4, 'CBlock': 5, 'MetalSd': 6, 'Wd Sdng': 7, 'HdBoard': 8, 'BrkFace': 9, 'CemntBd': 10, 'AsbShng': 11, 'Stucco': 12, 'Stone': 13, 'ImStucc': 14}, 24: {'VinylSd': 0, 'Wd Shng': 1, 'Plywood': 2, 'CmentBd': 3, 'AsphShn': 4, 'CBlock': 5, 'MetalSd': 6, 'HdBoard': 7, 'Wd Sdng': 8, 'BrkFace': 9, 'Stucco': 10, 'AsbShng': 11, 'Brk Cmn': 12, 'ImStucc': 13, 'Stone': 14, 'Other': 15}, 27: {'Fa': 0, 'Gd': 1, 'TA': 2, 'Ex': 3}, 28: {'Fa': 0, 'Po': 1, 'TA': 2, 'Gd': 3, 'Ex': 4}, 29: {'PConc': 0, 'CBlock': 1, 'BrkTil': 2, 'Wood': 3, 'Slab': 4, 'Stone': 5}, 39: {'GasW': 0, 'GasA': 1, 'Grav': 2, 'Wall': 3, 'OthW': 4, 'Floor': 5}, 40: {'Fa': 0, 'Po': 1, 'Ex': 2, 'Gd': 3, 'TA': 4}, 41: {'N': 0, 'Y': 1}, 53: {'Fa': 0, 'Gd': 1, 'TA': 2, 'Ex': 3}, 55: {'Typ': 0, 'Min2': 1, 'Maj2': 2, 'Min1': 3, 'Maj1': 4, 'Mod': 5, 'Sev': 6}, 65: {'N': 0, 'Y': 1, 'P': 2}, 78: {'WD': 0, 'New': 1, 'ConLw': 2, 'COD': 3, 'ConLD': 4, 'ConLI': 5, 'CWD': 6, 'Con': 7, 'Oth': 8}, 79: {'Normal': 0, 'AdjLand': 1, 'Family': 2, 'Abnorml': 3, 'Partial': 4, 'Alloca': 5}, 6: {'NA': 0, 'Pave': 1, 'Grvl': 2}, 25: {'None': 0, 'NA': 1, 'BrkFace': 2, 'Stone': 3, 'BrkCmn': 4}, 30: {'NA': 0, 'Fa': 1, 'Gd': 2, 'TA': 3, 'Ex': 4}, 31: {'NA': 0, 'Fa': 1, 'Po': 2, 'TA': 3, 'Gd': 4}, 32: {'Mn': 0, 'NA': 1, 'No': 2, 'Gd': 3, 'Av': 4}, 33: {'GLQ': 0, 'Rec': 1, 'NA': 2, 'ALQ': 3, 'Unf': 4, 'BLQ': 5, 'LwQ': 6}, 35: {'NA': 0, 'Rec': 1, 'GLQ': 2, 'Unf': 3, 'BLQ': 4, 'ALQ': 5, 'LwQ': 6}, 42: {'FuseF': 0, 'FuseA': 1, 'FuseP': 2, 'Mix': 3, 'NA': 4, 'SBrkr': 5}, 57: {'NA': 0, 'Fa': 1, 'Po': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}, 58: {'BuiltIn': 0, 'CarPort': 1, 'NA': 2, 'Basment': 3, '2Types': 4, 'Attchd': 5, 'Detchd': 6}, 60: {'Fin': 0, 'NA': 1, 'RFn': 2, 'Unf': 3}, 63: {'Fa': 0, 'NA': 1, 'Po': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}, 64: {'Fa': 0, 'NA': 1, 'Po': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}, 72: 
{'NA': 0, 'Fa': 1, 'Ex': 2, 'Gd': 3}, 73: {'NA': 0, 'MnPrv': 1, 'MnWw': 2, 'GdWo': 3, 'GdPrv': 4}, 74: {'NA': 0, 'Shed': 1, 'Othr': 2, 'Gar2': 3, 'TenC': 4}}\n"
]
}
],
"source": [
"print(type_maps)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def get_type_cnt(maps):\n",
"    return sum([len(maps[i]) for i in maps])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature vector length for type features: 268\n",
"Feature vector length for numerical features: 33\n",
"Total feature vector length: 301\n",
"Total_dt feature vector length: 76\n"
]
}
],
"source": [
"type_cnt=get_type_cnt(type_maps)\n",
"number_cnt=len(number_columns)\n",
"total=type_cnt+number_cnt\n",
"\n",
"total_dt=len(type_columns)+len(type_columns_with_NA)+len(number_columns)\n",
"\n",
"print (\"Feature vector length for type features: %d\" % type_cnt)\n",
"print (\"Feature vector length for numerical features: %d\" % number_cnt)\n",
"print (\"Total feature vector length: %d\" % total)\n",
"print (\"Total_dt feature vector length: %d\" % total_dt)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def extract_features(fields):\n",
"    # One-hot encode the categorical columns, then append the raw numerical columns\n",
"    features=np.zeros(total)\n",
"    step=0\n",
"    for i in type_columns:\n",
"        features[step+int(type_maps[i][fields[i]])]=1.0\n",
"        step=step+len(type_maps[i])\n",
"    for i in type_columns_with_NA:\n",
"        features[step+int(type_maps[i][fields[i]])]=1.0\n",
"        step=step+len(type_maps[i])\n",
"    for i in number_columns:\n",
"        features[step]=float(fields[i])\n",
"        step=step+1\n",
"    return features"
]
},
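{
"cell_type": "markdown",
"metadata": {},
"source": [
"The vector layout built by `extract_features` can be checked on a toy record. This sketch uses hypothetical column indices and maps rather than the house-price ones: each categorical column contributes a block of `len(type_maps[i])` slots containing a single 1.0, and each numerical column contributes one slot holding its raw value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy schema: columns 0 and 1 categorical, column 2 numerical\n",
"toy_type_maps = {0: {'RL': 0, 'RM': 1, 'FV': 2}, 1: {'Pave': 0, 'Grvl': 1}}\n",
"toy_type_columns = [0, 1]\n",
"toy_number_columns = [2]\n",
"toy_total = sum(len(m) for m in toy_type_maps.values()) + len(toy_number_columns)\n",
"\n",
"def toy_extract_features(fields):\n",
"    features = np.zeros(toy_total)\n",
"    step = 0\n",
"    for i in toy_type_columns:\n",
"        # Switch on the slot for this category inside the column's block\n",
"        features[step + toy_type_maps[i][fields[i]]] = 1.0\n",
"        step += len(toy_type_maps[i])\n",
"    for i in toy_number_columns:\n",
"        # Numerical columns are copied as-is after all the one-hot blocks\n",
"        features[step] = float(fields[i])\n",
"        step += 1\n",
"    return features\n",
"\n",
"print(toy_extract_features(['RM', 'Grvl', '8450']).tolist())  # [0.0, 1.0, 0.0, 0.0, 1.0, 8450.0]"
]
},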
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def extract_features_dt(fields):\n",
"    # Decision-tree vector: one entry per column (category index or raw number)\n",
"    features=np.zeros(total_dt)\n",
"    step=0\n",
"    for i in type_columns:\n",
"        features[step]=float(type_maps[i][fields[i]])\n",
"        step=step+1\n",
"    for i in type_columns_with_NA:\n",
"        features[step]=float(type_maps[i][fields[i]])\n",
"        step=step+1\n",
"    for i in number_columns:\n",
"        features[step]=float(fields[i])\n",
"        step=step+1\n",
"    return features"
]
},
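{
"cell_type": "markdown",
"metadata": {},
"source": [
"For `DecisionTree.trainRegressor` in `pyspark.mllib.tree`, the integer codes written by `extract_features_dt` are only treated as categories if they are declared in `categoricalFeaturesInfo`, a dict from feature position to number of categories; otherwise they are treated as continuous values. `maxBins` must also be at least the largest category count (25, for column 12 in the maps printed above). A sketch of building that dict, assuming small hypothetical stand-ins for `type_columns`, `type_columns_with_NA` and `type_maps`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical stand-ins for type_columns, type_columns_with_NA and type_maps\n",
"toy_type_columns = [2, 5]\n",
"toy_type_columns_with_NA = [6]\n",
"toy_type_maps = {2: {'RL': 0, 'RH': 1, 'RM': 2}, 5: {'Pave': 0, 'Grvl': 1},\n",
"                 6: {'NA': 0, 'Pave': 1, 'Grvl': 2}}\n",
"\n",
"def categorical_features_info(type_columns, type_columns_with_NA, type_maps):\n",
"    # Categorical columns fill the first positions of the decision-tree vector,\n",
"    # in the same order that extract_features_dt writes them\n",
"    info = {}\n",
"    for pos, col in enumerate(list(type_columns) + list(type_columns_with_NA)):\n",
"        info[pos] = len(type_maps[col])\n",
"    return info\n",
"\n",
"info = categorical_features_info(toy_type_columns, toy_type_columns_with_NA, toy_type_maps)\n",
"print(info)                # {0: 3, 1: 2, 2: 3}\n",
"print(max(info.values()))  # maxBins must be at least this value"
]
},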
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"data=records.map(lambda fields: LabeledPoint(float(fields[saleprice_column]),extract_features(fields)))\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Label: 208500.0\n",
"Linear Model feature vector:\n",
"[1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,60.0,8450.0,7.0,5.0,2003.0,2003.0,706.0,0.0,150.0,856.0,856.0,854.0,0.0,1710.0,1.0,0.0,2.0,1.0,3.0,1.0,8.0,0.0,2.0,548.0,0.0,61.0,0.0,0.0,0.0,0.0,0.0,2.0,2008.0]\n",
"Linear Model feature vector length: 301\n"
]
}
],
"source": [
"first_point = data.first()\n",
"#print (\"Raw data: \" + str(first_point[1:]))\n",
"print (\"Label: \" + str(first_point.label))\n",
"print (\"Linear Model feature vector:\\n\" + str(first_point.features))\n",
"print (\"Linear Model feature vector length: \" + str(len(first_point.features)))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.regression import LinearRegressionWithSGD"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
}
],
"source": [
"lrModel=LinearRegressionWithSGD.train(data, iterations=10, step=0.1, intercept=False)\n",
"true_vs_predicted=data.map(lambda p: (p.label, lrModel.predict(p.features)))"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Model predictions: [(208500.0, -1.3111060925180484e+75), (181500.0, -1.4720767452081686e+75), (223500.0, -1.7050281430818638e+75), (140000.0, -1.4631365187530982e+75), (250000.0, -2.1369709269890862e+75)]\n"
]
}
],
"source": [
"print (\"Linear Model predictions: \" + str(true_vs_predicted.take(5)))"
]
},
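{
"cell_type": "markdown",
"metadata": {},
"source": [
"The predictions above are of order -1e75, which indicates that SGD diverged instead of converging. The first feature vector mixes 0/1 indicators with raw values such as 8450.0 and 2003.0, so with `step=0.1` the gradient updates are most likely far too large. A common remedy is to standardise the features before training (MLlib provides `pyspark.mllib.feature.StandardScaler` for this). A minimal NumPy sketch of column-wise z-scoring, assuming a small hypothetical dense matrix; zero-variance columns are only centred, and the returned `mu` and `sigma` would be reused to transform the test split."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def standardise(X):\n",
"    # Column-wise z-scores; zero-variance columns are centred but not divided\n",
"    X = np.asarray(X, dtype=float)\n",
"    mu = X.mean(axis=0)\n",
"    sigma = X.std(axis=0)\n",
"    sigma[sigma == 0] = 1.0\n",
"    return (X - mu) / sigma, mu, sigma\n",
"\n",
"# Hypothetical rows: one 0/1 indicator column and two large raw columns\n",
"X_toy = np.array([[1.0, 8450.0, 2003.0],\n",
"                  [0.0, 9000.0, 1980.0],\n",
"                  [1.0, 11000.0, 2000.0]])\n",
"Z, mu, sigma = standardise(X_toy)\n",
"print(np.round(Z.mean(axis=0), 6))  # each column now has mean ~0\n",
"print(np.round(Z.std(axis=0), 6))   # and standard deviation 1"
]
},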
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Model - Mean Squared Error: 4519283835876382689242228853370308019420839092654378420329959275965062173707428040253979807151535809132649831467864490762707227557576984928707873341440.0000\n"
]
}
],
"source": [
"li=[]\n",
"for i in true_vs_predicted.collect():\n",
"    true,pred=i[0],i[1]\n",
"    val=(pred - true)**2\n",
"    li.append(val)\n",
"lenth=len(li)\n",
"su=sum(li)\n",
"mean=su/lenth\n",
"print (\"Linear Model - Mean Squared Error: %2.4f\" % mean)"
]
},
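{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above averages the squared errors on the driver. An equivalent vectorised sketch with NumPy, assuming the `(label, prediction)` pairs are available as a list (hypothetical values here); on the RDD itself, `true_vs_predicted.map(lambda tp: (tp[1] - tp[0]) ** 2).mean()` would avoid collecting the data at all."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def mse(pairs):\n",
"    # pairs: iterable of (true, predicted) tuples, like the elements of true_vs_predicted\n",
"    arr = np.asarray(list(pairs), dtype=float)\n",
"    return np.mean((arr[:, 1] - arr[:, 0]) ** 2)\n",
"\n",
"print(mse([(3.0, 2.0), (5.0, 7.0)]))  # (1 + 4) / 2 = 2.5"
]
},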
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"targets = records.map(lambda r: float(r[-1])).collect()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"import pylab"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/anaconda3/lib/python3.6/site-packages/IPython/core/magics/pylab.py:160: UserWarning: pylab import has clobbered these variables: ['mean', 'pylab']\n",
"`%matplotlib` prevents importing * from pylab and numpy\n",
" \"\\n`%matplotlib` prevents importing * from pylab and numpy\"\n"
]
}
],
"source": [
"%pylab inline"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: histogram of SalePrice, 40 bins, density-normalised]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"hist(targets, bins=40, color='lightblue', density=True)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"\n",
"fig.set_size_inches(16, 10)"
]
},
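{
"cell_type": "markdown",
"metadata": {},
"source": [
"`%pylab inline` star-imports NumPy and matplotlib into the notebook namespace, which is why the warning above says the variable `mean` was clobbered. The same histogram can be drawn with explicit imports; a sketch that uses the first five labels printed earlier so the cell runs on its own (swap in `targets` for the real plot):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# First five SalePrice labels printed earlier; replace with the full targets list\n",
"sample_targets = [208500.0, 181500.0, 223500.0, 140000.0, 250000.0]\n",
"\n",
"fig, ax = plt.subplots(figsize=(16, 10))\n",
"ax.hist(sample_targets, bins=40, color='lightblue', density=True)\n",
"ax.set_xlabel('SalePrice')\n",
"ax.set_ylabel('Density')"
]
},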
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: histogram of log(SalePrice), 40 bins, density-normalised]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"log_targets = records.map(lambda r: np.log(float(r[-1]))).collect()\n",
"\n",
"hist(log_targets, bins=40, color='lightblue', density=True)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"\n",
"fig.set_size_inches(16, 10)"
]
},
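{
"cell_type": "markdown",
"metadata": {},
"source": [
"Taking logs compresses the long right tail that house prices usually have, and predictions are mapped back with `np.exp`. The round trip is exact only up to floating-point rounding, which is why the labels further down print as values like `208500.00000000012`. A short check on the first three labels printed earlier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"labels = np.array([208500.0, 181500.0, 223500.0])  # first three SalePrice labels printed above\n",
"log_labels = np.log(labels)     # the target used to train model_log\n",
"recovered = np.exp(log_labels)  # the back-transform applied in true_vs_predicted_log\n",
"print(recovered.tolist())\n",
"print(np.allclose(recovered, labels))  # True"
]
},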
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"data_log = data.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
}
],
"source": [
"model_log = LinearRegressionWithSGD.train(data_log, iterations=10, step=0.1)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"true_vs_predicted_log = data_log.map(lambda p: (np.exp(p.label), np.exp(model_log.predict(p.features))))"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1460\n",
"log - Mean Squared Error: 39039267707.7658\n",
"log - Mean Absolute Error: 180921.1959\n",
"Root Mean Squared Log Error: 12.0307\n"
]
}
],
"source": [
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_log.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print( value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print (\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Non log-transformed predictions:\n",
"[(208500.0, -1.3111060925180484e+75), (181500.0, -1.4720767452081686e+75), (223500.0, -1.7050281430818638e+75)]\n",
"Log-transformed predictions:\n",
"[(208500.00000000012, 0.0), (181499.99999999988, 0.0), (223500.0, 0.0)]\n"
]
}
],
"source": [
"print (\"Non log-transformed predictions:\\n\" + str(true_vs_predicted.take(3)))\n",
"\n",
"print (\"Log-transformed predictions:\\n\" + str(true_vs_predicted_log.take(3)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tuning model parameters"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"train, test = data.randomSplit([0.7, 0.3], seed=12345)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"train_size=train.count()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"test_size=test.count()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training data size: 1050\n"
]
}
],
"source": [
"print (\"Training data size: %d\" % train_size)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test data size: 410\n"
]
}
],
"source": [
"print (\"Test data size: %d\" % test_size)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train + Test size : 1460\n"
]
}
],
"source": [
"print (\"Train + Test size : %d\" % (train_size + test_size))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The impact of parameter settings for linear models"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"def evaluate(train, test, iterations, step, regParam, regType, intercept):\n",
"    # Train a linear model with SGD using the given hyperparameters\n",
"    model = LinearRegressionWithSGD.train(train, iterations, step, regParam=regParam, regType=regType, intercept=intercept)\n",
"\n",
"    # Pair each true label with the model's prediction on the test set\n",
"    tp = test.map(lambda p: (p.label, model.predict(p.features)))\n",
"\n",
"    # Root Mean Squared Log Error (RMSLE); np.log returns nan (with a\n",
"    # RuntimeWarning) whenever a prediction is below -1\n",
"    squared_log_errors = []\n",
"    for actual, pred in tp.collect():\n",
"        squared_log_errors.append((np.log(pred + 1) - np.log(actual + 1))**2)\n",
"    rmsle = np.sqrt(sum(squared_log_errors) / len(squared_log_errors))\n",
"    return rmsle"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Iterations"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 5, 11, 15, 20, 50]\n",
"[16.401492085322918, 81.34883033703413, 176.05369822746945, 238.23038626017032, nan, nan]\n"
]
}
],
"source": [
"params = [1, 5, 11, 15, 20, 50]\n",
"\n",
"metrics = [evaluate(train, test, param, 0.1, 0.0, 'l2', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying number of iterations')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step size"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [],
"source": [
"params = [0.1, 0.020, 0.25, 0.1, 1.0]"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
}
],
"source": [
"metrics = [evaluate(train, test, 20, param, 0.0, 'l2', False) for param in params]"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.1, 0.02, 0.25, 0.1, 1.0]\n",
"[nan, nan, nan, nan, nan]\n"
]
}
],
"source": [
"print (params)\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying values of step size')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L2 regularization"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0]\n",
"[nan, nan, nan, nan, nan, nan, nan]\n"
]
}
],
"source": [
"params = [0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, param, 'l2', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying levels of L2 regularization')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L1 regularization"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]\n",
"[nan, nan, nan, nan, nan, nan, nan]\n"
]
}
],
"source": [
"params = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, param, 'l1', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
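The `RuntimeWarning: invalid value encountered in log` and the all-`nan` metrics in this sweep are what you would expect if some SGD predictions fall below -1. In that case `log(prediction + 1)` is undefined, and a single `nan` turns the whole mean into `nan`. The `evaluate` helper is defined earlier in the notebook and is not shown here, so this explanation is an inference from the warning.

The sketch below reproduces the effect with made-up numbers and shows one guard: clipping negative predictions to zero. Clipping only masks a diverging model, so scaling the features or lowering the step size is more likely to be the real fix.

```python
import numpy as np

def rmsle(actual, predicted, clip=False):
    """Root mean squared log error; optionally clip negative predictions to 0 first."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if clip:
        predicted = np.clip(predicted, 0.0, None)
    # log1p(x) == log(x + 1); it is nan for x < -1, which poisons the mean
    with np.errstate(invalid="ignore"):
        sq = (np.log1p(predicted) - np.log1p(actual)) ** 2
    return float(np.sqrt(np.mean(sq)))

# Illustrative values only: one prediction has run far negative,
# as an SGD model trained on unscaled features can do
actual = [208500.0, 181500.0, 223500.0]
predicted = [190000.0, -3.5e6, 230000.0]

print(rmsle(actual, predicted))             # nan
print(rmsle(actual, predicted, clip=True))  # finite, dominated by the clipped sample
```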
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"L1 (1.0) number of zero weights: 6\n",
"L1 (10.0) number of zero weights: 6\n",
"L1 (100.0) number of zero weights: 6\n"
]
}
],
"source": [
"model_l1 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=1.0, regType='l1', intercept=False)\n",
"\n",
"model_l1_10 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=10.0, regType='l1', intercept=False)\n",
"\n",
"model_l1_100 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=100.0, regType='l1', intercept=False)\n",
"\n",
"# Count the weights that L1 regularization drives exactly to zero\n",
"print(\"L1 (1.0) number of zero weights: \" + str(sum(model_l1.weights.array == 0)))\n",
"\n",
"print(\"L1 (10.0) number of zero weights: \" + str(sum(model_l1_10.weights.array == 0)))\n",
"\n",
"print(\"L1 (100.0) number of zero weights: \" + str(sum(model_l1_100.weights.array == 0)))"
]
},
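The zero-weight counts above come from how L1 regularization is applied during SGD. As we understand it, MLlib's `L1Updater` first takes the gradient step. It then applies a soft-thresholding (proximal) step with shrinkage `regParam * current step size`. Any weight whose magnitude is below that shrinkage becomes exactly zero.

The NumPy sketch below shows that step on its own. The function name and weights are illustrative, not MLlib code.

```python
import numpy as np

def soft_threshold(weights, shrinkage):
    """L1 proximal step: move each weight toward 0 by `shrinkage`, stopping at 0."""
    return np.sign(weights) * np.maximum(np.abs(weights) - shrinkage, 0.0)

w = np.array([0.8, -0.05, 0.3, -1.2, 0.01, 0.0])  # illustrative weights after a gradient step
step_size = 1.0  # illustrative; the shrinkage amount is regParam * current step size

for reg_param in [0.0, 0.1, 0.5, 1.0]:
    w_new = soft_threshold(w, reg_param * step_size)
    # Same zero count as the notebook's sum(weights.array == 0)
    print(reg_param, int(np.sum(w_new == 0)))
```

Larger `regParam` values push more weights to exactly zero, which is the sparsity effect the cell above measures.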
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intercept"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[False, True]\n",
"[nan, nan]\n"
]
}
],
"source": [
"params = [False, True]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, 1.0, 'l2', param) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
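Setting `intercept=True` lets the model learn a bias term that does not depend on any feature. The same thing can be seen by appending a constant column of ones to the feature matrix.

The sketch below fits synthetic data (not the house-price set) with NumPy least squares, with and without that column. MLlib's SGD trainer does not use `lstsq`; this only illustrates what the intercept adds.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
# Synthetic target with a large constant offset of 50
y = 50.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=200)

def fit_and_rmse(X, y, intercept):
    # Appending a column of ones gives the model a free bias term
    A = np.column_stack([X, np.ones(len(X))]) if intercept else X
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rmse = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
    return coef, rmse

for intercept in [False, True]:
    coef, rmse = fit_and_rmse(X, y, intercept)
    print(intercept, np.round(coef, 2), round(rmse, 3))
```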
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: bar chart, x-axis 'Metrics without and with an intercept']"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"bar(params, metrics, color='lightblue')\n",
"pyplot.xlabel('Metrics without and with an intercept')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decision Tree"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [],
"source": [
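"def extract_features_dt(fields):\n",
"    # Categorical values become integer indices via type_maps (no one-hot encoding).\n",
"    # With categoricalFeaturesInfo={} MLlib treats these indices as ordered numbers.\n",
"    features=np.zeros(total_dt)\n",
"    step=0\n",
"    for i in type_columns:\n",
"        features[step]=float(type_maps[i][fields[i]])\n",
"        step=step+1\n",
"\n",
"    # Categorical columns that contain NA values are mapped the same way\n",
"    for i in type_columns_with_NA:\n",
"        features[step]=float(type_maps[i][fields[i]])\n",
"        step=step+1\n",
"    # Numeric columns are used as-is\n",
"    for i in number_columns:\n",
"        features[step]=float(fields[i])\n",
"        step=step+1\n",
"    return features"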
]
},
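`extract_features_dt` relies on `type_maps`, a per-column mapping from category string to integer index built earlier in the notebook (not shown here). Below is a minimal self-contained sketch of the idea. The column names, rows and first-appearance index order are hypothetical, and the notebook's own ordering may differ.

```python
import numpy as np

# Hypothetical rows: [MSZoning, Street, LotArea]; columns 0-1 categorical, 2 numeric
rows = [
    ["RL", "Pave", "8450"],
    ["RM", "Pave", "9600"],
    ["RL", "Grvl", "11250"],
]
type_columns = [0, 1]
number_columns = [2]

# Build one {category: index} map per categorical column, in order of first appearance
type_maps = {}
for i in type_columns:
    type_maps[i] = {}
    for row in rows:
        type_maps[i].setdefault(row[i], len(type_maps[i]))

def extract_features_dt(fields):
    # Categorical values become their integer index; numeric values are parsed as floats
    cats = [float(type_maps[i][fields[i]]) for i in type_columns]
    nums = [float(fields[i]) for i in number_columns]
    return np.array(cats + nums)

print(extract_features_dt(rows[2]).tolist())  # [0.0, 1.0, 11250.0]
```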
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"outputs": [],
"source": [
"data_dt=records.map(lambda fields: LabeledPoint(float(fields[saleprice_column]),extract_features_dt(fields)))"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[LabeledPoint(208500.0, [0.0,0.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,3.0,2.0,0.0,3.0,5.0,0.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,60.0,8450.0,7.0,5.0,2003.0,2003.0,706.0,0.0,150.0,856.0,856.0,854.0,0.0,1710.0,1.0,0.0,2.0,1.0,3.0,1.0,8.0,0.0,2.0,548.0,0.0,61.0,0.0,0.0,0.0,0.0,0.0,2.0,2008.0]), LabeledPoint(181500.0, [0.0,0.0,0.0,2.0,1.0,0.0,0.0,10.0,1.0,0.0,0.0,6.0,2.0,3.0,6.0,6.0,2.0,2.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,3.0,3.0,3.0,3.0,5.0,3.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,20.0,9600.0,6.0,8.0,1976.0,1976.0,978.0,0.0,284.0,1262.0,1262.0,0.0,0.0,1262.0,0.0,1.0,2.0,0.0,3.0,1.0,6.0,1.0,2.0,460.0,298.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2007.0]), LabeledPoint(223500.0, [0.0,0.0,1.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,3.0,0.0,0.0,3.0,5.0,3.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,60.0,11250.0,7.0,5.0,2001.0,2002.0,486.0,0.0,434.0,920.0,920.0,866.0,0.0,1786.0,1.0,0.0,2.0,1.0,3.0,1.0,6.0,1.0,2.0,608.0,0.0,42.0,0.0,0.0,0.0,0.0,0.0,9.0,2008.0]), LabeledPoint(140000.0, [0.0,0.0,1.0,2.0,1.0,3.0,0.0,11.0,0.0,0.0,0.0,5.0,2.0,3.0,7.0,1.0,2.0,2.0,2.0,1.0,3.0,1.0,1.0,0.0,1.0,0.0,3.0,0.0,0.0,3.0,4.0,2.0,3.0,3.0,5.0,4.0,6.0,3.0,3.0,3.0,0.0,0.0,0.0,70.0,9550.0,7.0,5.0,1915.0,1970.0,216.0,0.0,540.0,756.0,961.0,756.0,0.0,1717.0,1.0,0.0,1.0,0.0,3.0,1.0,7.0,1.0,3.0,642.0,0.0,35.0,272.0,0.0,0.0,0.0,0.0,2.0,2006.0]), LabeledPoint(250000.0, [0.0,0.0,1.0,2.0,1.0,0.0,0.0,12.0,0.0,0.0,0.0,5.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,3.0,4.0,0.0,3.0,5.0,3.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,60.0,14260.0,8.0,5.0,2000.0,2000.0,655.0,0.0,490.0,1145.0,1145.0,1053.0,0.0,2198.0,1.0,0.0,2.0,1.0,4.0,1.0,9.0,1.0,3.0,836.0,192.0,84.0,0.0,0.0,0.0,0.0,0.0,12.0,2008.0]), LabeledPoint(143000.0, 
[0.0,0.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,3.0,0.0,0.0,2.0,2.0,3.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,3.0,2.0,0.0,3.0,5.0,0.0,5.0,3.0,3.0,3.0,0.0,1.0,1.0,50.0,14115.0,5.0,5.0,1993.0,1995.0,732.0,0.0,64.0,796.0,796.0,566.0,0.0,1362.0,1.0,0.0,1.0,1.0,1.0,1.0,5.0,0.0,2.0,480.0,40.0,30.0,0.0,320.0,0.0,0.0,700.0,10.0,2009.0]), LabeledPoint(307000.0, [0.0,0.0,0.0,2.0,1.0,2.0,0.0,13.0,0.0,0.0,0.0,6.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,3.0,4.0,3.0,4.0,0.0,3.0,5.0,4.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,20.0,10084.0,8.0,5.0,2004.0,2005.0,1369.0,0.0,317.0,1686.0,1694.0,0.0,0.0,1694.0,1.0,0.0,2.0,0.0,3.0,1.0,7.0,1.0,2.0,636.0,255.0,57.0,0.0,0.0,0.0,0.0,0.0,8.0,2007.0]), LabeledPoint(200000.0, [0.0,0.0,1.0,2.0,1.0,3.0,0.0,2.0,2.0,0.0,0.0,5.0,2.0,3.0,8.0,7.0,2.0,2.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,3.0,2.0,3.0,0.0,3.0,4.0,5.0,3.0,5.0,2.0,3.0,3.0,0.0,0.0,1.0,60.0,10382.0,7.0,6.0,1973.0,1973.0,859.0,32.0,216.0,1107.0,1107.0,983.0,0.0,2090.0,1.0,0.0,2.0,1.0,3.0,1.0,7.0,2.0,2.0,484.0,235.0,204.0,228.0,0.0,0.0,0.0,350.0,11.0,2009.0]), LabeledPoint(129900.0, [2.0,0.0,0.0,2.0,1.0,2.0,0.0,14.0,3.0,0.0,0.0,0.0,2.0,3.0,9.0,1.0,2.0,2.0,2.0,1.0,3.0,1.0,2.0,3.0,1.0,0.0,3.0,0.0,0.0,3.0,3.0,2.0,4.0,3.0,0.0,3.0,6.0,3.0,0.0,3.0,0.0,0.0,0.0,50.0,6120.0,7.0,5.0,1931.0,1950.0,0.0,0.0,952.0,952.0,1022.0,752.0,0.0,1774.0,0.0,0.0,2.0,0.0,2.0,2.0,8.0,2.0,2.0,468.0,90.0,0.0,205.0,0.0,0.0,0.0,0.0,4.0,2008.0]), LabeledPoint(118000.0, [0.0,0.0,0.0,2.0,1.0,3.0,0.0,15.0,3.0,1.0,3.0,1.0,2.0,3.0,6.0,6.0,2.0,2.0,2.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,0.0,3.0,5.0,3.0,5.0,2.0,4.0,3.0,0.0,0.0,0.0,190.0,7420.0,5.0,6.0,1939.0,1950.0,851.0,0.0,140.0,991.0,1077.0,0.0,0.0,1077.0,1.0,0.0,1.0,0.0,2.0,2.0,5.0,2.0,1.0,205.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,1.0,2008.0])]\n"
]
}
],
"source": [
"print(data_dt.take(10))"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Decision Tree feature vector: [0.0,0.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,3.0,2.0,0.0,3.0,5.0,0.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,60.0,8450.0,7.0,5.0,2003.0,2003.0,706.0,0.0,150.0,856.0,856.0,854.0,0.0,1710.0,1.0,0.0,2.0,1.0,3.0,1.0,8.0,0.0,2.0,548.0,0.0,61.0,0.0,0.0,0.0,0.0,0.0,2.0,2008.0]\n",
"Decision Tree feature vector length: 76\n"
]
}
],
"source": [
"first_point_dt = data_dt.first()\n",
"print (\"Decision Tree feature vector: \" + str(first_point_dt.features))\n",
"print (\"Decision Tree feature vector length: \" + str(len(first_point_dt.features)))"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.tree import DecisionTree"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Decision Tree predictions: [(208500.0, 190334.33561643836), (181500.0, 147907.61375661375), (223500.0, 190334.33561643836), (140000.0, 156058.38888888888), (250000.0, 307760.1111111111)]\n",
"Decision Tree depth: 5\n",
"Decision Tree number of nodes: 63\n"
]
}
],
"source": [
"dt_model = DecisionTree.trainRegressor(data_dt,{})\n",
"preds = dt_model.predict(data_dt.map(lambda p: p.features))\n",
"actual = data_dt.map(lambda p: p.label)\n",
"true_vs_predicted_dt = actual.zip(preds)\n",
"print (\"Decision Tree predictions: \" + str(true_vs_predicted_dt.take(5)))\n",
"print (\"Decision Tree depth: \" + str(dt_model.depth()))\n",
"print (\"Decision Tree number of nodes: \" + str(dt_model.numNodes()))"
]
},
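`DecisionTree.trainRegressor` uses variance as its impurity measure for regression; the `impurity='variance'` argument in `evaluate_dt` below makes this explicit. At each node the tree picks the split that most reduces the weighted variance of the labels. Each leaf then predicts the mean label of its training rows, which is why several houses above share the prediction 190334.34.

The sketch below finds the best threshold on a single feature by brute force. MLlib only evaluates a limited set of candidate thresholds (see `maxBins`), so treat this as an illustration of the criterion, not of MLlib's search.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, variance_reduction) for the best split x <= t on one feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    parent = y.var() * len(y)          # total (unnormalised) variance at the node
    best_t, best_gain = None, 0.0
    for t in np.unique(x)[:-1]:        # splitting at the max value would leave one side empty
        left, right = y[x <= t], y[x > t]
        child = left.var() * len(left) + right.var() * len(right)
        gain = (parent - child) / len(y)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy data: overall quality score vs. sale price (illustrative values)
quality = [5, 5, 6, 6, 7, 8, 8, 9]
price = [120e3, 130e3, 140e3, 150e3, 200e3, 250e3, 260e3, 300e3]

t, gain = best_split(quality, price)
left_mean = np.mean([p for q, p in zip(quality, price) if q <= t])
print("split at quality <=", t)
print("left leaf predicts", left_mean)
```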
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1460\n",
"Mean Squared Error: 875573280.8278\n",
"Mean Absolute Error: 21582.1548\n",
"Root Mean Squared Log Error: 0.1736\n"
]
}
],
"source": [
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"# Collect per-sample squared, absolute and squared-log errors\n",
"for i in true_vs_predicted_dt.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"Mean Squared Error: %2.4f\" % t)\n",
"print(\"Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)"
]
},
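The loop above computes mean squared error, mean absolute error and RMSLE one sample at a time after `collect()`. The same three metrics can be written as a small vectorised NumPy helper. This reformulation is for clarity only and should not change the results. It still assumes every prediction is greater than -1 so the logarithms are defined.

```python
import numpy as np

def regression_metrics(pairs):
    """pairs: iterable of (actual, predicted). Returns (MSE, MAE, RMSLE)."""
    arr = np.asarray(list(pairs), dtype=float)
    actual, predicted = arr[:, 0], arr[:, 1]
    err = predicted - actual
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    # np.log1p(x) == np.log(x + 1), matching the notebook's formula
    rmsle = float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)))
    return mse, mae, rmsle

# First three (actual, predicted) pairs printed by the notebook
sample = [(208500.0, 190334.33561643836),
          (181500.0, 147907.61375661375),
          (223500.0, 190334.33561643836)]
mse, mae, rmsle = regression_metrics(sample)
print("Mean Squared Error: %2.4f" % mse)
print("Mean Absolute Error: %2.4f" % mae)
print("Root Mean Squared Log Error: %2.4f" % rmsle)
```

Applied to the full RDD, `regression_metrics(true_vs_predicted_dt.collect())` should reproduce the three numbers printed by the loop.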
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Impact of training on log-transformed targets"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [],
"source": [
"data_dt_log = data_dt.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))\n",
"\n",
"dt_model_log = DecisionTree.trainRegressor(data_dt_log,{})\n",
"\n",
"preds_log = dt_model_log.predict(data_dt_log.map(lambda p: p.features))\n",
"\n",
"actual_log = data_dt_log.map(lambda p: p.label)"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [],
"source": [
"new=actual_log.zip(preds_log)"
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(12.247694320220994, 12.147159998151047),\n",
" (12.109010932687042, 11.890912291269839),\n",
" (12.31716669303576, 12.147159998151047),\n",
" (11.84939770159144, 11.949554245993713),\n",
" (12.429216196844383, 12.515673640608348)]"
]
},
"execution_count": 136,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new.take(5)"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [],
"source": [
"true_vs_predicted_dt_log=[]\n",
"# Map log-space labels and predictions back to the original price scale\n",
"for val in new.collect():\n",
"    t,p=val[0],val[1]\n",
"    x=np.exp(t),np.exp(p)\n",
"    true_vs_predicted_dt_log.append(x)"
]
},
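Training on `np.log(label)` and mapping predictions back with `np.exp` is an exact round trip, so the back-transformed predictions are on the original price scale. However, a tree trained on log labels predicts the mean of the log prices within a leaf. `exp` of that mean is the geometric mean of the leaf's prices, which is never larger than their arithmetic mean. This may be one reason the log-target model's error metrics differ from the untransformed model's. A small NumPy check with illustrative prices:

```python
import numpy as np

prices = np.array([120000.0, 150000.0, 400000.0])  # illustrative prices falling in one leaf

# Round trip: log then exp recovers the original values (up to floating-point error)
assert np.allclose(np.exp(np.log(prices)), prices)

arithmetic_mean = prices.mean()                  # leaf prediction of a tree on raw labels
geometric_mean = np.exp(np.log(prices).mean())   # exp(leaf prediction) of a tree on log labels

print(round(arithmetic_mean, 2), round(geometric_mean, 2))
print(geometric_mean <= arithmetic_mean)  # True (AM-GM inequality)
```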
{
"cell_type": "code",
"execution_count": 138,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1460\n",
"log - Mean Squared Error: 1022580494.4448\n",
"log - Mean Absolute Error: 21569.5794\n",
"Root Mean Squared Log Error: 0.1610\n",
"Non log-transformed predictions:\n",
"[(208500.0, 190334.33561643836), (181500.0, 147907.61375661375), (223500.0, 190334.33561643836)]\n"
]
}
],
"source": [
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"# Same metrics as for the untransformed model, computed on the back-transformed (exp) predictions\n",
"for i in true_vs_predicted_dt_log:\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)\n",
"print (\"Non log-transformed predictions:\\n\" + str(true_vs_predicted_dt.take(3)))\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hold-out validation (80/20 train/test split) for the decision tree"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {},
"outputs": [],
"source": [
"train_dt, test_dt = data_dt.randomSplit([0.8, 0.2], seed=12345)"
]
},
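`randomSplit([0.8, 0.2], seed=12345)` assigns each record to a split independently. The two parts are therefore only approximately 80% and 20% of the 1460 rows, and the same seed reproduces the same split. Below is a NumPy analogue of that behaviour. It is not Spark's actual sampler and will select different rows.

```python
import numpy as np

def random_split(n_rows, weights, seed):
    """Assign every row index to exactly one split, each row drawn independently."""
    weights = np.asarray(weights, dtype=float)
    bounds = np.cumsum(weights / weights.sum())   # [0.8, 1.0] for weights [0.8, 0.2]
    bounds[-1] = 1.0                              # guard against rounding just below 1.0
    draws = np.random.default_rng(seed).random(n_rows)  # uniform in [0, 1)
    labels = np.searchsorted(bounds, draws, side="right")
    return [np.flatnonzero(labels == k) for k in range(len(weights))]

train_idx, test_idx = random_split(1460, [0.8, 0.2], seed=12345)
# Roughly 1168 / 292 (0.8 and 0.2 of 1460), but not exactly
print(len(train_idx), len(test_idx))
```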
{
"cell_type": "code",
"execution_count": 140,
"metadata": {},
"outputs": [],
"source": [
"def evaluate_dt(train, test, maxDepth, maxBins):\n",
"    # Train a regression tree with the given depth and bin limits\n",
"    model = DecisionTree.trainRegressor(train, {}, impurity='variance', maxDepth=maxDepth, maxBins=maxBins)\n",
"\n",
"    preds = model.predict(test.map(lambda p: p.features))\n",
"\n",
"    actual = test.map(lambda p: p.label)\n",
"\n",
"    tp = actual.zip(preds)\n",
"    # Root mean squared log error (RMSLE) over the test set\n",
"    new_val=[]\n",
"    for label, pred in tp.collect():\n",
"        va=(np.log(pred + 1) - np.log(label + 1))**2\n",
"        new_val.append(va)\n",
"    length=len(new_val)\n",
"    s_new_val=sum(new_val)\n",
"    mean_new_val=s_new_val/length\n",
"    rmsle=np.sqrt(mean_new_val)\n",
"    return rmsle"
]
},
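`maxBins` caps how many candidate split thresholds the tree considers for each continuous feature. It must also be at least the number of categories of any feature declared categorical. MLlib derives roughly equal-frequency thresholds from a sample of the data.

The sketch below uses exact NumPy quantiles on synthetic, LotArea-like values instead, so the specific thresholds are illustrative only. It shows why a small `maxBins` limits how finely a continuous feature can be split.

```python
import numpy as np

def candidate_thresholds(values, max_bins):
    """Up to max_bins - 1 equal-frequency split thresholds for one continuous feature."""
    values = np.asarray(values, dtype=float)
    qs = np.linspace(0, 1, max_bins + 1)[1:-1]   # interior quantile levels
    return np.unique(np.quantile(values, qs))    # repeated values collapse into fewer thresholds

rng = np.random.default_rng(1)
lot_area = rng.lognormal(mean=9.0, sigma=0.4, size=1460)  # synthetic, right-skewed like LotArea

for max_bins in [2, 4, 8, 32]:
    print(max_bins, len(candidate_thresholds(lot_area, max_bins)))
```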
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tree depth"
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 2, 3, 4, 5, 10, 20]\n",
"[0.332943090421251, 0.2770328548990305, 0.25563569006835973, 0.24091676957589137, 0.212163652773227, 0.22209830650755852, 0.23462193469250922]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAEKCAYAAAD+XoUoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl8HPV9
HXR5clWbctS7IkfIBPsIAgDIEACQXCYSBHk0BDkzRpIW1o0+aX/EJ+pPmlpLkgTX9pSxtImrNJCEeTGmMCCSGBJJhggi1jjE8OS76EbfmSbFnS5/fHjMRqvbLWtqRZ7byfj4ce2p35zu5nR6v3zn5n5jvm7oiISDzkRF2AiIiMHYW+iEiMKPRFRGJEoS8iEiMKfRGRGFHoi4jEiEJfRCRGFPoiIjGi0BcRiZG8qAtINnnyZJ8+fXrUZYiIjCvPPvvsa+5ePVy7jAv96dOns3z58qjLEBEZV8zslXTaqXtHRCRGFPoiIjGi0BcRiRGFvohIjCj0RURiRKEvIhIjCn0RkRjJmtDf03mY
1iPS2tHVGXIiKSsTLu5KzjlZMD
yLdeTlGk0NFVGXIyKSkbJmS7+0MJ/6iiLWbtsXdSkiIhkra0IfYG5tqUJfROQosir0Z9eWsrF9P909fVGXIiKSkbIq9OfWltLT52x6bX/UpYiIZKSsCv05taUA6uIRERlCVoX+zMkl5OWYQl9EZAhZFfoFeTmcXF2i0BcRGUJWhT4EO3NfVOiLiKSUdaE/t7aUto4u9h08HHUpIiIZJ+tCf05NsDN33XZt7YuIJMu+0B84gkeHbYqIJMu60G+oLKJkQh5rt+2NuhQRkYyTdaFvZsyuKdHOXBGRFLIu9CHo4lm7fR/uHnUpIiIZJTtDv6aUjs7D7Nh3KOpSREQySnaGfm0ZoOEYRESSZWXoz9UYPCIiKaUV+mZ2uZmtNbMNZnZLivkfNrNVZ
CzH5jZvPD6Zea2bPhvGfN7OKRfgGpVE4sYErpBO3MFRFJMmzom1kucCdwBTAfuL4/1BP80N0XuPsZwO3AV8PprwFXu/sC4P3A90es8mEEO3N12KaISKJ0tvQXAhvcfZO7dwP3ANcmNnD3xHSdCHg4/Tl33xJOXw0UmtmEEy97eHNqSlm/fT+9fTqCR0SkXzqhXw9sTrjfGk4bxMw+YmYbC
0/ybF47wTeM7djzikxsxuNLPlZra8vb09vcqHMae2lEM9fby888CIPJ6ISDZIJ/QtxbQjNp/d/U53Pxn4JPDpQQ9gdirwZeCmVE/g7ne7e7O7N1dXV6dR0vDmhkfwrFO/vojIgHRCvxVoTLjfAGwZoi0E3T9v679jZg3AT4D3ufvG4ynyeMyqKSHH0M5cEZEE6YT+M8AsM5thZgXAdcDixAZmNivh7lXA+nB6BfAQ8Cl3/+3IlJyewvxcpk+aqMM2RUQSDBv67t4D3Aw8AqwB7nX31WZ2m5ldEza72cxWm9kK4GMER+oQLncK8Pfh4ZwrzGzKyL+M1GbXBMMxiIhIIC+dRu6+FFiaNO0zCbc/OsRy/wj844kUeCLm1JbyyAvb6OrupaggN6oyREQyRlaekdtvbm0p7rBhh8bWFxGBLA/9/guqvKix9UVEgCwP/WmTJjIhL0c7c0VEQlkd+rk5xqyaEu3MFREJZXXoA8ypKdOx+iIioawP
m1pbTvO8SuA91RlyIiErmsD/05GltfRGRA1of+6xdU0RE8IiJZH
VpROoKM7XzlwREWIQ+mbGnJpS7cwVESEGoQ9BF8+6bfvo0wVVRCTmYhH6c2rLONDdS1tHV9SliIhEKiahryN4REQgJqE/u6YEQDtzRST2YhH6pYX51FcUaWeuiMReLEIfgp25OlZfROIuNqE/p7aUTe0H6O7pi7oUEZHIxCr0e/qcTa/pgioiEl+xCf25tWWAjuARkXiLTejPmDyRvBzTzlwRibXYhH5BXg4nV5doS19EYi02oQ9Bv75CX0TiLHah39bRxb6Dh6MuRUQkEmmFvpldbmZrzWyDmd2SYv6HzWyVma0ws9+Y2fyEeZ8Kl1trZm8dyeKPVf/Y+ut0Zq6IxNSwoW9mucCdwBXAfOD6xFAP/dDdF7j7GcDtwFfDZecD1wGnApcD/x4+XiT6x+DRzlwRiat0tvQXAhvcfZO7dwP3ANcmNnD3xFNdJwL9YxhfC9zj7ofc/SVgQ/h4kaivKKJkQp769UUktvLSaFMPbE643wqck9zIzD4CfAwoAC5OWHZZ0rL1x1XpCDAzZteUaEtfRGIrnS19SzHtiKuRuPud7n4y8Eng08eyrJndaGbLzWx5e3t7GiUdvzm1Zazdtg93XVBFROInndBvBRoT7jcAW47S/h7gbceyrLvf7e7N7t5cXV2dRknHb25tKXu6DrNj36FRfR4RkUyUTug/A8wysxlmVkCwY3ZxYgMzm5Vw9ypgfXh7MXCdmU0wsxnALOD3J1728dPOXBGJs2H79N29x8xuBh4BcoFvuftqM7sNWO7ui4GbzewS4DCwG3h/uOxqM7sXeAHoAT7i7r2j9FrSMqem/ypae7lo9uh+qxARyTTp7MjF3ZcCS5OmfSbh9kePsuzngc8fb4EjrXJiAVNKJ2hLX0RiKVZn5PbTcAwiElexDP25taWs37Gf3j4dwSMi8RLL0J9TW0Z3Tx8v7zwQdSkiImMqnqE/sDNXXTwiEi+xDP1ZNSXkmA7bFJH4iWXoF+bnMn3SRNZu2zt8YxGRLBLL0AcdwSMi8RTr0H9lVydd3ZGeKyYiMqZiG/pza0txh/U7tLUvIvER29CfXaMxeEQkfmIb+tMmTaQwP0f9+iISK7EN/dwcY9YU7cwVkXiJbehDeASPLpIuIjES69CfW1tK+75D7DrQHXUpIiJjItah
OXJ2kJSLxEOvQn1urMXhEJF5iHfrVpROoLM5X6ItIbMQ69M2MObWlOlZfRGIj1qEPMLe2jPXb99GnC6qISAzEPvTn1JZyoLuXto6uqEsRERl1sQ99DccgInES+9CfM3AEjw7bFJHsF/vQL5mQR0Nlk
0RSQWYh/6EByvv07DMYhIDKQV+mZ2uZmtNbMNZnZLivkfM7MXzKzFzB4zs2kJ8243s9VmtsbM/sXMbCRfwEiYU1vKpvYDdPf0RV2KiMioGjb0zSwXuBO4ApgPXG9m85OaPQc0u3sTcD9we7jsecD5QBNwGnA2cNGIVT9CZteU0tPnbGzfH3UpIiKjKp0t/YXABnff5O7dwD3AtYkN3P1xd+8M7y4DGvpnAYVAATAByAe2j0ThI2lubRmg4RhEJPulE
1wOaE+63htKF8CHgYwN2fAh4HtoY/j7j7muMrdfTMrJ5Ifq5pZ66IZL10Qj9VH3zK01fN7AagGbgjvH8KMI9gy78euNjMLkyx3I1mttzMlre3t6db+4jJz83h5OoSHbYpIlkvndBvBRoT7jcAW5IbmdklwK3ANe5+KJz8dmCZu+939/0E3wDOTV7W3e9292Z3b66urj7W1zAi5tSWsm67+vRFJLulE
PALPMbIaZFQDXAYsTG5jZmcBdBIG/I2HWq8BFZpZnZvkEO3EzrnsHgp25bR1d7D14OOpSRERGzbCh7+49wM3AIwSBfa+7rzaz28zsmrDZHUAJcJ+ZrTCz/g+F+4GNwCpgJbDS3R8c6RcxEvrH1l+nfn0RyWJ56TRy96XA0qRpn0m4fckQy/UCN51IgWOlfziGla17aJ5eFXE1IiKjQ2fkhuorilhQX84Plr2iYZZFJGsp9ENmxl9cOJNNrx3g52sy7lQCEZERodBPcOVptTRUFnH3E5uiLkVEZFQo9BPk5ebw52+awbOv7Gb5y7uiLkdEZMQp9JO8++xGKorzuUtb+yKShRT6SYoL8vjTc6fxizXbNQCbiGQdhX4K73vjdPJzc/jmk9raF5HsotBPobp0Au98QwMP/KGN9n2Hhl9ARGScUOgP4S8umMHh3j6++7uXoy5FRGTEKPSHMLO6hEvn1fD9Za9w4FBP1OWIiIwIhf5R3HTRTPZ0Hebe5ZuHbywiMg4o9I/irGlVnDWtkv/8zUv09Or6uSIy/in0h3HThTNp3d3F0ue3RV2KiMgJU+gP45J5NcysnsjdT2zEXQOxicj4ptAfRk6O8RcXzOT5tr38buPOqMsRETkhCv00vP3MeiaXTNDQDCIy7in001CYn8sHzpvGE+vaWbNVF08XkfFLoZ+mG86dRnFBLt/Q1r6IjGMK/TRVFBfw7uZGFq/cwpaOrqjLERE5Lgr9Y/ChN83AgW
9qWoSxEROS4K/WPQWFXMlQvq+NHvN7P34OGoyxEROWYK/WN004Uz2X+ohx8+/WrUpYiIHDOF/jE6
6c80+ZxLd/+xLdPRqaQUTGF4X+cbjxwpPZvvcQ/7OiLepSRESOSVqhb2aXm9laM9tgZrekmP8xM3vBzFrM7DEzm5Yw7yQze9TM1oRtpo9c+dG4cNZk5taW8o0nN9HXp6EZRGT8GDb0zSwXuBO4ApgPXG9m85OaPQc0u3sTcD9we8K87wF3uPs8YCGwYyQKj5KZceOFM1m3fT+/WjfuX46IxEg6W/oLgQ3uvsndu4F7gGsTG7j74+7eGd5dBjQAhB8Oee7+87Dd/oR249rVp0+l
yQu36tk7VEZPxIJ/TrgcSriLSG04byIeDh8PZsoMPM/tvMnjOzO8JvDuNefm4OHzx/Bk+/tIuVmzuiLkdEJC3phL6lmJayI9vMbgCagTvCSXnABcDHgbOBmcAHUix3o5ktN7Pl7e3taZSUGa5b2EjphDzu1tAMIjJOpBP6rUBjwv0GYEtyIzO7BLgVuMbdDyUs+1zYNdQD/BR4Q/Ky7n63uze7e3N1dfWxvobIlBbm8yfnnsTDz2/l1Z1Z0WslIlkundB/BphlZjPMrAC4Dlic2MDMzgTuIgj8HUnLVppZf5JfDLxw4mVnjg+eP4PcHOObv9HWvohkvmFDP9xCvxl4BFgD3Ovuq83sNjO7Jmx2B1AC3GdmK8xscbhsL0HXzmNmtoqgq+gbo/A6IlNTVsjbzqjn3uWb2XWgO+pyRESOyjLtEoDNzc2+fPnyqMs4Juu37+PSf36Cv7tkNh+9ZFbU5YhIDJnZs+7ePFw7nZE7AmbVlHLx3Cl876mXOXi4N+pyRESGpNAfITdeOJOdB7r5r2WvRF2KiMiQFPoj5JwZVbxlTjX/9Og6Xtl5IOpyRERSUuiPEDPjC+9YQF6u8Yn7WzQmj4hkJIX+CKorL+LvF83n9y/t4vvq5hGRDKTQH2HvOquBi2ZX86WHX9QJWyKScRT6I8zM+OI7FpCXY3zi/pXq5hGRjKLQHwVTK4r49KJ5PP3SLv7raXXziEjmUOiPknc3N3KhunlEJMMo9EeJmfGldywgx4z
YC6eUQkMyj0R9HUiiI+fdU8lm3axQ/UzSMiGUChP8rec3YjF8yazBcffpHNu9TNIyLRUuiPMjPjS+9sCrp5dNKWiERMoT8G6iuKuPWqeTy1aSc
P2rUZcjIjGm0B8j153dyJtOmcwXl65RN4+IREahP0aCbp4FmBmffKCFTLuOgYjEg0J/DDVUFvN
pzH7zaqm0dEoqHQH2PXLwy6eb7w0Bpad6ubR0TGlkJ/jPWPzQNwywOr1M0jImNKoR+BxqpiPnXlPH6z4TV+9PvNUZcjIjGi0I/Inyw8ifNOnsTnH3pB3TwiMmYU+hHJyTG+/M4mHPjUf6ubR0TGhkI/Qv3dPE+uf417nlE3j4iMPoV+xN678CTeOHMSn39oDW0dXVGXIyJZLq3QN7PLzWytmW0ws1tSzP+Ymb1gZi1m9piZTUuaX2ZmbWb2byNVeLbIyTFu/+Mm+ty5RSdticgoGzb0zSwXuBO4ApgPXG9m85OaPQc0u3sTcD9we9L8zwG/PvFys1NjVTGfumIuT65/jcUrt0RdjohksXS29BcCG9x9k7t3A/cA1yY2cPfH3b3/EJRlQEP/PDM7C6gBHh2ZkrPTe8+ZxozJE/mx+vZFZBSlE
1QGIStYbThvIh4GEAM8sB/gn4xNGewMxuNLPlZra8vb09jZKyT06OsaipjmWbdtK+71DU5YhIlkon9C3FtJQdz2Z2A9AM3BFO+itgqbsfdfPV3e9292Z3b66urk6jpOy0qGkqfQ4PP7816lJEJEulE/qtQGPC/QbgiI5nM7sEuBW4xt37N1XfCNxsZi8DXwHeZ2ZfOqGKs9ic2lJmTSlhyUqFvoiMjnRC/xlglpnNMLMC4DpgcWIDMzsTuIsg8Hf0T3f397r7Se4+Hfg48D13P+LoH3nd1adP5ZlXdrF1jw7fFJGRN2zou3sPcDPwCLAGuNfdV5vZbWZ2TdjsDqAEuM/MVpjZ4iEeToaxqKkOd3ioRVv7IjLyLNOOC29ubvbly5dHXUakrvzakxTk5fDTj5wfdSkiMk6Y2bPu3jxcO52Rm4EWnV7His0duqyiiIw4hX4GWrRgKgAPrVIXj4iMrLyoC5AjnTSpmNMbylnSsoUPX3Ry1OWIyCg41NNL2+4uNu/uonV3J5t3dVFelM9fvnl0/+cV+hlqUdNUPr90DS+/doDpkydGXY6IHKOe3j627jnI5t2dtO7uonVXJ5t3d7F5V3B/+76DJO5Szc81zj9lskI
q5qquPzS9ewpGULN188K+pyRCRJX5/Tvv/QQIhv3tXJ5nCLvbWjky0dB+ntez3VcwzqyotoqCzi/FMm01hVRGNlMY1VxTRUFlFTVkhuTqpzYUeWQj9DTa0oonlaJUtatir0RSLg7uzuPPx6qO/uDIM96I5p3d1Fd0/foGWqSyfQWFnEmY2VXHP64FCvKy+iIC/63agK/Qy2qKmOzz74Auu372NWTWnU5YhknX0HDydspfd3vby+5X6gu3dQ+4rifBori5lTU8ol82porCyioao4+F1ZTGF+bkSvJH0K/Qx25YI6/mHJCzzYspWPXarQFzlWBw/3DmyltyZspW/eFUzr6Dw8qP3EgtyBLfNzZ06iMSHQG6uKKC3Mj+iVjByFfgabUlbIOTOqWNKyhb+7ZBZmo9/fJzKeHO7tY2vHwYSul85BW+7JI9YW5OXQEIZ4U0N5GOpByDdWFVNZnJ/1/2cK/Qy3qGkqn/7p86zZuo/5U8uiLkdkTPX2OTv2HQy2zJNCvXV3F1v3dJGwr5TcHKOuvJDGymLeMqc6CPSEHabVJRPIGYOdpZlMoZ/hrjitlv+7eDVLWrYo9CXruDs7D3QP2kG6eVf/707aOro43Dt4qJiasgk0VhazcEbVQNdLf7DXlReSlxv9ztJMptDPcJNKJnDeyZNY0rKVT7x1TtZ/9ZTss6fr8BE7SBOPV+86PHhnadXEAhorizi1vpzLT6sb6HpprCxiakXRuNhZmskU+uPA1U1T+d8PtLCqbQ9NDRVRlyMySGd3z6Aul0HHq+/uZO/BnkHtSyfk0VBVzIzJE7lwdnUQ6gmHNk6coFgaTVq748BbT63l1p+u4sGVWxT6Mua6e/po6xh81Et/wLfu7uS1/d2D2hfm5wRHu1QWcda0yiNOQiovyv6dpZlMoT8OlBfnc8Gsah5q2cqnrpgX+x1RMrJ6+5yte7oGdb20JpyQtG3v4OEC8nKM+nD
",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [1, 2, 3, 4, 5, 10, 20]\n",
"\n",
"metrics = [evaluate_dt(train_dt, test_dt, param, 32) for param in params]\n",
"\n",
"print(params)\n",
"\n",
"print(metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Maximum tree depth (maxDepth)')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
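{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming the value returned by `evaluate_dt` is an error measure (lower is better): the best depth can be read off the `params` and `metrics` lists computed in the cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Position of the smallest error in the maxDepth sweep above (lower is better)\n",
"best_idx = int(np.argmin(metrics))\n",
"print('Best maxDepth: %d (metric = %.4f)' % (params[best_idx], metrics[best_idx]))"
]
},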
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Maximum bins"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[0.22578199542260993, 0.22626606160811255, 0.20380255431723798, 0.2076920210675261, 0.212163652773227, 0.21000218813883056, 0.2228581552832826]\n"
]
},
{
"data": {
"text/plain": [
"[Figure: line plot of the metric returned by evaluate_dt for each maxBins value]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(train_dt, test_dt, 5, param) for param in params]\n",
"\n",
"print(params)\n",
"\n",
"print(metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Maximum number of bins (maxBins)')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
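{
"cell_type": "markdown",
"metadata": {},
"source": [
"Depth and bins are tuned one at a time above. A minimal sketch of searching both together, assuming the decision-tree signature `evaluate_dt(train, test, maxDepth, maxBins)` used in the two sweeps; the grid values are illustrative, and every combination trains a new model, so the grid should stay small."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical joint grid over maxDepth and maxBins (values chosen for illustration only)\n",
"depths = [3, 5, 10]\n",
"bins = [16, 32, 64]\n",
"scores = {}\n",
"for depth in depths:\n",
"    for nbins in bins:\n",
"        scores[(depth, nbins)] = evaluate_dt(train_dt, test_dt, depth, nbins)\n",
"\n",
"# Pick the (maxDepth, maxBins) pair with the lowest error\n",
"best_depth, best_bins = min(scores, key=scores.get)\n",
"print('Best maxDepth=%d, maxBins=%d, metric=%.4f' % (best_depth, best_bins, scores[(best_depth, best_bins)]))"
]
},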
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Boosted Tree"
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.tree import GradientBoostedTrees, GradientBoostedTreesModel\n"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {},
"outputs": [],
"source": [
"def extract_label(record):\n",
"    # The regression target is taken from the last column of each record\n",
"    return float(record[-1])"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [],
"source": [
"data_gbt = records.map(lambda r: LabeledPoint(extract_label(r),extract_features_dt(r)))"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {},
"outputs": [],
"source": [
"(traindata, testData) = data_gbt.randomSplit([0.7, 0.3])"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Gradient BOOSTED predictions: [(307000.0, 265005.81171157816), (118000.0, 133611.14615160643), (279500.0, 188655.2935917073), (149000.0, 126758.74131085054), (139000.0, 129323.12870531864)]\n"
]
}
],
"source": [
"model = GradientBoostedTrees.trainRegressor(traindata,\n",
"                                            categoricalFeaturesInfo={}, numIterations=3)\n",
"preds = model.predict(testData.map(lambda p: p.features))\n",
"actual = testData.map(lambda p: p.label)\n",
"true_vs_predicted_GBT = actual.zip(preds)\n",
"print(\"Gradient BOOSTED predictions: \" + str(true_vs_predicted_GBT.take(5)))"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"413\n",
"Mean Squared Error: 1852793539.9857\n",
"Mean Absolute Error: 29263.6311\n",
"Root Mean Squared Log Error: 0.2392\n"
]
}
],
"source": [
"# Accumulate squared, absolute and squared-log errors for each (actual, predicted) pair\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for real, predict in true_vs_predicted_GBT.collect():\n",
"    nn.append((predict - real)**2)\n",
"    ab.append(np.abs(predict - real))\n",
"    s_log.append((np.log(predict + 1) - np.log(real + 1))**2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"\n",
"# Mean squared error, mean absolute error and root mean squared log error\n",
"t=sum(nn)/value_len\n",
"ab_mean=sum(ab)/value_len\n",
"s_log_mean=np.sqrt(sum(s_log)/value_len)\n",
"print(\"Mean Squared Error: %2.4f\" % t)\n",
"print(\"Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)"
]
},
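{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same three metrics can be computed with vectorised NumPy operations instead of Python lists. A minimal sketch, assuming `true_vs_predicted_GBT` holds `(actual, predicted)` pairs as above; `np.log1p(x)` equals `np.log(x + 1)`, so the RMSLE formula is unchanged."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Collect the (actual, predicted) pairs into an (n, 2) array\n",
"pairs = np.array(true_vs_predicted_GBT.collect())\n",
"actual_vals, predicted_vals = pairs[:, 0], pairs[:, 1]\n",
"\n",
"mse = np.mean((predicted_vals - actual_vals) ** 2)\n",
"mae = np.mean(np.abs(predicted_vals - actual_vals))\n",
"rmsle = np.sqrt(np.mean((np.log1p(predicted_vals) - np.log1p(actual_vals)) ** 2))\n",
"print('MSE: %2.4f, MAE: %2.4f, RMSLE: %2.4f' % (mse, mae, rmsle))"
]
},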
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [],
"source": [
"def evaluate_dt(traindata, categoricalFeaturesInfo, loss, numIterations, maxDepth, maxBins):\n",
"    # Redefines evaluate_dt for gradient-boosted trees; returns the RMSLE on the global testData split\n",
"    model = GradientBoostedTrees.trainRegressor(traindata, categoricalFeaturesInfo, loss=loss,\n",
"                                                numIterations=numIterations,\n",
"                                                maxDepth=maxDepth, maxBins=maxBins)\n",
"\n",
"    preds = model.predict(testData.map(lambda p: p.features))\n",
"    actual = testData.map(lambda p: p.label)\n",
"    tp = actual.zip(preds)\n",
"\n",
"    # Root mean squared log error over the collected (actual, predicted) pairs\n",
"    new_val = []\n",
"    for true_val, pred in tp.collect():\n",
"        new_val.append((np.log(pred + 1) - np.log(true_val + 1))**2)\n",
"    rmsle = np.sqrt(sum(new_val) / len(new_val))\n",
"    return rmsle"
]
},
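{
"cell_type": "markdown",
"metadata": {},
"source": [
"`GradientBoostedTrees.trainRegressor` in `pyspark.mllib` also takes a `learningRate` argument (default 0.1), which trades off against `numIterations`. A minimal sketch with every argument passed by keyword so none can be mapped to the wrong position; the values are illustrative, not tuned, and `gbt_model`/`gbt_preds` are hypothetical names chosen so the earlier `model` is not overwritten."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative values only; every argument is passed by keyword\n",
"gbt_model = GradientBoostedTrees.trainRegressor(traindata,\n",
"                                                categoricalFeaturesInfo={},\n",
"                                                loss='leastSquaresError',\n",
"                                                numIterations=20,\n",
"                                                learningRate=0.1,\n",
"                                                maxDepth=3,\n",
"                                                maxBins=32)\n",
"gbt_preds = gbt_model.predict(testData.map(lambda p: p.features))\n",
"\n",
"# First five (actual, predicted) pairs for a quick sanity check\n",
"print(testData.map(lambda p: p.label).zip(gbt_preds).take(5))"
]
},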
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Boosted Tree Iterations"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[0.25905666523741905, 0.2590563768733536, 0.25905580014870655, 0.25905464671334816, 0.259052339898376, 0.2590477264914201, 0.25904253676400585]\n"
]
},
{
"data": {
"image/png": "
lrFs1svJnlm1l+YWFhjG5EpK5okp7K5GvzaNMsnW88kq+/c6lnEkks64Do2UFnYEO8OmaWBrQCtlXQNl75FqB1iHFEX+6+1d3L7sH9EDCorLGZtQSeB37i7u/EOgh3n+zuee6el5OjlTKRui6nRSYPjz2dXQcO8c1H83WlWD2SSGKZD/QOV2tlEDkZP6NcnRnA2LA9CpjpkUs6ZgCjw1Vj3YHewLx4MUObWSEGIeYzcHhGUmYEsDSUZwD/AB519ycTP3QRqev6H9+SiVedyqL1O/nBkwt1pVg9UWliCec7bgZeIvLDfLq7LzGzu81sRKg2Bcg2swLgVuCO0HYJMB34EHgRuMndS+LFDLFuB24NsbJDbIBbwiXFC4FbgOtD+X8C5wHXR12KfOpRvh8iUsdcNOA4fnRRX55btJHfvVqQ7OFIAqwx/gaQl5fn+fn5yR6GiCTI3bntyYU8/e567v/qQC49+fhkD6lRMrMF7p5XWT395b2I1Hlmxv/7yknkdWvDbdMXsmjdjmQPSSqgxCIi9UJmWioPXjuIds0z+eaj+Xy6U1eK1VVKLCJSb7RrnsmU6/PYc6CYbz6az/4iXSlWFymxiEi90ve4lvxuzEA+2LCT2558n9LSxneeuK5TYhGReuf8fh34r4v78cLiT/ntqyuSPRwpJ63yKiIidc83zu3OR5t287tXV9AzJ4uRp5a/05Qki2YsIlIvmRk/v/xEzshtyw+fWsR7n2xP9pAkUGIRkXorMy2VP1xzGh1aZjL+sQVs2LE/2UMSlFhEpJ7Lbp7JlLGns7+ohG88ks++ouLKG0mNUmIRkXqvT4cW/P6rA1n26S6+P01XiiWbEouINAhDT2jPj/+jPy8t2cT/vvxRsofTqOmqMBFpML7+xVwKNu/m/lkF9GrfnMsG6kqxZNCMRUQaDDPjZyNO5KwebfnR3xex4GNdKZYMSiwi0qBkpKXwh6sH0bFVE771WD7rtu9L9pAaHSUWEWlw2mRlMGVsHgcPlfKNR/LZe1BXitUmJRYRaZB6tW/B/VefxkebdvPdqbpSrDYpsYhIgzW4Tw53XdqfV5Zu4t6Xlid7OI2GrgoTkQZt7BdyWbF5Dw++tpJe7ZszalDnZA+pwdOMRUQaNDPjpyMG8IWe2fzX04vJX7Mt2UNq8BJKLGY23MyWm1mBmd0RY3+mmU0L++eaWW7UvjtD+XIzu6iymGbWPcRYEWJmhPLrzazQzN4PX9+IajM21F9hZmOP7q0QkYYqPTWFSVefRqc2TfnWYwtYu01XitWkShOLmaUCDwAXA/2BMWbWv1y1ccB2d+8FTAQmhLb9gdHAAGA4MMnMUiuJOQGY6O69ge0hdplp7n5q+Ho49NEW+B/gTOAM4H/MrE0V3wcRaeBaN8vg4bF5HCqJXCm2R1eK1ZhEZixnAAXuvsrdi4CpwMhydUYCj4Ttp4DzzcxC+VR3P+juq4GCEC9mzNBmWIhBiHlZJeO7CHjZ3be5+3bgZSJJTETkCD1zmjPp6kEUFO7hu0+8R4muFKsRiSSWTsDaqNfrQlnMOu5eDOwEsitoG688G9gRYsTq6wozW2RmT5lZlyqMT0QEgHN6t+OnX+7Pq8s2M+HFZckeToOUSGKxGGXl03y8OtVVDvAskOvuJwOv8NkMKZHxYWbjzSzfzPILCwtjNBGRxuLas3O57uxuTH59FdPnr628gVRJIollHdAl6nVnYEO8OmaWBrQCtlXQNl75FqB1iHFEX+6+1d0PhvKHgEFVGB/uPtnd89w9Lycnp5JDFpGG7q5L+3NOr3b8+J+Lmbtqa7KH06AkkljmA73D1VoZRE7GzyhXZwZQdjXWKGCmu3soHx2uGusO9AbmxYsZ2swKMQgxnwEws45R/Y0Alobtl4ALzaxNOGl/YSgTEYkrLTWFB756Gl3aNuOGvy7gk626Uqy6VJpYwvmOm4n8sF4KTHf3JWZ2t5mNCNWmANlmVgDcCtwR2i4BpgMfAi8CN7l7Sb
yYIdbtwK0hVnaIDXCLmS0xs4XALcD1oY9twD1EktV84O5QJiJSoVbN0pky9nRKHcY9Mp9dBw4le0gNgkUmCY1LXl6e5+fnJ3sYIlJHvL1yC9dNmcc5vdsxZezppKbEOnUrZ
A3fMqq6e/vBeRRu8LPdvxs5EDmL28kF++sLTyBlIh3StMRAS4+sxuFGzew5Q3V9OrfXPGnNE12UOqtzRjEREJfnxJPwb3yeG
kBc1bqSrGjpcQiIhKkpabw+68OJLddFt/+2wLWbNmb7CHVS0osIiJRWjZJZ8rYyPnpcY/MZ+d+XSlWVUosIiLldMvO4sFrBvHx1n3c/Pi7FJeUJntI9YoSi4hIDGf1yOYXl5/IGyu28PPndaVYVeiqMBGROK46vSsrNu3h4TdX07N9c649q1uyh1QvaMYiIlKBOy/px9ATcvjpjCW8uWJLsodTLyixiIhUIDXF+N2YgfTMyeLGvy1gVeGeZA+pzlNiERGpRIsmkXuKpaWm8I1H8tm5T1eKVUSJRUQkAV3aNuOP1w5i7fZ93Pj4Ag7pSrG4lFhERBJ0em5bfnn5SbxVsJW7n/0w2cOps3RVmIhIFVyZ14WCzXv44+ur6N2hOdednZvsIdU5SiwiIlX0o+F9WVm4h589+yG52Vmc10dPpY2mpTARkSpKTTF+O3ogvds356bH36Vgs64Ui6bEIiJyFJpnpvHw2Dwy01IY98h8tu8tSvaQ6gwlFhGRo9S5TeRKsY07DvDtvy2gqFhXioESi4jIMRnUrS2/uuIk3lm1jf+ZsYTG+Lj38hJKLGY23MyWm1mBmd0RY3+mmU0L++eaWW7UvjtD+XIzu6iymGbWPcRYEWJmlOtrlJm5meWF1+lm9oiZLTazpWZ2Z9XfBhGRo/eV0zpz45CePDHvE/7y9ppkDyfpKk0sZpYKPABcDPQHxphZ/3LVxgHb3b0XMBGYENr2B0YDA4DhwCQzS60k5gRgorv3BraH2GVjaQHcAsyN6vtKINPdTwIGAd+KTmwiIrXhBxeewIX9O3DPcx8ye/nmZA8nqRKZsZwBFLj7KncvAqYCI8vVGQk8ErafAs43MwvlU939oLuvBgpCvJgxQ5thIQYh5mVR/dwD3AsciCpzIMvM0oCmQBGwK4HjEhGpNikpxsSrTuWE41ryncffY8Wm3ckeUtIkklg6AWujXq8LZTHruHsxsBPIrqBtvPJsYEeIcURfZjYQ6OLuz5Xr+ylgL7AR+AT4tbtvS+C4RESqVVbZlWLpqYx7JJ9tjfRKsUQSi8UoK392Kl6daik3sxQiS2y3xdh/BlACHA90B24zsx7lK5nZeDPLN7P8wsLCGGFERI5dp9ZNmXzdID7ddYA
to4rxRLJLGsA7pEve4MbIhXJyxJtQK2VdA2XvkWoHWIEV3eAjgRmG1ma4CzgBnhBP5XgRfd/ZC7bwbeAvLKH4S7T3b3PHfPy8nRX8mKSM05rWsb7ht1MvNWb+Mn/1zc6K4USySxzAd6h6u1MoicjJ9Rrs4MYGzYHgXM9Mg7OQMYHa4a6w70BubFixnazAoxCDGfcfed7t7O3XPdPRd4Bxjh7vlElr+GWUQWkaSz7CjeCxGRajPy1E58Z1gvpuevY8qbq5M9nFpVaWIJ5ztuBl4ClgLT3X2Jmd1tZiNCtSlAtpkVALcCd4S2S4DpwIfAi8BN7l4SL2aIdTtwa4iVHWJX5AGgOfABkYT1Z3dflNDRi4jUoO9f0IeLTzyOX76wlJWN6AFh1timaAB5eXmen5+f7GGISCNQuPsg50yYyYhTjue+K09J9nCOiZktcPfPnWooT395LyJSg3JaZDL69C784731rN+xP9nDqRVKLCIiNWz84J4ATH5tZZJHUjuUWEREalin1k25fGAnps5fS+Hug8keTo1TYhERqQXfHtKTopJS/vRWw79CTIlFRKQW9MhpziUndeSxOR+zc9+hZA+nRimxiIjUkpuG9GLPwWIenbMm2UOpUUosIiK1pP/xLRnWtz1/ems1ew8WV96gnlJiERGpRTcN7cn2fYd4Yt4nyR5KjVFiERGpRYO6teWsHm156I1VHCwuSfZwaoQSi4hILbtpaC827TrI3xesT/ZQaoQSi4hILTunVztO6dyKB19bSXFJw7utvhKLiEgtMzNuHNqLT7bt47lFG5M9nGqnxCIikgRf6teBPh2aM2l2AaWlDetmwEosIiJJkJJi3DikFx9t2sPLSzdH5ZhwAAAUOElEQVQlezjVSolFRCRJLj25I13bNmPSrIIG9ZRJJRYRkSRJS03hhsE9WbhuJ28VbE32cKqNEouISBJdMagTHVpmcv+sFckeSrVRYhERSaLMtFS+eW4P3lm1jQUfb0v2cKqFEouISJJ99cyutGmWzgOzGsaDwBJKLGY23MyWm1mBmd0RY3+mmU0L++eaWW7UvjtD+XIzu6iymGbWPcRYEWJmlOtrlJm5meVFlZ1sZnPMbImZLTazJlV7G0REkqdZRhpf/2J3Zi7bzJINO5M9nGNWaWIxs1TgAeBioD8wxsz6l6s2Dtju7r2AicCE0LY/MBoYAAwHJplZaiUxJwAT3b03sD3ELhtLC+AWYG5UWRrwV+AGdx8ADAEa9sMORKTBue7sXJpnpjFpdv2ftSQyYzkDKHD3Ve5eBEwFRparMxJ4JGw/BZxvZhbKp7r7QXdfDRSEeDFjhjbDQgxCzMui+rkHuBc4EFV2IbDI3RcCuPtWd2+Yd3YTkQarVbN0rjmrGy8s3siqwj3JHs4xSSSxdALWRr1eF8pi1nH3YmAnkF1B23jl2cCOEOOIvsxsINDF3Z8r13cfwM3sJTN718x+lMAxiYjUOePO6U5GagoPvla/Zy2JJBaLUVb+L3ni1amWcjNLIbLEdluM/WnAOcDV4d/Lzez88pXMbLyZ5ZtZfmFhYYwwIiLJldMik9Gnd+Hpd9ezfsf+ZA/nqCWSWNYBXaJedwY2xKsTznm0ArZV0DZe+RagdYgRXd4COBGYbWZrgLOAGeEE/jrgNXff4u77gBeA08ofhLtPdvc8d8/LyclJ4LBFRGrf+ME9AXjo9VVJHsnRSySxzAd6h6u1MoicjJ9Rrs4MYGzYHgXM9Mj9CWYAo8NVY92B3sC8eDFDm1khBiHmM+6+093buXuuu+cC7wAj3D0feAk42cyahYQ0GPjwKN4LEZGk69S6KZcP7MQT8z6hcPfBZA/nqFSaWML5jpuJ/ABfCkx39yVmdreZjQjVpgDZZlYA3ArcEdouAaYT+UH/InCTu5fEixli3Q7cGmJlh9gVjW878L9EktX7wLvu/nyib4CISF3z7SE9KSop5U9vrU72UI6KNaQbnyUqLy/P8/Pzkz0MEZG4b
nr8XV5bXshbdwyjVdP0ZA8HADNb4O55ldXTX96LiNRBNw7pyZ6DxTz69ppkD6XKlFhEROqgAce3Yljf9vzprdXsKyquvEEdosQiIlJH3TS0J9v3HeLxuZ8keyhVosQiIlJHDerWlrN6tOWhN1ZxsLj+3FBEiUVEpA67aWgvNu06yNPvrk/2UBKmxCIiUoed06sdp3RuxR9mr6S4pDTZw0mIEouISB1mZtw4tBefbNvH84s3Jns4CVFiERGp477UrwN9OjTngVkFlJbW
89VGIREanjUlKMG4f04qNNe3hl6aZkD6dSSiwiIvXApSd3pGvbZjwwq4C6fscUJRYRkXogLTWFGwb3ZOG6nbxVsDXZw6mQEouISD1xxaBOdGiZyQOzCpI9lAopsYiI1BOZaal889wezFm1lQUfb0/2cOJSYhERqUfGnNGVNs3SmVSHZy1KLCIi9UhWZhpf+2J3Xl22mQ837Er2cGJSYhERqWfGnp1L88w0Js2um7MWJRYRkXqmVbN0rjmrG88v3siqwj3JHs7nKLGIiNRD487pTkZqCg++tjLZQ/kcJRYRkXoop0Umo0/vwtPvrmf9jv3JHs4REkosZjbczJabWYGZ3RFjf6aZTQv755pZbtS+O0P5cjO7qLKYZtY9xFgRYmaU62uUmbmZ5ZUr72pme8zsB4kfvohI/TV+cE8AHnp9VZJHcqRKE4uZpQIPABcD/YExZta/XLVxwHZ37wVMBCaEtv2B0cAAYDgwycxSK4k5AZjo7r2B7SF22VhaALcAc2MMdSLwr0QOWkSkIejUuimXD+zEE/M+Ycueg8kezmGJzFjOAArcfZW7FwFTgZHl6owEHgnbTwHnm5mF8qnuftDdVwMFIV7MmKHNsBCDEPOyqH7uAe4FDkR3bmaXAauAJQkcj4hIg3HDkJ4UlZTypzdXJ3sohyWSWDoBa6NerwtlMeu4ezGwE8iuoG288mxgR4hxRF9mNhDo4u7PRXdsZlnA7cDPEjgWEZEGpWdOcy45qSOPzfmYnfsPJXs4QGKJxWKUlb+1Zrw61VJuZilElrpui7H/Z0SWziq85s7MxptZvpnlFxYWVlRVRKReuXFIT3YfLOaxOWuSPRQgscSyDugS9bozsCFeHTNLA1oB2ypoG698C9A6xIgubwGcCMw2szXAWcCMcAL/TODeUP494L/M7ObyB+Huk909z93zcnJyEjhsEZH6YcDxrRjWtz1T3lzNvqLiyhvUsEQSy3ygd7haK4PIyfgZ5erMAMaG7VHATI88MGAGMDpcNdYd6A3MixcztJkVYhBiPuPuO929nbvnunsu8A4wwt3z3f3cqPLfAr909/uP5s0QEamvbhrak+37DvHEvLWVV65hlSaWcL7jZuAlYCkw3d2XmNndZjYiVJsCZJtZAXArcEdouwSYDnwIvAjc5O4l8WKGWLcDt4ZY2SG2iIhUYFC3tpzZvS2TX1/JweKSpI7F6vqTyGpCXl6e5+fnJ3sYIiLV6o0VhVw7ZR7/7ysnMeaMrtUe38wWuHteZfX0l/ciIg3EOb3acXLnVjz42kqKS0qTNg4lFhGRBsLMuGloLz7euo/nF29M2jiUWEREGpAv9etAnw7NmTRrJaWlyTnVocQiItKApKQYNw7pxfJNu3ll6abkjCEpvYqISI259OSOdGnblAdmryQZF2gpsYiINDBpqSncMLgnC9fu4O2VW2u9fyUWEZEGaNSgzrRvkcn9M2v/8cVKLCIiDVBmWirjz+vBnFVbWfDx9lrtW4lFRKSBGnNGV9o0S2fSrNqdtSixiIg0UFmZaXzti915ddlmPtywq9b6VWIREWnAxp6dS/PMNP7w2spa61OJRUSkAWvVLJ1rzurG84s2sHrL3lrpU4lFRKSBG3dOd9JTU3hwdu3MWpRYREQauJwWmYw+vQtPv7eODTv213h/SiwiIo3A+ME9cYfJr6+q8b6UWEREGoFOrZty+cBObNixv8Zv85JWeRUREWkIfvmVk0hPrfn5hGYsIiKNRG0kFVBiERGRapZQYjGz4Wa23MwKzOyOGPszzWxa2D/XzHKj9t0Zypeb2UWVxTSz7iHGihAzo1xfo8zMzSwvvP6SmS0ws8Xh32FVfxtERKS6VJpYzCwVeAC4GOgPjDGz/uWqjQO2u3svYCIwIbTtD4wGBgDDgUlmllpJzAnARHfvDWwPscvG0gK4BZgb1fcW4MvufhIwFngs8cMXEZHqlsiM5QygwN1XuXsRMBUYWa7OSOCRsP0UcL6ZWSif6u
4H3X01UBDixYwZ2gwLMQgxL4vq5x7gXuBAWYG7v+fuG8LLJUATM8tM4LhERKQGJJJYOgFro16vC2Ux67h7MbATyK6g
zybGBHiHFEX2Y2EOji7s9VMNYrgPfc/WACxyUiIjUgkcuNLUZZ+Yug49WJVx4rocWtb2YpRJbYro87SLMBRJbRLoyzfzwwHqBr167xwoiIyDFKZMayDugS9bozsCFeHTNLA1oB2ypoG698C9A6xIgubwGcCMw2szXAWcCMqBP4nYF/ANe5e8yb4bj7ZHfPc/e8nJycBA5bRESORiIzlvlAbzPrDqwncjL+q+XqzCBy4nwOMAqY6e5uZjOAx83sf4Hjgd7APCIzk8/FDG1mhRhTQ8xn3H0n0K6sMzObDfzA3fPNrDXwPHCnu7+VyEEvWLBgi5l9nEjdOFoRWe5LhprsuzpiH22MqrarSv1E6lZWpx2RX3waGn2Wqz9GQ/4sd0uolrtX+gVcAnwErAR+HMruBkaE7SbAk0ROzs8DekS1/XFotxy4uKKYobxHiFEQYmbGGM9sIC9s/wTYC7wf9dU+keM62i9gck3GT1bf1RH7aGNUtV1V6idSt7I6QH6yvuc1+aXPcvXH0GfZE7uli7u/ALxQruyuqO0DwJVx2v4C+EUiMUP5KiJXjVU0niFR2z8Hfl7hAVS/Z2u5v9rquzpiH22MqrarSv1E6ibze5pM+ixXf4xG/1m2kMFEpAJmlu/ueckeh8ixqo3Psm7pIpKYyckegEg1qfHPsmYsIiJSrTRjERGRaqXEIiIi1UqJRUREqpUSi8hRMLMeZjbFzJ6qvLZI3WVml5nZQ2b2jJnFvCVWVSmxiARm9icz22xmH5Q
9yzgzxyZ+5xsSOJJFcVP8v/dPdvErkX41XV0b8Si8hn/kLkuUGHJfg8IpG65i9U
P8k7D/mCmxiATu/jqRm6dGS+R5RCJ1SlU+yxYxAfiXu79bHf0rsYhULOazg8ws28weBAaa2Z3JGZpIlcR7DtZ3gAuAUWZ2Q3V0lNC9wkQasZjPCHL3rUC1/CcUqSXxPsu/A35XnR1pxiJSsUSeRyRSH9TaZ1mJRaRih59HZGYZRJ4dNCPJYxI5GrX2WVZiEQnM7AkiD6s7wczWmdk4dy8GbgZeApYC0919STLHKVKZZH+WdRNKERGpVpqxiIhItVJiERGRaqXEIiIi1UqJRUREqpUSi4iIVCslFhERqVZKLAKAmbmZPRb1Os3MCs3suUranWpml1SwP8/Mjul2EWaWY2Zzzew9Mzv3WGJVNzO728wuSPY4KmJmfzGzUbXQz5VmttTMZpUrP77suTWVfV6Oos/WZnZjrL4keZRYpMxe4EQzaxpefwlYn0C7U4GYPyjMLM3d8939lmMc2/nAMncf6O5vJNIg3CK8WphZ3Hvquftd7v5KdfVV11TxfRwH3OjuQ6ML3X2Du5cltriflwrGUNE9DVsDhxNLub4kSZRYJNq/gP8I22OAJ8p2mFlWeHjQ/DBzGBluC3E3cJWZvW9mV5nZT81sspn9G3jUzIaUzXrMrLmZ/dnMFpvZIjO7wsxSw2/UH4Ty70cPyMxOBe4FLgl9NDWzMaHuB+F232V194QZxFzg7KjyfmY2L+p1rpktCtt3hWP6IIzbQvlsM/ulmb0G/NjMVptZetjX0szWmFl69GwglP3MzN4N4+sbynPM7OVQ/kcz+9jM2pV/88P4f2FmC83sHTPrEMqPmHGY2Z7w7xAze83MppvZR2b2KzO72szmhf57RoW/wMzeCPUuDe1Tzey+cPyLzOxbUXFnmdnjwOIY4/zc+29mdwHnAA+a2X3l6ueGurE+L5/7XIU215vZk2b2LPDv8Nl5Neq9LXt0wa+AniHefWV9hRhNoj5v75nZ0KjYT5vZi2a2wszujXo/4n4WpQrcXV/6AtgDnAw8BTQB3geGAM+F
8Ergn
YGPgCwiT527PyrOT4EFQNPwOjrGBOC3UXXbAIOAl6PKWscY2+E+gOOBT4AcInfnnglcFvY58J9xju99oEfYvh34SdhuG1XnMeDLYXs2MClq35+j+hkP/CZs/wUYF
XAN8J2zcCD4ft+4E7w
wMM52McboUf3fGzXGw32Ufa+i3tsdQEcgk8gM82dh33fL3uvQ/kUiv0j2JnIzwibhOMr6yATyge4h7l6ge4wxVvT+zwbyYrTJBT4o/71M4HO1ruz7E/pqG
AQVE7tZ7OHaMvm4D/hy2+4ZxNwmxVwGtwuuPidycsdLPor4S+9KMRQ5z90VE/mOOAV4ot/tC4A4ze5/ID5AmQNc4oWa4+/4Y5RcQ9YQ6d99O5D94DzP7vZkNB3ZVMszTgdnuXuiRex/9DTgv7CsB/h6n3XTgP8P2VcC0sD3UIudvFgPDgAFRbaZFbT8MfC1sf41Ioonl6fDvAiLvJUR+k58K4O4vAtvjtC0Cys5pRbevyHx33+juB4GVwL9D+eJy7ae7e6m7ryDynvcl8j29LnxP5wLZRBIPwDx3Xx2jv4re/6NR0efqZXcve1iVAb8MM81XiDxHpEMlsc8h8ssC7r6MSALpE/a96u473f0A8CHQjap/FiUOPY9FypsB/JrIb63ZUeUGXOHuy6Mrm9mZMWLsjRPbiPxWfpi7bzezU4CLgJuI/PD/egXji/VMiTIH3L0kzr5pwJNm9nSkW19hZk2ASUR+y15rZj8l8oPtc8fh7m+FZZbBQKq7H/Es8SgHw78lfP
q6IxRzvk4Vflcu2LCcvWYakuI0Z/AKVRr0s58v93+ZsCehjXd9z9pegdZjaEir+H1amiz1X0GK4mMksa5O6HzGwNR36v4sWOJ/p9KwHSjuKzKHFoxiLl/Qm4293Lr62/BHwn6hzEwFC+G2iRYOx/E7m7KiFGm3CuIcXd/w78N3BaJTHmAoPNrJ1FTiyPAV6
GN3X0nkB8h/89lMpOwH0xYzaw5UdtL3USLnneLNVuJ5kzBbMrMLiSwBVsUaIss0EHkscnoV2wNcaWYp4bxLD2A5ke/pt6POHfUxs6xK4hzV+x+l/Ocl3ueqvFbA5pBUhhKZYcSKF+11IgkJM+tDZCa0PE5djuKzKHEoscgR3H2du/9fjF33EPmBtiicHL0nlM8C+pedjK0k/M+BNuHk6EJgKJEljdlhKeQvQIWP+XX3jaHOLGAh8K67P5PY0TENuIbIshjuvgN4iMiy0T+JPK+iIn8jkhSeqKReeT8DLjSzd4GLgY1EfiAm6iEiP8znAeV/k0/UciIJ4F/ADWEJ6GEiy0Dvhu/pH6lkFeMY33/4/Ocl3ueqvL8BeWaWTyRZLAvj2Qq8FT5T95VrMwlIDcuc04Drw5JhPFX6LEp8um2+SIIscmXWSHe/tortMoESdy82s7OBP7j7qTUySJE6QOdYRBJgZr8nMts4mj/u6wpMN7MUIifov1mdYxOpazRjERGRaqVzLCIiUq2UWEREpFopsYiISLVSYhERkWqlxCIiItVKiUVERKrV/weuMWy8DWomwwAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(traindata, {}, 'leastAbsoluteError', param, 3, 32) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)\n",
"\n",
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying number of iterations')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[0.24489669490739654, 0.26140602081099523, 0.2619618739499482, 0.25816082247564837, 0.25905551178812486, 0.25776353461608653, 0.25866605672527904]\n"
]
},
{
"data": {
"image/png": "[PNG plot omitted: 'Metrics for different maximum bins']\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(traindata, {}, 'leastAbsoluteError', 10, 3, param) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Metrics for different maximum bins')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor":...
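Both sweep cells above follow one pattern: hold every setting fixed except one, call `evaluate_dt` once per candidate value, then plot the resulting errors. `evaluate_dt` is defined earlier in the notebook and is not shown here. From its arguments it appears to take the training RDD, a categorical-features map, a loss name ('leastAbsoluteError' is one of the losses accepted by MLlib's gradient-boosted trees), the number of iterations, the maximum depth and the maximum number of bins, but that reading is an assumption. The minimal, Spark-free sketch below captures the sweep-and-select step. `sweep` and `best_param` are hypothetical helper names, the lambda is a toy stand-in for a trained model's error, and the metric values are the ones printed by the maxBins cell.

```python
from typing import Callable, List, Sequence, Tuple


def sweep(evaluate: Callable[[int], float], params: Sequence[int]) -> List[float]:
    """Evaluate one hyperparameter value at a time, everything else held fixed.

    `evaluate` is a hypothetical stand-in for a call such as
    evaluate_dt(traindata, {}, 'leastAbsoluteError', 10, 3, param).
    """
    return [evaluate(p) for p in params]


def best_param(params: Sequence[int], metrics: Sequence[float]) -> Tuple[int, float]:
    """Return the (param, metric) pair with the lowest error."""
    if len(params) != len(metrics):
        raise ValueError("params and metrics must have the same length")
    return min(zip(params, metrics), key=lambda pm: pm[1])


# Toy evaluator: error grows with distance from 16 (illustration only).
toy_metrics = sweep(lambda p: abs(p - 16) / 16, [2, 4, 8, 16, 32])

# Error values printed by the maxBins sweep cell in the notebook.
max_bins = [2, 4, 8, 16, 32, 64, 100]
max_bins_metrics = [0.24489669490739654, 0.26140602081099523, 0.2619618739499482,
                    0.25816082247564837, 0.25905551178812486, 0.25776353461608653,
                    0.25866605672527904]

print(best_param(max_bins, max_bins_metrics))  # (2, 0.24489669490739654)
```

On the printed numbers, the lowest error occurs at maxBins = 2 (about 0.245), while every other setting stays between roughly 0.258 and 0.262. If `evaluate_dt` scores the model on the same `traindata` it trains on, these are training errors, so a held-out split would be a safer basis for choosing the final value.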
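The maxBins sweep in the notebook varies how finely the tree learner discretizes continuous features. The Spark MLlib decision-tree guide describes continuous features as being split into at most maxBins bins, so at most maxBins - 1 thresholds are considered per feature, with candidates taken from approximate quantiles over a sample of the data; maxBins must also be at least the largest number of categories of any categorical feature. The sketch below is a simplified, deterministic illustration of that idea (exact quantiles over all values, hypothetical function name `candidate_splits`), not Spark's implementation. It shows why raising maxBins past the number of distinct values of a feature adds no new candidate splits.

```python
from typing import List, Sequence


def candidate_splits(values: Sequence[float], max_bins: int) -> List[float]:
    """Simplified illustration (not Spark's algorithm): choose up to
    max_bins - 1 thresholds at evenly spaced quantiles of the sorted values.
    A split sends rows with x <= threshold left and the rest right."""
    if max_bins < 2:
        raise ValueError("max_bins must be at least 2")
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    thresholds: List[float] = []
    for i in range(1, max_bins):
        # Value at the i-th of (max_bins - 1) evenly spaced quantile positions.
        t = ordered[min(n - 1, (i * n) // max_bins)]
        # A threshold equal to the maximum would put every row on one side,
        # and repeated values would produce duplicate candidates.
        if t < ordered[-1] and (not thresholds or t > thresholds[-1]):
            thresholds.append(t)
    return thresholds


values = list(range(1, 11))  # a toy feature with 10 distinct values
for b in (2, 32, 100):
    print(f"maxBins={b}: {len(candidate_splits(values, b))} candidate thresholds")
# Prints 1, 9 and 9 candidate thresholds for maxBins = 2, 32 and 100.
```

With only ten distinct values, anything above maxBins = 10 leaves the candidate set unchanged, which is one reason an error curve over maxBins can flatten out once the bin count exceeds the number of distinct feature values.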