Lecture Recording URL: https://ucidce.zoom.us/rec/share/-tJOIKHfykJOSIXf6h7DQYQ7Rp73eaa81HUb8_MOxI7uyzsycMubf064CsL-1TM
Day 6: Assignment, Ensembles, Decision Trees, and Trading Systems
Instructions
This assignment looks at using k-nearest neighbors to create a simple recommendation engine.
Homework steps:
· Open the homework notebook link: LINK TO NOTEBOOK
· Save a copy to your Google Drive
· Answer the questions in your notebook copy with your code and written answers
· Set sharing to "Anyone with a link can view".
· Save the notebook and submit the link
Day 6: Content
Overview
In information-based modeling, we again utilize the structure of past data to build models for regression and classification problems. In this module, we cover decision trees, a modeling method based on information gain. The resulting model is a tree structure built from actual attribute values in the data, and it is often a big favorite in machine learning because of its readability. Ensemble methods are also touched on in this module as a way to combine the modeling power of multiple models.
Readings and Media
· Class slides: Information-based Modeling
Modeling Methods, Deploying, and Refining Predictive Models
UCI Spring 2020
Class 6 Information-based Modeling
Schedule
Introduction and Overview
Data and Modeling + Simulation Modeling
Error-based Modeling
Probability-based Modeling
Similarity-based Modeling
Information-based Modeling
Time-series Modeling
Deployment
At the end of this module:
You will learn how to build:
Decision Trees and
Ensembles
For regression and classification
Supervised Methods
Error-based
Similarity-based
Information-based
Probability-based
Neural networks and deep learning-based methods
Ensembles
Today’s Objectives
Information-based Modeling
Decision Trees
Ensembles
Information-based Algorithms
Models, such as decision trees, that are based on information gain in data sets.
Decision tree methods construct a model of decisions made based on actual values of attributes in the data.
Decisions fork in tree structures until a prediction decision is made for a given record. Decision trees are trained on data for classification and regression problems. Decision trees are often fast and accurate and a big favorite in machine learning.
The most popular decision tree algorithms are:
Classification and Regression Tree (CART)
Iterative Dichotomiser 3 (ID3)
C4.5 and C5.0 (different versions of a powerful approach)
Chi-squared Automatic Interaction Detection (CHAID)
Decision Stump
M5
Conditional Decision Trees
Today’s Objectives
Information-based Modeling
Decision Trees
Ensembles
Decision Trees
Robust and intuitive predictive models when the target attribute is categorical in nature and when the data set is of mixed data types
Compared with more numerical methods, decision trees are better at handling attributes that have missing or inconsistent values
Decision trees tell the user what is predicted, how confident that prediction is, and how we arrived at that prediction
Popular method when communicability is a priority
Computationally efficient
Applications
Medicine: used for diagnosis in numerous specialties
Financial analysis: credit risk modeling
Internet routing: used in routing tables to find next router to handle packet based on the prefix sequence of bits
Computer vision: tree-based classification for recognizing 3D objects
Many more…
An example of a decision tree developed in RapidMiner
Decision trees are made of nodes and leaves to represent the best predictor attributes in a data set
Elements of a decision tree
[Figure: an email-classification decision tree whose internal nodes test "Contains images", "Suspicious words", and "Unknown sender"; true/false branches lead to leaf nodes labeled "spam" or "legit". Annotations mark the root node, internal nodes, leaf nodes, the tree's depth, and a decision path.]
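To make these elements concrete, here is a small sketch of such a tree written as nested conditionals in Python. This is my own illustration: the node labels come from the figure, but the exact branch structure and leaf outcomes are assumptions.

```python
def classify_email(contains_images: bool, suspicious_words: bool, unknown_sender: bool) -> str:
    """Toy spam-filter decision tree: each if-test is a node, each return is a
    leaf, and the sequence of tests applied to one email is its decision path."""
    if contains_images:                # root node (depth 0)
        if suspicious_words:           # internal node (depth 1)
            return "spam"              # leaf node
        return "legit"                 # leaf node
    if unknown_sender:                 # internal node (depth 1)
        return "spam"                  # leaf node
    return "legit"                     # leaf node

print(classify_email(contains_images=True, suspicious_words=False, unknown_sender=True))  # legit
```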
The ABT for decision trees
Descriptive Feature 1 | Descriptive Feature 2 | … | Descriptive Feature m | Target Feature
Obs 1 | Obs 1 | … | Obs 1 | Target value 1
Obs 2 | Obs 2 | … | Obs 2 | Target value 2
… | … | … | … | …
Obs n | Obs n | … | Obs n | Target value n
The descriptive features can form a categorical, numeric, or mixed feature space, and the target feature can be numeric or categorical. Each column simply represents a set of values, and the heterogeneity of a set is its entropy.
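As a concrete illustration, a small hypothetical ABT as a pandas DataFrame (the feature names and values below are made up):

```python
import pandas as pd

# A tiny hypothetical ABT: mixed categorical/numeric descriptive features
# and a categorical target feature.
abt = pd.DataFrame({
    "contains_images":  [True, False, True, False, True],             # categorical feature
    "suspicious_words": [3, 0, 5, 1, 0],                              # numeric feature
    "unknown_sender":   [True, False, True, True, False],             # categorical feature
    "label":            ["spam", "legit", "spam", "legit", "legit"],  # target feature
})
print(abt)
```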
Shannon’s entropy model and cards
[Figure: card sets of increasing variety, captioned Entropy(card) = 0.0, 0.81, 1.0, 1.50, 1.58, and 3.58.]
Entropy increases as uncertainty increases.
Shannon’s Model of Entropy
Cornerstone of modern information theory
Measures heterogeneity of a set
Defined as:
H(d) = -\sum_{l=1}^{L} P(d=l) \log_s(P(d=l))
P(d=l): probability of randomly selecting an element d of type l
L is the number of different types of d in the set
s is an arbitrary base; for information modeling, base 2 is used so that entropy is measured in bits
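A minimal Python sketch of this formula (my own illustration). The distributions below are assumptions chosen to reproduce the entropy values shown in the card figure, which gives only the values themselves:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy: H = -sum over levels l of P(d=l) * log_base(P(d=l)).
    Terms with p == 0 or p == 1 contribute nothing, so they are skipped."""
    return -sum(p * math.log(p, base) for p in probs if 0 < p < 1)

print(entropy([1.0]))              # 0 (zero entropy) - every card identical
print(entropy([0.75, 0.25]))       # ~0.81
print(entropy([0.5, 0.5]))         # 1.0 - two equally likely kinds
print(entropy([0.5, 0.25, 0.25]))  # 1.5
print(entropy([1/3] * 3))          # ~1.58 - three equally likely kinds
print(entropy([1/12] * 12))        # ~3.58 - twelve equally likely kinds
```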
The ABT for decision trees
Descriptive Feature 1 | Descriptive Feature 2 | … | Descriptive Feature m | Target Feature
Obs 1 | Obs 1 | … | Obs 1 | Target value 1
Obs 2 | Obs 2 | … | Obs 2 | Target value 2
… | … | … | … | …
Obs n | Obs n | … | Obs n | Target value n
Our dataset is D. We can partition it by a descriptive feature d, creating one subset D_{d=l} for each level l that d can take. Each partition reduces the entropy in the set; the difference is the information gain.
Levels(Y) is the set of levels in the domain of the target feature Y, and l is a value in Levels(Y); there are L levels in total.
Entropy for our dataset
H(Y, D) = -\sum_{l \in Levels(Y)} P(Y=l) \log_2(P(Y=l))
Remaining entropy in a partitioned dataset
Entropy remaining when we partition the dataset using feature d:
rem(d, D) = \sum_{l \in Levels(d)} \frac{|D_{d=l}|}{|D|} \, H(Y, D_{d=l})
Information gain
Information gained by splitting the dataset using the feature d:
IG(d, D) = H(Y, D) - rem(d, D)
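A short Python sketch of these two quantities (my own example; the records and feature names are made up for illustration):

```python
import math
from collections import Counter

def entropy(values, base=2):
    """Shannon entropy H of a list of target values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def information_gain(rows, feature, target):
    """IG(d, D) = H(Y, D) - rem(d, D) for a list of dict records."""
    total_entropy = entropy([r[target] for r in rows])
    remaining = 0.0
    for level in {r[feature] for r in rows}:
        part = [r[target] for r in rows if r[feature] == level]
        remaining += (len(part) / len(rows)) * entropy(part)
    return total_entropy - remaining

# A toy spam dataset (feature names are assumptions echoing the earlier figure)
data = [
    {"suspicious_words": True,  "unknown_sender": True,  "label": "spam"},
    {"suspicious_words": True,  "unknown_sender": False, "label": "spam"},
    {"suspicious_words": False, "unknown_sender": True,  "label": "legit"},
    {"suspicious_words": False, "unknown_sender": False, "label": "legit"},
]
print(information_gain(data, "suspicious_words", "label"))  # 1.0 - a perfect split
print(information_gain(data, "unknown_sender", "label"))    # 0.0 - no information gained
```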
Decision tree process
1. Compute the entropy of the original dataset with respect to the target feature. This measures how much information is required to organize the dataset into pure sets, i.e., the heterogeneity (entropy) of the set.
2. For each descriptive feature, create the sets that result from partitioning the instances by their values of that feature, then sum the entropy scores of these sets. This is the entropy remaining after the split: the information still required to organize the instances into pure sets once we have partitioned on that descriptive feature.
3. Subtract the remaining entropy from the original entropy to compute the information gain.
Implementation
The Iterative Dichotomiser 3 (ID3) algorithm is one of the most popular approaches.
Top-down, recursive, depth-first partitioning beginning at the root node and finishing at the leaf nodes.
Assumes categorical features and clean data but can be extended to handle numeric features and targets and noisy data via thresholding and pruning.
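For comparison, a minimal scikit-learn sketch (my own example, not from the slides). Note that scikit-learn implements an optimized CART rather than ID3, but criterion="entropy" makes it choose splits by information gain in the same spirit:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" selects splits by information gain; max_depth limits tree growth
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(tree.score(X_test, y_test))  # holdout accuracy
print(export_text(tree))           # readable, rule-like view of the learned tree
```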
Today’s Objectives
Information-based Modeling
Decision Trees
Ensembles
Ensembles
Instead of focusing on a single model for prediction, what if we generate a set of independent models and aggregate their outputs?
Ensemble properties
Build multiple independent models from the same dataset but each model uses a modified subset of the dataset
Make a prediction by aggregating the predictions of the different models in the ensemble.
For categorical targets, this can be done using voting mechanisms.
For numeric targets, this can be done using a measure of central tendency like the mean or median.
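A tiny sketch of the aggregation step (my own illustration with made-up predictions):

```python
from collections import Counter
import numpy as np

# Predictions from three hypothetical ensemble members for five instances
categorical_preds = [
    ["spam",  "legit", "spam", "spam",  "legit"],
    ["spam",  "spam",  "spam", "legit", "legit"],
    ["legit", "legit", "spam", "spam",  "legit"],
]
# Categorical target: majority vote per instance
voted = [Counter(col).most_common(1)[0][0] for col in zip(*categorical_preds)]
print(voted)  # ['spam', 'legit', 'spam', 'spam', 'legit']

# Numeric target: aggregate with a measure of central tendency
numeric_preds = np.array([
    [10.0, 2.0, 5.0],
    [12.0, 1.0, 4.0],
    [11.0, 3.0, 6.0],
])
print(numeric_preds.mean(axis=0))        # [11.  2.  5.]
print(np.median(numeric_preds, axis=0))  # [11.  2.  5.]
```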
Boosting
Boosting repeatedly re-trains on re-weighted (replicated) instances so that each new model targets the cases where the current models perform weakly.
Boosting idea
Step 1:
Use a weighted dataset where each instance has an associated weight. Initially, distribute the weights uniformly to all instances.
Sample over this weighted set to create a replicated training set and create a model using the replicated training set.
Find the total error in the set of predictions made by the model; the model is summarized by its predictions and its error rate.
Boosting idea
Step 2
Increase the weights of the misclassified instances and decrease the weights of the correctly classified instances. The number of times an instance is replicated is proportional to its weight.
Calculate a confidence measure for the model based on its error. This is used to weight the predictions from the models.
[Figure: Model 1 and Model 2, each summarized by a (prediction, error rate) pair and a confidence measure, with the replicated instances used to train the next model.]
Boosting idea
Step 3
Make a prediction using the weighted models by:
For categorical targets, this can be done using voting mechanisms.
For numeric targets, this can be done using a measure of central tendency like the mean or median.
[Figure: the weighted predictions of Models 1, 2, and 3 are combined.]
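This three-step procedure is, in a refined form, what the AdaBoost algorithm does. A minimal scikit-learn sketch (my own example; the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The default base learner is a depth-1 decision tree (a decision stump).
# Each boosting round re-weights the instances the previous rounds misclassified,
# and the final prediction is a confidence-weighted vote of the 50 stumps.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(X_train, y_train)
print(boosted.score(X_test, y_test))  # holdout accuracy
```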
Bagging (bootstrap aggregating)
Bagging and subspace sampling
Bagging is another method to generate ensembles.
Random samples the same size as the dataset are drawn with replacement from the dataset. These are the bootstrap samples.
For each of the bootstrap samples, we create a model (typically a decision tree).
Because the models are trained on samples drawn with replacement, each training set contains duplicated instances and is missing others. This creates many different models because each sees a different dataset; this is called subspace sampling.
Random Forest
The ensemble of decision trees resulting from subspace sampling is referred to as a Random Forest.
The ensemble makes predictions by returning the majority vote, or the median for continuous targets.
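A minimal scikit-learn sketch contrasting plain bagging with a random forest (my own example; the dataset is synthetic). A random forest adds one extra trick on top of bagging: each split also considers only a random subset of the features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: 100 trees, each trained on a bootstrap sample of the training data
# (the default base learner for BaggingClassifier is a decision tree).
bagged = BaggingClassifier(n_estimators=100, random_state=0)

# Random forest: bootstrap samples plus random feature subsets at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagged), ("random forest", forest)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```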
Boosting vs. Bagging
Which method is preferred is a matter of experimentation.
Typically, boosting exhibits a greater tendency toward overfitting when there is a large number of features.
Review of topics
Information-based Modeling
Decision Trees
Entropy
Information gain
Categorical and numeric prediction
Ensembles
Boosting
Bagging
Comparison of ABT/Feature matrix concepts
Error-based
Probability-based
Similarity-based
Information-based
We need to have an Analytics Base Table (ABT) before we can model anything
Descriptive Feature 1 | Descriptive Feature 2 | … | Descriptive Feature m | Target Feature
The ABT and the Model
Descriptive Feature 1 | Descriptive Feature 2 | … | Descriptive Feature m | Target Feature
Obs 1 | Obs 1 | … | Obs 1 | Categorical target value 1
Obs 2 | Obs 2 | … | Obs 2 | Categorical target value 2
… | … | … | … | …
Obs n | Obs n | … | Obs n | Categorical target value n
The existence of a target feature automatically makes the modeling problem supervised.
The data type of the target feature restricts which models can be used.
The dataset characteristics may restrict the resolution of the model, force you to make assumptions, or require modeling for imputation, de-noising, data generation, etc.
Understanding and manipulating feature spaces is the key to data analytics
N-dimensional vector space representation of language produces an incredible ability to perform word-vector arithmetic.
Image source: Deep Learning Illustrated by Krohn
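For example, the classic word-vector arithmetic king − man + woman ≈ queen. A toy numpy sketch of the idea (the vectors below are made up purely for illustration; real embeddings such as word2vec or GloVe have hundreds of dimensions learned from text):

```python
import numpy as np

# Made-up 3-dimensional "word vectors" for illustration only
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Word-vector arithmetic: king - man + woman should land closest to queen
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(vectors, key=lambda w: cosine(vectors[w], target))
print(best)  # queen
```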
The ABT/ Feature space
The ABT/feature space representation is nothing more than an n-dimensional matrix
Modeling methods are just different ways to perform statistical, mathematical, or even heuristic operations on this feature space.