For the following assignments, please provide as much evidence of the results as possible, including the code, screenshots (plots only, not text or code) and documentation. Submit a single PDF file together with the .ipynb/.py files containing the documented code.
1.a. [10 points]
Write a one-page summary report, from the perspective of machine learning experts working in industry, on the machine learning topics covered and how they are used in industry. Submit the report in writing and list five key learnings/takeaways. Written submissions must be entirely your own.
1.b. [10 points]
Assume there is a dataset with just three columns (two features and one label) and four rows (items). The four vectors corresponding to the items lie at the corners of a square. The two vectors at the ends of one diagonal belong to one class, and the two vectors at the ends of the other diagonal belong to the second class. Is this data separable by a straight line? Which algorithm studied in class would you choose to build a classifier for this toy dataset, and why?
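The square-corner layout is the XOR pattern. A small sketch under assumed coordinates (corners at ±1 and labels ±1 are both assumptions) illustrates the consequence: a plain perceptron, which can only learn a straight-line boundary, never finishes an epoch without errors on the raw features, but it converges once a product feature x1*x2 is added. Mapping the data into a space where it becomes linearly separable is the idea kernel methods (for example an SVM with a non-linear kernel) apply implicitly. This is one possible answer, not the only one.

```python
def perceptron(X, y, max_epochs=1000):
    """Train a perceptron on labels in {-1, +1}; return (weights, converged).

    The last weight is the bias term. converged is True only if an epoch
    finishes with zero misclassified points.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]  # append a constant 1 for the bias
            activation = sum(wj * xj for wj, xj in zip(w, xb))
            if yi * activation <= 0:  # wrong side of (or on) the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:
            return w, True
    return w, False


# Assumed coordinates: corners of a square centred on the origin.
# Points on one diagonal are class -1, points on the other are class +1 (XOR).
X = [(-1, -1), (1, 1), (-1, 1), (1, -1)]
y = [-1, -1, 1, 1]

_, converged_raw = perceptron(X, y)

# Add the product feature x1 * x2; in this 3-D space the classes are separable.
X_aug = [(a, b, a * b) for a, b in X]
w_aug, converged_aug = perceptron(X_aug, y)

print(converged_raw, converged_aug)  # False True
```

Because no line separates the raw points, the first run can never reach a zero-error epoch; the perceptron convergence theorem guarantees the second run stops after a few updates.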
2.(a) [15 points]
Follow the tutorial on the Naïve Bayes classifier at https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
Write your own code for Naïve Bayes classification of the UCLA admissions dataset.
Download it from https://stats.idre.ucla.edu/stat/data/binary.csv
Comment on the performance of Naïve Bayes.
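A minimal from-scratch Gaussian Naïve Bayes sketch in the spirit of the tutorial (not its exact code). It assumes numeric features and models each one with a per-class normal distribution; the toy rows below are invented gre/gpa-style values, not the UCLA data. Scores are computed in log space so that multiplying many small likelihoods does not underflow.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def fit_gaussian_nb(rows, labels):
    """Return {label: (prior, [(mean, stdev) per feature])}.

    Needs at least two rows per class and non-constant features (stdev > 0).
    """
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, class_rows in by_class.items():
        columns = list(zip(*class_rows))  # transpose rows -> feature columns
        prior = len(class_rows) / len(rows)
        model[label] = (prior, [(mean(col), stdev(col)) for col in columns])
    return model


def gaussian_log_pdf(x, mu, sigma):
    """Log of the normal density N(x; mu, sigma^2)."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)


def predict(model, row):
    """Pick the class with the largest log prior + sum of log likelihoods."""
    def log_posterior(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            gaussian_log_pdf(x, mu, sigma) for x, (mu, sigma) in zip(row, stats)
        )
    return max(model, key=log_posterior)


def accuracy(model, rows, labels):
    correct = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return correct / len(rows)


# Invented (gre, gpa)-style rows for illustration only; not the UCLA data.
rows = [(500, 2.8), (520, 3.0), (540, 2.9), (700, 3.8), (720, 3.9), (740, 3.7)]
labels = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(rows, labels)
print(predict(model, (530, 2.95)), predict(model, (730, 3.85)))  # 0 1
print(accuracy(model, rows, labels))  # 1.0
```

For the real file, one could load binary.csv (for example with pandas.read_csv), use the admit column as the label and the remaining columns as features, and measure accuracy on a held-out split rather than on the training rows. Treating the categorical rank column as Gaussian is a simplification worth discussing when commenting on performance.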
2.(b) [10 points]
Describe five real-world applications in which regression can be used. For each application, describe the y-value and the corresponding feature vector X, and discuss whether linear regression can be used in each case.
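Judging whether linear regression suits an application is easier with its core computation in view: a line y ≈ intercept + slope * x chosen to minimise squared error. A minimal single-feature ordinary least squares sketch, using hypothetical numbers:

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factors cancel.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

With several features the same idea generalises to solving the normal equations. If the true relationship between X and y is strongly non-linear, the fitted line leaves large, patterned residuals, which is one practical sign that plain linear regression is a poor fit for that application.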
3.(a) [10 points]
Refer to online tutorials on K-NN implementation, such as https://machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch/
Extend the implementation to use various distance metrics, such as Manhattan distance, and note whether the classification changes with the distance metric (for a more exhaustive list of distances, see getDistMethods() in R). Choose one of the cleaned datasets at https://www.kaggle.com/annavictoria/ml-friendly-public-datasets
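One way to make the metric swappable is to pass the distance function in as a parameter. The sketch below is not the tutorial's code, and the two training points are hypothetical, chosen so that the metrics disagree: with k=1, Euclidean and Chebyshev distance pick one neighbor while Manhattan distance picks the other.

```python
import math
from collections import Counter


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def knn_predict(train, query, k=3, distance=euclidean):
    """Majority vote among the k training items closest to query.

    train is a list of (feature_tuple, label) pairs; distance is any
    function of two feature tuples, so metrics can be swapped freely.
    """
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy points: (3, 0) is 3 away under every metric,
# (2, 2) is 2.83 (Euclidean), 4 (Manhattan) or 2 (Chebyshev) away.
train = [((3, 0), "a"), ((2, 2), "b")]
query = (0, 0)
for metric in (euclidean, manhattan, chebyshev):
    print(metric.__name__, knn_predict(train, query, k=1, distance=metric))
# euclidean b
# manhattan a
# chebyshev b
```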
3.(b) [5 points]
In K-NN, we ignored the direction component of the vectorized representation of data items and only considered the distance. Does it make sense to also consider the direction of the nearest neighbor in addition to or instead of the distance from it? Why or why not?
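One possible reading of "direction" is the angle between feature vectors, which cosine distance measures while ignoring their length. As a hedged illustration with invented points, the nearest neighbor under Euclidean distance and under cosine distance can differ, so whether direction should matter depends on whether vector magnitude carries meaning in the data (cosine similarity is a common choice for text vectors, where length largely reflects document length).

```python
import math


def cosine_distance(p, q):
    """1 - cos(angle between p and q): depends on direction only, not length."""
    dot = sum(a * b for a, b in zip(p, q))
    return 1 - dot / (math.hypot(*p) * math.hypot(*q))


# Invented points: A points the same way as the query but is far away;
# B is close to the query but points in a different direction.
points = {"A": (10.0, 10.0), "B": (1.0, 0.0)}
query = (1.0, 1.0)

nearest_by_distance = min(points, key=lambda n: math.dist(points[n], query))
nearest_by_direction = min(points, key=lambda n: cosine_distance(points[n], query))
print(nearest_by_distance, nearest_by_direction)  # B A
```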
3.(c) [10 points]
Briefly explain, in your own words, the pros and cons of each of the following, listing the problem domains each is best suited for:
· Logistic Regression
· K-NN
· SVM
· Naïve Bayes
· Decision Trees
4. [25 Points]
Manually generate the decision tree (as far as possible) for the following subset of a large dataset using the ID3 algorithm. Show the information gain computation at each stage. Then generate the decision tree programmatically using Python. Submit the code and the resulting decision tree.
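The information-gain arithmetic can be cross-checked with a small from-scratch ID3 sketch. The subset referred to in the assignment is not reproduced on this page, so the example uses the four rows shown in the notebook output of the solution (Windy?, Air Quality Good?, Hot?, Play Tennis?). Breaking ties in gain by attribute order is an arbitrary choice made here, not part of ID3 itself.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3(rows, attributes, target):
    """Build a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:      # pure node -> leaf
        return labels[0]
    if not attributes:             # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # max() keeps the first attribute on ties, so list order breaks ties.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: id3([r for r in rows if r[best] == value], remaining, target)
            for value in sorted({r[best] for r in rows})
        }
    }


columns = ["Windy?", "Air Quality Good?", "Hot?", "Play Tennis?"]
data = [
    ("No", "No", "No", "No"),
    ("Yes", "No", "Yes", "Yes"),
    ("Yes", "Yes", "No", "Yes"),
    ("Yes", "Yes", "Yes", "No"),
]
rows = [dict(zip(columns, values)) for values in data]
target = "Play Tennis?"
attributes = columns[:-1]

for a in attributes:
    print(a, f"{information_gain(rows, a, target):.3f}")
# Windy? 0.311
# Air Quality Good? 0.000
# Hot? 0.000

tree = id3(rows, attributes, target)
print(tree)
```

On these rows Windy? has the highest gain (about 0.311 bits) and becomes the root; Windy? = No is a pure "No" leaf. Under Windy? = Yes the two remaining attributes tie at about 0.252 bits each, so which one is split on next is a free choice.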
Answered 3 days after Apr 09, 2021

Solution

Sandeep Kumar answered on Apr 13 2021
id3.ipyn
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4-final"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python394jvsc74a57bd081118431cc388d258ed977b65143603a98f8ad6ed776c173758a3af876bc6de9",
"display_name": "Python 3.9.4 64-bit"
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from matplotlib import pyplot as plt\n",
"from sklearn import datasets\n",
"from sklearn.tree import DecisionTreeClassifier \n",
"from sklearn import tree\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Windy?\\tAir Quality Good?\\tHot?\\tPlay Tennis?\n",
"0 No\\tNo\\tNo\\tNo\n",
"1 Yes\\tNo\\tYes\\tYes\n",
"2 Yes\\tYes\\tNo\\tYes\n",
"3 Yes\\tYes\\tYes\\tNo"
],
"text/html": "
\n