
SIT720 Machine Learning
Assessment Task 2: Problem solving task.

©Deakin University SIT720
This document supplies detailed information on Assessment Task 2 for this unit.
Key information
• Due: Week 7, Monday 30 August 2021 by 8.00 pm (AEST)
• Weighting: 15%
Learning Outcomes
This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning
Outcomes (GLO):
Unit Learning Outcome (ULO):
ULO2 - Perform unsupervised learning of data such as clustering and dimensionality reduction.
Related Graduate Learning Outcomes (GLO):
GLO1 - through the assessment of student ability to use data acquisition techniques to obtain, manipulate
and represent data.
GLO3 - through student ability to use specific programming language and modules to obtain, pre-process,
transform and analyse data.
GLO4 - through assessment of student ability to make decisions to obtain data, use appropriate
techniques to represent and visualise complex relationships in the data.
GLO5 - through assessment of student ability to solve problems related to ill-defined data.
Purpose
This assessment task is for students to apply skills in data clustering and dimensionality reduction. Students
will be required to demonstrate ability in data representation, and competency in applying suitable
clustering/dimensionality reduction techniques in a real-world scenario.

Assessment 2    Total marks = 40

Submission Instructions
a) Submit your solution codes into a notebook file with “.ipynb” extension. Write discussions and
explanations including outputs and figures into a separate file and submit as a PDF file.
b) Submissions in any file format other than those mentioned above will not be assessed and will receive
zero for the entire submission.
c) Insert your Python code responses into cells of your submitted “.ipynb” file, each preceded by the question,
i.e., copy the question into a cell added before the solution cell. If you need multiple cells for better
presentation of the code, add the question only before the first solution cell.
d) Your submitted code should be executable. If your code does not generate the submitted solution,
then you will get zero for that part of the marks.
e) Answers must be relevant and precise.
f) No hard coding is allowed. Avoid using specific values that can be calculated from the data provided.
g) Use topics covered up to week 6 to answer this assignment.
h) Submit your assignment after running each cell individually.
i) The submitted notebook file name should be of this form “SIT720_A2_studentID.ipynb”. For example, if your
student ID is 1234, then the submitted file name should be “SIT720_A2_1234.ipynb”.







_____________________________________________________________________________________
Questions
_____________________________________________________________________________________
Datafile: Download the dataset (.csv) from the SCADI page.
Data Description: This dataset contains 206 attributes of 70 children with physical and motor disability based
on ICF-CY. For more information click this link.
1. Determine the number of subgroups in the dataset using attributes 3 to 205, i.e., exclude attributes 1,
2 and 206. Is this number the same as the number of classes presented by attribute 206? Explain and justify
your findings. 4 marks
2. Is this data facing the curse of dimensionality? If so, how can this problem be solved? Explain with a
two-dimensional plot and report the relevant loss of information. 4 marks
3. After applying principal component analysis (PCA) on a given dataset, it was found that the percentage
of variance for the first N components is X%. How is this percentage of variance computed? 2 marks
___________________________________________________________________________________
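As background for question 3, the percentage of variance is commonly taken as the sum of the N largest eigenvalues of the data covariance matrix divided by the sum of all eigenvalues, times 100. The sketch below illustrates that ratio on hypothetical two-dimensional toy data (not SCADI). The function names are made up for this illustration, and the closed-form eigenvalue formula only applies to a 2x2 symmetric matrix.

```python
import math


def covariance_2d(points):
    """Sample covariance entries (var_x, cov_xy, var_y) of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return var_x, cov_xy, var_y


def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    centre = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return [centre + radius, centre - radius]


def explained_variance_percent(eigenvalues, n_components):
    """Percentage of total variance captured by the n largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return 100 * sum(ordered[:n_components]) / sum(ordered)


# Hypothetical toy data, assumed only for illustration.
toy_points = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
var_x, cov_xy, var_y = covariance_2d(toy_points)
eigenvalues = eigenvalues_sym_2x2(var_x, cov_xy, var_y)
first_pc_percent = explained_variance_percent(eigenvalues, 1)
print(f"First component explains {first_pc_percent:.1f}% of the variance")
```

For real data with many attributes, the same ratio would be computed from the eigenvalues returned by a numerical routine rather than from the 2x2 closed form used here.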

Background
Obesity has become a global epidemic that has doubled since 1980, with serious consequences for health in
children, teenagers, and adults. Obesity levels in individuals may relate to their eating habits and physical
condition. In this assessment, you will be analysing and creating ML models based on a given dataset that
contains attributes of individuals with relation to obesity levels.
Dataset filename: obesity_levels.csv
Dataset description: This dataset includes data for the estimation of obesity levels in individuals based on their
eating habits and physical condition. The data contains 17 attributes and 2111 records.
Features and labels: The attribute names are listed below. The description of the attributes can be found in this
article (web-link).
I. Gender
II. Age
III. Height
IV. Weight
V. family_history_with_overweight (family history of overweight)
VI. FAVC (frequent high caloric food)
VII. FCVC (vegetables per meal)
VIII. NCP (number of main meals per day)
IX. CAEC (any food between meals)
X. SMOKE (smoking)
XI. CH2O (daily water intake)
XII. SCC (daily consumed calories)
XIII. FAF (frequency of physical activity)
XIV. TUE (technology usage)
XV. CALC (consumption of alcohol)
XVI. MTRANS (means of transport)
XVII. NObeyesdad (obesity levels, i.e. Insufficient Weight, Normal Weight, Overweight Level I, Overweight
Level II, Obesity Type I, Obesity Type II and Obesity Type III)
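Several of the attributes above, such as Gender and MTRANS, are stored as text, so most models need them converted to numbers first. One common option is one-hot encoding, sketched below in plain Python. The helper name one_hot_encode and the three sample records are illustrative assumptions, and this is only one possible treatment of text variables, not a prescribed one.

```python
def one_hot_encode(rows, column):
    """Replace a text column with one 0/1 indicator column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the text column being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Illustrative records only; a real run would read them from the CSV file.
sample = [
    {"Age": 21.0, "MTRANS": "Public_Transportation"},
    {"Age": 27.0, "MTRANS": "Walking"},
    {"Age": 22.0, "MTRANS": "Public_Transportation"},
]
encoded_sample = one_hot_encode(sample, "MTRANS")
for record in encoded_sample:
    print(record)
```

Attributes with a natural order (for example a frequency scale) may be better served by an ordinal mapping, which keeps a single column instead of one column per category.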
_____________________________________________________________________________________
Questions
https://archive.ics.uci.edu/ml/datasets/SCADI
https://www.mdpi.com/ XXXXXXXXXX/11/1/89/htm
https://doi.org/10.1016/j.dib XXXXXXXXXX

_____________________________________________________________________________________

4. Create a machine learning (ML) model for predicting “weight” using all features except “NObeyesdad”
and report the observed performance. Explain your results based on the following criteria:
10 marks
a. What model have you selected for solving this problem and why?
b. Have you made any assumptions about the target variable? If so, why?
c. What have you done with text variables? Explain.
d. Have you optimised any model parameters? What is the benefit of this action?
e. Have you applied any steps to handle overfitting or underfitting? If so, what are they?
5. Create an ML model for classifying subjects into two classes, applying the following constraints to the above
dataset. 12 marks
• Use “NObeyesdad” as the target variable and the rest as predictor variables.
• Drop samples with value “Insufficient Weight” for “NObeyesdad”.
• Group Normal Weight, Overweight Level I,
Answered 2 days After Aug 21, 2021

Solution

Karthi answered on Aug 23 2021
89946/machine_learning.ipyn
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"source": [
"import pandas as pd\n",
"import seaborn as sns\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import collections\n",
"from collections import Counter\n",
"\n",
"import sklearn\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"from sklearn.preprocessing import OrdinalEncoder\n",
"from sklearn.preprocessing import OneHotEncoder\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.pipeline import Pipeline\n",
"\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.svm import SVC\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.ensemble import AdaBoostClassifier\n",
"from sklearn.linear_model import SGDClassifier\n",
"\n",
"from sklearn.metrics import accuracy_score\n",
"from sklearn.metrics import classification_report"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 3,
"source": [
"df = pd.read_csv('obesitylevels.csv')"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 4,
"source": [
"df"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Gender Age Height Weight family_history_with_overweight \\\n",
"0 Female 21.000000 1.620000 64.000000 yes \n",
"1 Female 21.000000 1.520000 56.000000 yes \n",
"2 Male 23.000000 1.800000 77.000000 yes \n",
"3 Male 27.000000 1.800000 87.000000 no \n",
"4 Male 22.000000 1.780000 89.800000 no \n",
"... ... ... ... ... ... \n",
"2106 Female 20.976842 1.710730 131.408528 yes \n",
"2107 Female 21.982942 1.748584 133.742943 yes \n",
"2108 Female 22.524036 1.752206 133.689352 yes \n",
"2109 Female 24.361936 1.739450 133.346641 yes \n",
"2110 Female 23.664709 1.738836 133.472641 yes \n",
"\n",
" FAVC FCVC NCP CAEC SMOKE CH2O SCC FAF TUE \\\n",
"0 no 2.0 3.0 Sometimes no 2.000000 no 0.000000 1.000000 \n",
"1 no 3.0 3.0 Sometimes yes 3.000000 yes 3.000000 0.000000 \n",
"2 no 2.0 3.0 Sometimes no 2.000000 no 2.000000 1.000000 \n",
"3 no 3.0 3.0 Sometimes no 2.000000 no 2.000000 0.000000 \n",
"4 no 2.0 1.0 Sometimes no 2.000000 no 0.000000 0.000000 \n",
"... ... ... ... ... ... ... ... ... ... \n",
"2106 yes 3.0 3.0 Sometimes no 1.728139 no 1.676269 0.906247 \n",
"2107 yes 3.0 3.0 Sometimes no 2.005130 no 1.341390 0.599270 \n",
"2108 yes 3.0 3.0 Sometimes no 2.054193 no 1.414209 0.646288 \n",
"2109 yes 3.0 3.0 Sometimes no 2.852339 no 1.139107 0.586035 \n",
"2110 yes 3.0 3.0 Sometimes no 2.863513 no 1.026452 0.714137 \n",
"\n",
" CALC MTRANS NObeyesdad \n",
"0 no Public_Transportation Normal_Weight \n",
"1 Sometimes Public_Transportation Normal_Weight \n",
"2 Frequently Public_Transportation Normal_Weight \n",
"3 Frequently Walking Overweight_Level_I \n",
"4 Sometimes Public_Transportation Overweight_Level_II \n",
"... ... ... ... \n",
"2106 Sometimes Public_Transportation Obesity_Type_III \n",
"2107 Sometimes Public_Transportation Obesity_Type_III \n",
"2108 Sometimes Public_Transportation Obesity_Type_III \n",
"2109 Sometimes Public_Transportation Obesity_Type_III \n",
"2110 Sometimes Public_Transportation Obesity_Type_III \n",
"\n",
"[2111 rows x 17 columns]"
],
"text/html": [
"
\n",
"