SIT720 Machine Learning
Assessment Task 4: Problem solving task.

©Deakin University SIT720
This document supplies detailed information on Assessment Task 4 for this unit.
Key information
• Due: Monday 27 September 2021 by 8.00 pm (AEST)
• Weighting: 25%
Learning Outcomes
This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning
Outcomes (GLO):
Unit Learning Outcomes (ULO):
• ULO3 - Perform linear regression, classification using logistic regression and linear Support Vector Machines.
• ULO4 - Perform non-linear classification using KNN and SVM with different kernels.
• ULO5 - Perform non-linear classification using Decision trees and Random forests.
• ULO6 - Perform model selection and compute relevant evaluation measures for a given problem.
• ULO7 - Use concepts of machine learning algorithms to design solutions and compare multiple solutions.
Graduate Learning Outcomes (GLO):
• GLO1 - through the assessment of student ability to apply advanced data processing techniques through programming for prediction.
• GLO5 - through assessment of student ability to deal with a defined data set and solve problems.
Purpose
Students will be given a specific data set for analysis and will be required to develop and compare various
classification techniques. Each student must demonstrate skills acquired in data representation, classification,
and evaluation.

Assessment 4 Total marks = 30

Submission Instructions
a) Submit your solution codes into a notebook file with “.ipynb” extension. Write discussions and
explanations including outputs and figures into a separate file and submit as a PDF file.
b) Submissions in any file format other than those mentioned above will not be assessed and will receive
zero for the entire submission.
c) Insert your Python code responses into cells of your submitted “.ipynb” file, each preceded by the question,
i.e., copy the question into a cell added before the solution cell. If you need multiple cells for better
presentation of the code, add the question only before the first solution cell.
d) Your submitted code should be executable. If your code does not generate the submitted solution,
then you will get zero for that part of the marks.
e) Answers must be relevant and precise.
f) No hard coding is allowed. Avoid using specific values that can be calculated from the data provided.
g) Use topics covered till week 10 for answering this assignment.
h) Submit your assignment after running each cell individually.
i) The submitted notebook file name should be of this form “SIT720_A4_studentID.ipynb”. For example, if your
student ID is 1234, then the submitted file name should be “SIT720_A4_1234.ipynb”.








_____________________________________________________________________________________
Questions
_____________________________________________________________________________________

1. What is an ensemble classifier? Name at least three popular ensemble methods. Which one do you
prefer, and why? (2 marks)
2. Let’s assume we have a noisy dataset. You want to build a classifier model. Which classifier is appropriate
for your dataset and why? XXXXXXXXXX marks)
_____________________________________________________________________________________

Background
Customer details are important for deciding which products to suggest to a buyer. Gender, age and
education affect the level of consumption of different products, so it is essential for businesses to
analyse their customer details to better understand consumer behaviour and its impact on various products.
Dataset filename: Customer relationship marketing (CRM).csv
Dataset description: This dataset includes data on customer details and their response to buy any products.
The data contains 20 attributes and 9134 records.
Features and labels: The attribute names are listed below.
I. State
II. Customer Lifetime Value
III. Response
IV. Coverage
V. Education
VI. Effective To Date
VII. EmploymentStatus
VIII. Gender
IX. Income
X. Location Code
XI. Marital Status
XII. Monthly Premium Auto
XIII. Months Since Last Claim
XIV. Number of Open Complaints
XV. Number of Policies
XVI. Policy
XVII. Renew Offer Type
XVIII. Sales Channel
XIX. Total Claim Amount
XX. Vehicle Class
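
The attributes above mix a target (Response), text categories and numeric columns, so some encoding is usually needed before modelling. The following is a minimal sketch of one possible preprocessing pass, not the unit's prescribed method. The sample rows, the "Yes"/"No" coding of Response, the month/day/year date format and the helper name `preprocess` are all assumptions made for illustration; only the column names come from the list above.

```python
import pandas as pd

# Tiny hypothetical sample: the values are invented, only the column names
# are taken from the attribute list.
sample = pd.DataFrame({
    "Response": ["No", "Yes", "No", "Yes"],
    "Gender": ["F", "M", "M", "F"],
    "Income": [56274, 0, 48767, 36357],
    "Effective To Date": ["2/24/11", "1/31/11", "2/19/11", "1/20/11"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Encode the target, derive a month feature and one-hot encode text columns."""
    out = df.copy()
    # Assumption: the target is recorded as "Yes"/"No".
    out["Response"] = out["Response"].map({"No": 0, "Yes": 1})
    # Assumption: dates are month/day/two-digit-year strings.
    dates = pd.to_datetime(out.pop("Effective To Date"), format="%m/%d/%y")
    out["Effective Month"] = dates.dt.month
    # One-hot encode whatever non-numeric columns remain.
    cat_cols = [c for c in out.columns
                if not pd.api.types.is_numeric_dtype(out[c])]
    return pd.get_dummies(out, columns=cat_cols, drop_first=True)


clean = preprocess(sample)
print(clean.dtypes)
```

Ordinal encoding (for ordered categories such as education levels) or target encoding would be alternative choices, and the right one depends on the model that follows.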

_____________________________________________________________________________________
Questions
_____________________________________________________________________________________
3. Load and pre-process the dataset if necessary. Explain steps that you have taken. Are there any
alternative ways for doing that? Explain XXXXXXXXXX marks)

4. Analyse the importance of the features for predicting customer response using two different approaches.
Explain the similarity/difference between outcomes XXXXXXXXXXmarks)
5. Create three supervised machine learning (ML) models, excluding any ensemble approach, for predicting
customer response. (10 marks)
a. Report performance score using a suitable metric. Is it possible that the presented result is an
overfitted one? Justify.
b. Justify different design decisions for each ML model used to answer this question.
c. Have you optimised any hyper-parameters for each ML model? What are they? Why have you
done that? Explain.
d. Finally, make a recommendation based on the reported results and justify it.
6. Build three ensemble models for predicting customer response XXXXXXXXXXMarks)
a. When do you want to use ensemble models over other ML models?
b. What are the similarities or differences between these models?
c. Is there any preferable scenario for using any specific model among the set of ensemble models?
d. Write a report comparing performances of models built in question 5 and 6. Report the best
method based on model complexity and performance.
e. Is it possible to build ensemble model using ML classifiers other than decision tree? If yes, then
explain with an example.
N.B. This is an HD (High Distinction) level question. Students who target an HD grade
should answer this question (in addition to answering all the above questions). For others, this
question is optional. This question aims to demonstrate your expertise in the subject area
and the ability to do your own research in the related area.
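
As an illustration of questions 5, 6d and 6e, the sketch below compares three non-ensemble classifiers with a voting ensemble built from those same non-tree models, using 5-fold cross-validation. It is only one possible setup under stated assumptions: the data is synthetic (generated with `make_classification`, not the CRM file), the F1 metric and the hyper-parameter values are arbitrary choices, and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded CRM features (assumption, not real data).
X, y = make_classification(n_samples=400, n_features=10,
                           n_informative=5, random_state=0)

# Three single (non-ensemble) models; scaling is applied where distances
# or coefficients are sensitive to feature scale.
base_models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "nb": GaussianNB(),
}

# An ensemble that contains no decision trees: soft voting averages the
# predicted class probabilities of the three base models.
voter = VotingClassifier(estimators=list(base_models.items()), voting="soft")

# Mean 5-fold cross-validated F1 for each single model and for the ensemble.
scores = {}
for name, model in {**base_models, "voting": voter}.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="f1").mean()

for name, value in scores.items():
    print(f"{name}: mean F1 = {value:.3f}")
```

Comparing these cross-validated scores with the scores on the training folds is one way to check for overfitting, and a grid search over settings such as `n_neighbors` could replace the fixed values used here.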
Submission details
Deakin University has a strict standard on plagiarism as a part of Academic Integrity. To avoid any issues with
plagiarism, students are strongly encouraged to run the similarity check with the Turnitin system, which is
available through Unistart. A similarity score MUST NOT exceed 39% in any case. The late submission penalty is
5% for each 24 hours after the deadline of Monday 27 September 2021, 8.00 pm (AEST). Submissions made more
than 5 days (5 x 24 hours) after that deadline will not be marked.
Extension requests
Requests for extensions should be made to Unit/Campus Chairs well in advance of the assessment due date.
If you wish to seek an extension for an assignment, you will need to submit a request using the “Extension
Request” link of the “Assessment” menu in the unit site, as soon as you become aware that you will have
difficulty in meeting the scheduled deadline, but at least 3 days before the due date. When you make your
request
Answered 15 days After Sep 17, 2021

Solution

Pritam Kumar answered on Sep 18 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data Loading"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"