MIS772 Assignment A1
MIS772 Predictive Analytics (2019 T2) Assignment A2 / Workshops M1-M2-M3
Assignment A2 / Workshops M1-M3: RM
This assignment covers all workshops in modules M1-M3. By completing the workshops
and assignment students will understand how to use RapidMiner (RM) to explore data,
gain insights into the problem domain, create and validate estimation and clustering
models, perform segmentation analysis and text mining. The workshop will rely on
students’ knowledge of methods and techniques introduced in a series of classes. The
assignment will have two deliverables in the form of learning portfolios LP3 and LP4.
During the workshop (on-campus and on-cloud) students will work in teams but submit
their individual reports based on their tasks as related to the data set. The work is
expected to use RapidMiner Studio. Demonstrations and lab exercises will assist skill
development.
Before attending RM workshops, students are required to become familiar with class
notes and all textbook readings (see the topic schedule with chapter references).
Activities and related topics (no late arrivals for the on-campus sessions!):
1. Learn to use RapidMiner Studio.
   Topic: Preparation
2. The workshop facilitator will explain the case in the focus of this assignment.
   Work in groups of up to 4 (also 1-2-3).
   Start by formulating a business problem (it may change later).
3. Revise classification models (such as k-NN and decision trees), cross-validation,
   clustering and simple model optimisation.
   Learn about the problem area and the assignment data.
   Download your data as a CSV (or JSON if brave) file, explore your data.
   Select attribute types, nominate them as labels and predictors.
   Do not modify these ‘raw’ data files outside of the RM environment.
   Topic (activities 2-3): M1, M2T1: Classification, Cross-Validation, Optimisation, Data Prep
4. Learn to parse and represent text data, reduce data dimensionality, perform
   segmentation analysis, create and evaluate predictive models with attributes
   derived from text, visualise results.
   Topic: M2T2: Text Mining & Sentiment
5. Use RM to clean and transform data, deal with missing values, produce
   simple statistics and charts, build estimation models using multiple
   regression and neural networks. Learn how to create model ensembles,
   such as random forests, boosting, stacking and bootstrapping ensembles.
   Topic: M2T3 & M2T4: Estimation, Neural Nets, Ensembles
6. Study the techniques associated with the deployment of analytic processes.
   Extend your work on neural networks with deep learning systems.
   Topic: M3T1 & M3T2: Deployment, Deep Learning
7. As a team member, prepare an individual report using the provided
   template. The report should be in PDF format. Also, include all RM
   processes in RMP format. If you have altered the data, attach the modified
   data to your submission.
   Topic: Report and Executive Summary
8. By the specified deadline, individually submit two components of your
   learning portfolio, i.e. LP3 and later LP4 parts of the assignment via
   CloudDeakin dropbox. With each submission, include your report in PDF,
   formatted using the provided template plus a ZIP archive of all models, i.e.
   your RapidMiner scripts (.RMP files) – do not use other file formats!
   Topic: Submission, Learning Portfolio
This mini case study will be used in all workshops of module 1, i.e. M1T1-M1T4. All
amendments, extensions and assumptions should be recorded in the final submission.
Australian Wine Importers (AWI) asked you to develop a
method of estimating rating (points) of imported wines based
on their text and structured attributes.
AWI provided you with a sample of 130,000 wine tasting
results, which include:
• Wine “title” (name + vintage);
• Country, Province and Region;
• Variety and Winery;
• Description and Designation;
• Price (US$).
However:
• Taster name and Points to be excluded.
In the future, AWI would like to get a preliminary insight into wine quality based
on social media reviews. The following questions are of interest to AWI:
A) Which group of wines is the new wine most similar to, and why / how? and,
B) What is the estimated rating of the newly introduced wine to the Australian market?
(fractional ratings permitted)
AWI wants you to clean up and explore wine tasting data, develop and evaluate a wine
rating estimator, and minimize the estimation error in the process.
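As an illustration only (the assignment itself is to be completed in RapidMiner), the idea of measuring and minimising estimation error can be sketched in plain Python. The sketch below assumes root mean squared error (RMSE) as the error measure and k-fold cross-validation as the validation scheme; the brief does not prescribe either. The ratings are made up, and the "predict the training mean" estimator is only a baseline that any real model should beat.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(points, k=5):
    """Average RMSE of a 'predict the training mean' baseline over k folds."""
    scores = []
    for fold in k_fold_indices(len(points), k):
        held_out = set(fold)
        train = [p for i, p in enumerate(points) if i not in held_out]
        test = [points[i] for i in fold]
        baseline = sum(train) / len(train)  # fitted on training folds only
        scores.append(rmse(test, [baseline] * len(test)))
    return sum(scores) / len(scores)


# Hypothetical ratings, not taken from the AWI sample.
ratings = [87, 90, 92, 85, 88, 91, 86, 89, 93, 84]
print(round(cross_validated_rmse(ratings, k=5), 2))
```

A candidate estimator is worth reporting only if its cross-validated error is clearly lower than that of such a baseline.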
In technical terms:
Your project objectives form a learning portfolio. The first objective (LP3) is to acquire
and explore the available data using clustering and segmentation analysis, visualise and
report relationships in text and structured data, also prepare data for further processing.
The second objective (LP4) is to create an estimation system able to answer management
questions using all available data. Text mining will be strongly featured in assignment
A2. Reports in PDF format and models developed in LP3 and LP4 in ZIP archives are to
be submitted via CloudDeakin by their respective deadlines.
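To make the text-mining part concrete, here is a minimal sketch of how free-text descriptions could be turned into numeric attributes and compared. It assumes a TF-IDF weighting and cosine similarity, which are common choices but are not named in the brief; the three descriptions are invented, and in the assignment the equivalent steps would be built with RapidMiner operators.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical tasting notes, not taken from the data set.
descriptions = [
    "Ripe cherry and plum with soft tannins",
    "Crisp citrus and green apple with bright acidity",
    "Dark cherry, plum and firm tannins",
]
vectors = tf_idf_vectors(descriptions)
print(cosine_similarity(vectors[0], vectors[2]) > cosine_similarity(vectors[0], vectors[1]))
```

Terms that occur in every description (here "and") receive a weight of zero, so similarity is driven by the more distinctive words.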
Data: http://www.deakin.edu.au/~jlcybuls/pred/data/Wine-Reviews.zip
Original data source: https://www.kaggle.com/zynicide/wine-reviews
Hints on the process:
Formulate a business problem using plain English statements, however, cross-reference
them with technical aspects described in the subsequent sections. When describing the
problem and its solution keep in mind what can be achieved by using the available data.
Note that what you have been asked for and what can be delivered are two different
things, e.g. to solve the problem you may need to narrow or slightly change the problem
scope or the model may provide quality answers only within a specific range of data
characteristics, if so then this is what you need to report or recommend to AWI
management.
Explore your text and non-text attributes in terms of their clustering and segmentation.
Use appropriate visualisations, analyse and interpret them. As the report template
provides very limited space, be selective about what you include in the report – each
chart and table must have a purpose and a description to advance your argument, use
them as evidence!
Depending on the model, some attributes may need to be transformed before using them
in modelling tasks. You may also have to deal with incorrect or missing values. Look at
your modelling options, optimise their parameters and compare evaluation results.
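The kind of attribute preparation mentioned above can be sketched as follows. This is an assumption-laden illustration rather than the required method: it fills missing values with the median and rescales a numeric attribute to the range 0 to 1, whereas the appropriate treatment depends on the model chosen. The prices are invented, and the assignment expects these steps to be carried out inside RapidMiner.

```python
from statistics import median


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # a constant attribute carries no spread
    return [(v - low) / (high - low) for v in values]


# Hypothetical prices in US$; None marks a missing value.
prices = [15.0, None, 30.0, 65.0, None, 20.0]
filled = impute_missing(prices)
scaled = min_max_scale(filled)
print(filled)  # [15.0, 25.0, 30.0, 65.0, 25.0, 20.0]
```

Distance-based methods such as k-NN and clustering are sensitive to attribute scale, which is why rescaling is often applied before them.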
Check the assessment criteria on the next page to see how you are going to be assessed.
Stick to the recommended process. Complete the basics first before moving to the more
advanced tasks or any extensions and research tasks.
You will submit your work in two learning portfolio parts LP3 and LP4.
Each part needs to be lodged via CloudDeakin dropbox before the deadline.
You will be allowed to submit your work once only!
It is essential that your reports use LP3 and LP4 templates.
Follow instructions embedded in the templates!
Both reports must fit into a strict page limit imposed by the template.
Only pages within the template limit will be reviewed and assessed!
Make sure that the problem statement and the executive summary are aimed at non-
technical readers, while the remaining parts of the reports aim at a data / business analyst
(and not highly technical programmers).
Your submission must include the report in PDF format and a ZIP archive of .RMP
script files (these can be found in the RM project folder – simply ZIP these files).
Submissions not in PDF and ZIP format will not be opened or assessed!
There is a strict deadline for each submission. In cases of documented illness,
special consideration may be granted but must be applied for well ahead of the deadline.
In general, requests for special consideration received less than three days before
the deadline will not be considered!
An automatic late penalty of 5% of the available marks per day (up to 5 days) will be
applied to all late assignment submissions.
Late penalties apply immediately past the deadline – even 1 second!
Both parts LP3 and LP4 will be marked together after part LP4 is submitted.
Feedback will be provided on both parts together.
Teamwork and collaboration are encouraged but plagiarism will be penalised.
Team members can share ideas and help each other in solving technical problems. Seek
your team’s feedback on all aspects of your assignment, especially before its submission.
However, your assignment needs to be completed individually.
Ensure that your assignment is unique, otherwise plagiarism will be assumed!
The work will be assessed based on the following criteria. Use RapidMiner for both
assignment tasks LP3 and LP4. Other tools can be used for the tasks associated with the
research section only. Do not start the advanced tasks before meeting the expectations
first (or no points will be given). Use submission template for both LP3 and LP4.
LP3 assessment criteria
Grade ranges: Exceptional 80–90–100%; Meets Expectations 50–65–79%; Unacceptable 0–25–49%.

Problem (5 to 0, one page limit)
Exceptional: Identify what decisions need to be drawn and what actions need to be supported.
Meets Expectations: Succinctly state a business problem (or question) and specify what
insights need to be generated from data.
Unacceptable: Not provided or incomprehensible.

Data Prep (25 to 0, one page limit)
Exceptional: Deal with errors and missing values. Reduce data dimensionality. Provide
comprehensive analysis, tabulate your results. Answer the management question (A).
Meets Expectations: Parse text attributes. Then, conduct clustering and segmentation
analysis of both structured and text data. In the process, identify relationships in data.
Visualise and interpret the obtained results. Annotate all charts (with text and arrows)
to highlight important insights.
Unacceptable: Not meeting expectations. Missing RM process files. Over the page limit.

Include: Report (use template, in PDF) and RMP files (in ZIP), with explanation how to reproduce all results.

LP4 assessment criteria
Grade ranges: Exceptional 80–90–100%; Meets Expectations 50–65–79%; Unacceptable 0–25–49%.

Exec Report (5 to 0, one page limit)
Exceptional: Narrow down the business problem. Identify decisions and actions that will be
supported by the analytic solution. Include a list of used academic refs.
Meets Expectations: Restate /
Answered Same Day Apr 15, 2021 MIS772 Deakin University

Solution

Abr Writing answered on Apr 20 2021
Load data.
Change role to 'regular' for all columns.
Define the target column for the predictive model.
Should define a target column?
Discretize by binning (same range per bin).
Discretize by frequency (same count per bin).
Should discretize numerical target column?
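The difference between the two discretization options can be illustrated with a short Python sketch. It is not the RapidMiner implementation, only a plain restatement of the two ideas: equal-width bins share the same value range, while equal-frequency bins share (roughly) the same number of examples. The function names and the sample ratings are hypothetical.

```python
def bin_by_range(values, n_bins):
    """Equal-width binning: every bin covers the same value range."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def bin_by_frequency(values, n_bins):
    """Equal-frequency binning: every bin holds roughly the same number of values.

    Ties are split by position, so equal values may end up in adjacent bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


# Hypothetical ratings with one outlier at the top.
points = [80, 82, 85, 86, 87, 88, 90, 100]
print(bin_by_range(points, 4))      # [0, 0, 1, 1, 1, 1, 2, 3]
print(bin_by_frequency(points, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With skewed data, equal-width binning leaves some bins nearly empty (the outlier sits alone in the last bin), whereas equal-frequency binning keeps the classes balanced at the cost of uneven bin widths.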
Map some nominal target values to new values.
Should map nominal values?
Make sure that target is binary for positive class mapping.
Potentially define which one should be the positive class.
Should define positive class?
Potentially remove columns.
Should remove columns?
No date processing is desired here, so simply remove the date columns completely.
Check if there actually are any date columns in the data.
Add an additional column with today's date. This can be useful for calculating ages etc.
Select the reversed column pair here and store in the macro whether that column already exists.
Store whether the reversed pair exists.
Generate the difference for the two date columns in milliseconds.
Both date columns are the same or the reversed pair was already created - do nothing here!
Only calculate the differences between the two date columns if the columns are not equal and if the reversed pair has not been calculated yet.
Loop over all combinations of date attributes and calculate their differences (which includes the new today column generated previously).
Remove the generated today column again.
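The date-handling steps listed above can be sketched in Python as a rough analogue. This is an assumption about what the process does, reconstructed from its step descriptions only: add a temporary "today" value, compute the difference in milliseconds for each pair of date columns exactly once (never a column with itself, never the reversed pair), and discard the temporary value afterwards. The column names are hypothetical.

```python
from datetime import datetime
from itertools import combinations


def date_differences(row, today=None):
    """Return {"a - b": milliseconds} for every unordered pair of date columns."""
    dates = dict(row)
    dates["today"] = today or datetime.now()  # temporary helper value
    differences = {}
    # combinations() yields each unordered pair once, so a column is never
    # paired with itself and the reversed direction is never recomputed.
    for first, second in combinations(sorted(dates), 2):
        delta = dates[first] - dates[second]
        differences[f"{first} - {second}"] = int(delta.total_seconds() * 1000)
    # Only the differences are returned; the helper "today" value is dropped.
    return differences


# Hypothetical date columns for a single example.
row = {"bottled": datetime(2019, 1, 1), "reviewed": datetime(2019, 1, 2)}
print(date_differences(row, today=datetime(2019, 1, 3)))
```

For the two hypothetical columns plus "today" this yields three differences, one per unordered pair.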
...