MIS772 Assignment A1 MIS772 Predictive Analytics (2019 T2) Assignment A2 / Workshops...

Question

MIS772 Predictive Analytics (2019 T2) Assignment A2 / Workshops M1-M2-M3

Assignment A2 / Workshops M1-M3: RM

Objectives

This assignment covers all workshops in modules M1-M3. By completing the workshops and assignment, students will understand how to use RapidMiner (RM) to explore data, gain insights into the problem domain, create and validate estimation and clustering models, and perform segmentation analysis and text mining. The workshops will rely on students' knowledge of methods and techniques introduced in a series of classes. The assignment will have two deliverables in the form of learning portfolios LP3 and LP4.

Methods

During the workshops (on-campus and on-cloud) students will work in teams but submit their individual reports based on their tasks as related to the data set. The work is expected to use RapidMiner Studio. Demonstrations and lab exercises will assist skill development.

Prerequisites

Before attending RM workshops, students are required to become familiar with class notes and all textbook readings (see the topic schedule with chapter references).

Workshop Schedule

Activities (no late arrivals for the on-campus sessions!), each followed by its topic:

1. Learn to use RapidMiner Studio. Topic: Preparation.
2. The workshop facilitator will explain the case in the focus of this assignment. Work in groups of up to 4 (also 1-2-3). Start by formulating a business problem (it may change later). Topic: M1, M2T1 (Classification, Cross-Validation, Optimisation, Data Prep).
3. Revise classification models (such as k-NN and decision trees), cross-validation, clustering and simple model optimisation. Learn about the problem area and the assignment data. Download your data as a CSV (or JSON if brave) file and explore your data. Select attribute types, nominate them as labels and predictors. Do not modify these 'raw' data files outside of the RM environment.
4. Learn to parse and represent text data, reduce data dimensionality, perform segmentation analysis, create and evaluate predictive models with attributes derived from text, and visualise results. Topic: M2T2 (Text Mining & Sentiment).
5. Use RM to clean and transform data, deal with missing values, produce simple statistics and charts, and build estimation models using multiple regression and neural networks. Learn how to create model ensembles, such as random forests, boosting, stacking and bootstrapping ensembles. Topic: M2T3 & M2T4 (Estimation, Neural Nets, Ensembles).
6. Study the techniques associated with the deployment or analytic processes. Extend your work on neural networks with deep learning systems. Topic: M3T1 & M3T2 (Deployment, Deep Learning).
7. As a team member, prepare an individual report using the provided template. The report should be in PDF format. Also, include all RM processes in RMP format. If you have altered the data, attach the modified data to your submission. Topic: Report and Executive Summary.
8. By the specified deadline, individually submit two components of your learning portfolio, i.e. LP3 and later LP4 parts of the assignment, via CloudDeakin dropbox. With each submission, include your report in PDF, formatted using the provided template, plus a ZIP archive of all models, i.e. your RapidMiner scripts (.RMP files). Do not use other file formats! Topic: Submission Learning Portfolio.

Mini Case Study

This mini case study will be used in all workshops of module 1, i.e. M1T1-M1T4.
All amendments, extensions and assumptions should be recorded in the final submission.

Australian Wine Importers (AWI) asked you to develop a method of estimating the rating (points) of imported wines based on their text and structured attributes.

AWI provided you with a sample of 130,000 wine tasting results, which include: Wine "title" (name + vintage); Country, Province and Region; Variety and Winery; Description and Designation; Price (US$). However: Taster name and Points to be excluded.

In the future, AWI would like to get a preliminary insight into wine quality based on social media reviews. The following questions are of interest to AWI:
A) What group of wines is the new wine most similar to, and why / how? and,
B) What is the estimated rating of the newly introduced wine to the Australian market? (fractional ratings permitted)

AWI wants you to clean up and explore wine tasting data, develop and evaluate a wine rating estimator, and minimize the estimation error in the process.

In technical terms:
Your project objectives form a learning portfolio. The first objective (LP3) is to acquire and explore the available data using clustering and segmentation analysis, visualise and report relationships in text and structured data, and prepare data for further processing. The second objective (LP4) is to create an estimation system able to answer management questions using all available data. Text mining will be strongly featured in assignment A2. Reports in PDF format and models developed in LP3 and LP4 in ZIP archives are to be submitted via CloudDeakin by their respective deadlines.

Data: http://www.deakin.edu.au/~jlcybuls/pred/data/Wine-Reviews.zip
Original data source: https://www.kaggle.com/zynicide/wine-reviews

Hints on the process:

Formulate a business problem using plain English statements; however, cross-reference them with technical aspects described in the subsequent sections. When describing the problem and its solution, keep in mind what can be achieved by using the available data.

Note that what you have been asked for and what can be delivered are two different things, e.g. to solve the problem you may need to narrow or slightly change the problem scope, or the model may provide quality answers only within a specific range of data characteristics. If so, this is what you need to report or recommend to AWI management.

Explore your text and non-text attributes in terms of their clustering and segmentation. Use appropriate visualisations, analyse and interpret them. As the report template provides very limited space, be selective about what you include in the report: each chart and table must have a purpose and a description to advance your argument. Use them as evidence!

Depending on the model, some attributes may need to be transformed before using them in modelling tasks. You may also have to deal with incorrect or missing values. Look at your modelling options, optimise their parameters and compare evaluation results.

Check the assessment criteria on the next page to see how you are going to be assessed. Stick to the recommended process. Complete the basics first before moving to the more advanced tasks or any extensions and research tasks.

Assignment Submission

You will submit your work in two learning portfolio parts, LP3 and LP4.
Each part needs to be lodged via CloudDeakin dropbox before the deadline.
You will be allowed to submit your work once only!
It is essential that your reports use the LP3 and LP4 templates.
Follow the instructions embedded in the templates!
Both reports must fit into a strict page limit imposed by the template. Only pages within the template limit will be reviewed and assessed!
Make sure that the problem statement and the executive summary are aimed at non-technical readers, while the remaining parts of the reports aim at a data / business analyst (and not highly technical programmers).
Your submission must include the report in PDF format and a ZIP archive of .RMP script files (these can be found in the RM project folder; simply ZIP these files). Submissions not in PDF and ZIP format will not be opened or assessed!
There is a strict deadline for each submission. In cases of documented illness, special consideration may be granted but must be applied for well ahead of the deadline. In general, requests for special consideration received less than three days before the deadline will not be considered!
An automatic late penalty of 5% of the available marks per day (up to 5 days) will be applied to all late assignment submissions.
Late penalties apply immediately past the deadline, even 1 second!
Both parts LP3 and LP4 will be marked together after part LP4 is submitted. Feedback will be provided on both parts together.
Team work and collaboration are encouraged but plagiarism will be penalised.
Team members can share ideas and help each other in solving technical problems. Seek your team's feedback on all aspects of your assignment, especially before its submission.
However, your assignment needs to be completed individually.
Ensure that your assignment is unique, otherwise plagiarism will be assumed!

The work will be assessed based on the following criteria. Use RapidMiner for both assignment tasks LP3 and LP4. Other tools can be used for the tasks associated with the research section only. Do not start the advanced tasks before meeting the expectations first (or no points will be given).
Use the submission template for both LP3 and LP4.

LP3 assessment levels: Exceptional (ranges: 80–90–100%), Meets Expectations (ranges: 50–65–79%), Unacceptable (ranges: 0–25–49%).

Problem (5, one page limit, 0)
Exceptional: Identify what decisions need to be drawn and what actions need to be supported.
Meets Expectations: Succinctly state a business problem (or question) and specify what insights need to be generated from data.
Unacceptable: Not provided or incomprehensible.

Data Prep (25, one page limit, 0)
Exceptional: Deal with errors and missing values. Reduce data dimensionality. Provide comprehensive analysis, tabulate your results. Answer the management question (A).
Meets Expectations: Parse text attributes. Then, conduct clustering and segmentation analysis of both structured and text data. In the process, identify relationships in data. Visualise and interpret the obtained results. Annotate all charts (with text and arrows) to highlight important insights.
Unacceptable: Not meeting expectations. Missing RM process files. Over the page limit.

Include: Report (use template, in PDF) and RMP files (in ZIP), with an explanation of how to reproduce all results.

LP4 assessment levels: Exceptional (ranges: 80–90–100%), Meets Expectations (ranges: 50–65–79%), Unacceptable (ranges: 0–25–49%).

Exec Report (5, one page limit, 0)
Exceptional: Narrow down the business problem. Identify decisions and actions that will be supported by the analytic solution. Include a list of used academic refs.
Meets Expectations: Restate /

Abr Writing · Accepted Answer

Load data.
Change role to 'regular' for all columns.
Define the target column for the predictive model.
Should define a target column?
Discretize by binning (same range per bin).
Discretize by frequency (same count per bin).
Should discretize numerical target column?
Map some nominal target values to new values.
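
The two discretization options and the value-mapping step named above can be illustrated outside RapidMiner. What follows is a minimal Python sketch, not the RapidMiner process itself: the function names, the sample ratings and the low/mid/high labels are all hypothetical, chosen only to show how equal-range binning differs from equal-count binning and how nominal values can then be remapped.

```python
from bisect import bisect_right
from collections import Counter


def equal_width_edges(values, n_bins):
    """Inner cut points that split [min, max] into n_bins equal ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Inner cut points placed so each bin holds roughly the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins] for i in range(1, n_bins)]


def assign_bins(values, edges):
    """Bin index per value: the number of cut points at or below it."""
    return [bisect_right(edges, v) for v in values]


def remap(labels, mapping):
    """Replace labels found in mapping; leave all others unchanged."""
    return [mapping.get(label, label) for label in labels]


# Hypothetical wine ratings (assumption: not taken from the assignment data).
points = [80, 82, 83, 84, 85, 86, 87, 88, 90, 100]

width_bins = assign_bins(points, equal_width_edges(points, 3))
freq_bins = assign_bins(points, equal_frequency_edges(points, 3))

# Turn bin indices into nominal labels, then merge "mid" into "low".
names = {0: "low", 1: "mid", 2: "high"}
nominal = remap(freq_bins, names)
merged = remap(nominal, {"mid": "low"})

print(Counter(width_bins))  # counts per equal-range bin
print(Counter(freq_bins))   # counts per equal-count bin
print(Counter(merged))      # counts after remapping
```

With a skewed sample like this one, where a single rating sits far above the rest, the equal-range bins fill unevenly while the equal-count bins stay roughly balanced. That difference is usually what decides between the two options when a numerical target is turned into classes.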
