Question

EXPENSE CLAIM/REPORT
MA5810 - CAPSTONE PROJECT
Total marks: 100
Due date: Wednesday, Week 7 (9th of December), 11:59pm AEST

OVERVIEW
This assessment involves writing a report that summarises a data mining related investigation that you have conducted on data that you have collected yourself. The investigation must involve the main topics covered in the subject, most notably supervised learning and/or unsupervised learning using R/RStudio. The assessment builds upon the practical knowledge that you should have acquired through the previous two assignments. However, neither the dataset nor the detailed steps to be carried out are provided here; you have to make independent choices and decisions.

Submission
You will need to submit the following:
• A PDF file with R code in the Appendix. Please submit everything in one PDF file. The assignment must be presented in 12-point font on A4 pages using single line spacing. The assignment must follow the required report structure.
• References should be in APA format.
• R code to reproduce your work.
• The task cover sheet.
The assignment should not exceed 12 A4 pages. Appendices do not form part of the page limit.

You have up to three attempts to submit your assessment, and only the last submission will be marked.

A WORD ON PLAGIARISM AND SELF-PLAGIARISM:
Plagiarism is the act of using another’s words, works or ideas from any source as one’s own. Plagiarism has no place in a University. Student work containing plagiarised material will be subject to formal university processes.
If significant portions of your own previous work (e.g., a report for a related subject you did in this or any other university) are recycled in a way that they could be fully or partially graded twice (‘double-dipping’), this is considered self-plagiarism and will not be tolerated.

Assessment tasks
In this report, you need to demonstrate that: (a) you have grasped important concepts associated with this subject, most notably supervised and unsupervised learning; and (b) you can communicate your investigation in a formal written manner. Regarding (a), we expect that your investigation will include at least three machine learning algorithms from the following topics:
1. LDA, QDA and/or Naive Bayes classification
2. Logistic Regression classifiers and/or KNN for classification/regression
3. Principal Component Analysis (PCA)
4. Cluster Analysis
5. Association Rule Mining and Recommender Systems

Data
You will need to find your own data using good practices. Your dataset cannot be smaller than 1000 observations of five variables, unless the targeted data mining problem relates to spatial-temporal data, in which case fewer than five dimensions could be allowed. Preferably, you should use a dataset relevant to your place of work. Do not use data from textbooks or from R packages. Do not use the same data that have been used in the subject (e.g. UCI repository). Do not use data for which data mining results and analyses can be found online. You can use public data, but the data should be appropriate for addressing a relevant data mining problem, and a solution to a similar problem for the same data should not be available.

Report structure
Please adhere to the strict report structure format. The report will not be assessed if it is not formatted appropriately. The report should have the following sections marked clearly:
• Title: In today’s busy world, it is very important to make the most of your title. Make the title ‘eye-catching’, informative and an accurate representation of the contents of the report.
• Abstract: The abstract provides a short, sharp overview of the contents of the report and will be around 200 to 300 words. The abstract has five parts:
  i. Introductory statement: background to the study, important issue(s) the report addresses (approximately 1-2 sentences).
  ii. Purpose of the report: state the objectives (1-2 sentences).
  iii. Methodological approach: overview of the data and methods (2-3 sentences).
  iv. Findings or Achievements: list one or two of the main findings or achievements from your investigation (1-2 sentences).
  v. Conclusions and Implications: what conclusions can be drawn from your investigation? How can the findings/achievements in your report deliver a benefit to people, things, systems or processes? (1-2 sentences).
• Introduction: The introduction sets the scene for the investigative efforts. It provides motivation for the work and relevant background information and references that will enable the reader to put in context the key objectives and achievements in your report. Address the important issues that have motivated your investigation. At the end of the introduction clearly state the objectives of the report. Do not put any results from your investigation in the introduction. Do not discuss details about the data and methods in this section. Do not discuss your conclusions or key findings in the introduction.
• Data: This section should provide details about how the data were obtained and what the data represent. You should include information such as (but not limited to):
  i. What the source of the data is
  ii. How the data were originally collected (e.g., from an experiment or observational study)
  iii. The sample size
  iv. The number and types of variables
  v. Any known interventions or pre-processing that precede the ones described in your report
  vi. Any other information that is relevant to the understanding and assessment of your work/report.
• Methods: This section should discuss in depth the data mining methods that were used to process and to analyse the data, as well as the software version used to generate the results and report. To cite RStudio, type RStudio.Version() from the command line. The methods should be appropriate to ensure that the objectives of the paper are met.
• Results and Discussion: This section presents and discusses the results. The discussion centres on the outputs from the data mining procedures that you have performed. For example, what are the main outcomes? Why are they useful and what for? How are they interesting and why? In particular, how do the results align with the goals set in the introduction? What are the main achievements and their implications?
• Conclusions: Final remarks about the key achievements of the investigation and what makes them ‘interesting’ or ‘useful’, right now or for future work. Achievements or findings should be contrasted with the original objectives or hypotheses of the project. Make sure that you mention any limitations of your work here. Limit the conclusions to no more than two or three paragraphs.
• References: List the sources your investigation has drawn from. Note that all references should be referred to in the text.
• Appendices: Add R code and any supporting materials that might be useful to help assess your work.

RUBRIC TEMPLATE
Please adhere to the report structure requirements. The report will not be assessed if it is not formatted appropriately.

Dimension: R code and References (10%)
High distinction: Code submitted and attached to Appendix. Code works correctly, meets the specifications, produces the correct results and displays them correctly. Code is well organised and very easy to follow. Code is always very well commented, so the purpose of each block of code and the question part it corresponds to are readily understood. Variable names give the purpose of the variable. All references have been listed, in the right format, and referred to in the appropriate places in the body of the text and listed at the end of the report. At least 4 references have been provided.
Pass: Code only provided in answer document but looks correct. Code often exhibits incorrect behaviour. Significant details of specification are violated. The code is readable only by someone who already knows what it is supposed to be doing. Comments not sufficient to see what the code is doing. Significant lack of comments makes it difficult to understand code. Some references have been listed and referred to in the appropriate places in the body of the text and listed at the end of the report. At least 2 references have been provided.
Fail: Code not submitted. Code not provided in answer document. Code produces incorrect results, does not compile, or significant errors occur. Code is poorly organised and very difficult to read. Code has no comments. No references.

Dimension: Abstract and Introduction (10%)
High distinction: Clearly addresses the five parts of the abstract so that the reader has a clear overview of the report. Position and exceptions, if any, are clearly stated. Organisation of the argument is completely and clearly outlined and implemented.
Pass: Partially addresses the five parts of the abstract, or addresses all five parts but the writing is not clear in places. Position is clearly stated. Organisation of argument is clear in parts or only partially described and mostly implemented.
Fail: Does not provide an overview of the report, or the writing is poor overall and mostly unclear. Position is vague. Organisation of argument is missing, vague or not consistently maintained.

Dimension: Data (10%)
High distinction: Data are suitable, the report explains how the data were obtained. Provides a detailed, accurate description of the data and data methods to be employed within the project. Exploratory data analysis and verification are detailed and provide critical insight with clear overt links to model developments. Data insights are concisely presented and visualised.
Pass: Data are suitable, the report explains how the data were obtained.
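Two requirements in the brief are mechanical enough to check in R before writing up: the minimum dataset size and the software versions to report in the Methods section. The sketch below is only an illustration under stated assumptions. The helper `meets_size_requirement()` and the simulated `example_data` are hypothetical and not part of the brief, and "five variables" is read here as at least five columns. `RStudio.Version()` is only defined inside an RStudio session, so the call is guarded.

```r
# Hypothetical helper: checks the brief's minimum of 1000 observations and
# five variables (the spatial-temporal exception is not handled here).
meets_size_requirement <- function(df, min_rows = 1000, min_cols = 5) {
  nrow(df) >= min_rows && ncol(df) >= min_cols
}

# Simulated stand-in for a self-collected dataset (random values only)
set.seed(1)
example_data <- data.frame(
  id    = seq_len(1200),
  x1    = rnorm(1200),
  x2    = rnorm(1200),
  x3    = runif(1200),
  group = sample(c("a", "b"), 1200, replace = TRUE)
)
size_ok <- meets_size_requirement(example_data)
cat("Rows:", nrow(example_data), "Columns:", ncol(example_data),
    "Meets minimum size:", size_ok, "\n")

# Software versions for the Methods section
r_version <- R.version.string
rstudio_version <- if (exists("RStudio.Version")) {
  as.character(RStudio.Version()$version)
} else {
  NA_character_  # not running inside RStudio
}
cat("R:", r_version, "\n")
cat("RStudio:", rstudio_version, "\n")

# Reference entry for R itself, to be reformatted into APA style by hand
print(citation(), style = "text")
```

When run with `Rscript` rather than inside RStudio, the RStudio line prints `NA`; in that case the RStudio version has to be read from the IDE itself.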

Neha · Accepted Answer

89621 - R and report/code.r
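# Load the mall customer data (path as given in the answer)
customer_data <- read.csv("/home/dataflair/Mall_Customers.csv")

# Inspect the structure, column names and first rows
str(customer_data)
names(customer_data)
head(customer_data)

# Summary statistics and standard deviation for age and annual income
summary(customer_data$Age)
sd(customer_data$Age)
summary(customer_data$Annual.Income..k..)
sd(customer_data$Annual.Income..k..)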
summary(customer_data$Spending.Score..1.100.)
sd(customer_data$Spending.Score..1.100.)
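
# Bar plot of customer counts by gender
a <- table(customer_data$Gender)
barplot(a, main = "Using BarPlot to display Gender Comparison",
        ylab = "Count",
        xlab = "Gender",
        col = rainbow(2),
        legend = rownames(a))

# Percentage of customers per gender, rounded to whole numbers
pct <- round(a / sum(a) * 100)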
lbs=paste(c("Female","Male"),
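
The listing above is cut off before any modelling step, and its remaining lines are not reconstructed here. As a separate illustration of the "Cluster Analysis" topic named in the brief, the following is a minimal k-means sketch in base R. It is not the answer's code: `sim_customers` is simulated with random values, only the column names mimic the listing, and `k = 4` is an arbitrary choice for demonstration.

```r
set.seed(42)  # reproducible simulated data and k-means starts

# Simulated stand-in for the customer table (random values, not real data)
n <- 1000
sim_customers <- data.frame(
  Age                    = round(runif(n, 18, 70)),
  Annual.Income..k..     = round(runif(n, 15, 140)),
  Spending.Score..1.100. = round(runif(n, 1, 100))
)

# Standardise so that no variable dominates the Euclidean distances
scaled <- scale(sim_customers)

# Total within-cluster sum of squares for k = 1..8 (elbow heuristic)
wss <- sapply(1:8, function(k) {
  kmeans(scaled, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})
plot(1:8, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Fit one candidate solution (k = 4 is arbitrary here) and
# summarise each cluster on the original scale
fit <- kmeans(scaled, centers = 4, nstart = 10, iter.max = 50)
cluster_means <- aggregate(sim_customers,
                           by = list(cluster = fit$cluster),
                           FUN = mean)
print(cluster_means)
```

Because the simulated values are uniform noise, the elbow plot shows no clear bend; with real data the choice of k would need to be justified in the Methods section.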
