MA5810- CAPSTONE PROJECT
Total marks: 100
Due date: Wednesday, Week 7 (9th of December), 11:59pm AEST

OVERVIEW
This assessment involves writing a report that summarises a data mining related investigation that
you have conducted on data that you have collected yourself. The investigation must involve the
main topics covered in the subject, most notably supervised learning and/or unsupervised
learning using R/RStudio. The assessment builds upon the practical knowledge that you should
have acquired through the previous two assignments. However, neither the dataset nor the detailed
steps to be carried out will be provided here; you have to make independent choices and decisions.

Submission
You will need to submit the following:
• A PDF file with R code in the Appendix. Please submit everything in one PDF file. The assignment
must be presented in 12-point font on A4 pages using single line spacing. The assignment must follow the
required report structure.
• References should be in APA format.
• R code to reproduce your work
• The task cover sheet.
The assignment should not exceed 12 A4 pages. Appendices do not form part of the page limit.

You have up to three attempts to submit your assessment, and only the last submission will be
marked.

A WORD ON PLAGIARISM AND SELF-PLAGIARISM:
Plagiarism is the act of using another’s words, works or ideas from any source as one’s own.
Plagiarism has no place in a University. Student work containing plagiarised material will be subject
to formal university processes.
If significant portions of your own previous work (e.g., a report for a related subject you did in
this or any other university) are recycled in a way that they could be fully or partially graded twice
(‘double-dipping’), this is considered self-plagiarism and will not be tolerated.
Assessment tasks
In this report, you need to demonstrate that: (a) you have grasped important concepts associated
with this subject, most noticeably supervised and unsupervised learning; and (b) you can
communicate your investigation in a formal written manner.
Regarding (a), we expect that your investigation will include at least three machine learning
algorithms from the following topics:
1. LDA, QDA and/or Naive Bayes classification
2. Logistic Regression classifiers and/or KNN for classification/regression
3. Principal Component Analysis (PCA)
4. Cluster Analysis
5. Association Rule Mining and Recommender Systems
Data
You will need to find your own data using good practices. Your dataset cannot be smaller than
1000 observations of five variables, unless the targeted data mining problem to be addressed
relates to spatial-temporal data, in which case fewer than five dimensions could be allowed.
Preferably, you should use a dataset relevant to your place of work. Do not use data from
textbooks or from R packages. Do not use the same data that have been used in the subject (e.g.
UCI repository). Do not use data for which data mining results and analyses can be found online.
You can use public data, but the data should be appropriate for addressing a relevant data mining
problem, and a solution to a similar problem for the same data should not be available.
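
The minimum-size requirement above can be checked mechanically before any modelling starts. The following is a minimal sketch in R; the helper name `check_dataset_size` and its default thresholds are illustrative assumptions taken from the wording above, not part of the assignment materials.

```r
# Hypothetical helper: checks a data frame against the minimum size stated
# in the brief (1000 observations of five variables by default).
check_dataset_size <- function(df, min_rows = 1000, min_cols = 5) {
  stopifnot(is.data.frame(df))
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    meets_minimum = nrow(df) >= min_rows && ncol(df) >= min_cols
  )
}

# Illustration on synthetic data (not a real dataset)
set.seed(1)
toy <- as.data.frame(matrix(rnorm(1200 * 5), ncol = 5))
size_check <- check_dataset_size(toy)
print(size_check$meets_minimum)
```

For a spatial-temporal problem, where the brief allows fewer dimensions, `min_cols` could be lowered in the call.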
Report structure
Please adhere to the strict report structure format. The report will not be assessed if it is not
formatted appropriately.
The report should have the following sections marked clearly:
• Title: In today’s busy world, it is very important to make the most of your title. Make the title
‘eye-catching’, informative and an accurate representation of the contents of the report.
• Abstract: The abstract provides a short, sharp overview of the contents of the report and will
be around 200–300 words. The abstract has five parts:
i. Introductory statement: background to the study, important issue(s) the report
addresses. (approximately 1-2 sentences)
ii. Purpose of the report: state the objectives (1-2 sentences)
iii. Methodological approach: overview the data and methods (2-3 sentences)
iv. Findings or Achievements: list one or two of the main findings or achievements
from your investigation (1-2 sentences)
v. Conclusions and Implications: what conclusions can be drawn from your
investigation? How can the findings/achievements in your report deliver a benefit
to people, things, systems or processes? (1-2 sentences).
• Introduction: The introduction sets the scene for the investigative efforts. It provides
motivation for the work and relevant background information and references that will enable
the reader to put in context the key objectives and achievements in your report. Address the
important issues that have motivated your investigation. At the end of the introduction clearly
state the objectives of the report. Do not put any results from your investigation in the
introduction. Do not discuss details about the data and methods in this section. Do not discuss
your conclusions or key findings in the introduction.
• Data: This section should provide details about how the data were obtained and what the data
represent. You should include information such as (but not limited to):
i. What the source of the data is
ii. How the data were originally collected (e.g., from an experiment or observational
study)
iii. The sample size
iv. The number and types of variables
v. Any known interventions or pre-processing that precede the ones described in your
report
vi. Any other information that is relevant to the understanding and assessment of
your work/report.
• Methods: This section should discuss in depth the data mining methods that were used to
process and to analyse the data, as well as the software version used to generate the results
and report. To cite RStudio, run RStudio.Version() in the RStudio console. The methods
should be appropriate to ensure that the objectives of the report are met.
• Results and Discussion: This section presents and discusses the results. The discussion centres
on the outputs from the data mining procedures that you have performed. For example, what
are the main outcomes? Why are they useful and what for? How are they interesting and
why?, and so on. In particular, how do the results align with the goals set in the introduction?
What are the main achievements and their implications?
• Conclusions: Final remarks about the key achievements of the investigation and what makes
them ‘interesting’ or ‘useful’, right now or for future work. Achievements or findings should
be contrasted with the original objectives or hypotheses of the project. Make sure that you
mention any limitations of your work here. Limit the conclusions to no more than two or three
paragraphs.
• References: List the sources your investigation has drawn from. Note that all references
should be referred to in the text.
• Appendices: Add R code and any supporting materials that might be useful to help assess your
work.
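
The Methods item above asks for the software version used to generate the results. Version details can also be recorded directly from R so that the report and the code agree. This is a minimal sketch using base R only; `RStudio.Version()` is available only inside an RStudio session, so that call is guarded, and the `stats` package is used purely as an example of a package whose version might be reported.

```r
# Record the R version used to generate the results
r_version <- R.version.string
print(r_version)

# Citation entry for R itself (base R function)
r_citation <- citation()
print(r_citation)

# Version of a package used in the analysis; "stats" ships with R
# and is used here only as an example
stats_version <- as.character(packageVersion("stats"))
print(stats_version)

# RStudio.Version() exists only when running inside RStudio
if (exists("RStudio.Version")) {
  print(RStudio.Version()$version)
}
```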
RUBRIC TEMPLATE
Please adhere to the report structure requirements. The report will not be assessed if it is not formatted appropriately.
Dimension: R code and References (10%)

High distinction:
Code submitted and attached to Appendix.
Code works correctly, meets the specifications, produces the correct results and displays them correctly.
Code is well organised and very easy to follow. Code is always very well commented, so the purpose of each block of code and the question part it corresponds to are readily understood. Variable names give the purpose of the variable.
All references have been listed, in the right format, referred to in the appropriate places in the body of the text and listed at the end of the report. At least 4 references have been provided.

Pass:
Code only provided in answer document but looks correct.
Code often exhibits incorrect behaviour. Significant details of specification are violated.
The code is readable only by someone who already knows what it is supposed to be doing. Comments not sufficient to see what the code is doing. Significant lack of comments makes it difficult to understand code.
Some references have been listed and referred to in the appropriate places in the body of the text and listed at the end of the report. At least 2 references have been provided.

Fail:
Code not submitted.
Code not provided in answer document. Code produces incorrect results, does not compile, or significant errors occur.
Code is poorly organised and very difficult to read. Code has no comments.
No references.

Dimension: Abstract and Introduction (10%)

High distinction:
Clearly addresses the five parts of the abstract so that the reader has a clear overview of the report.
Position and exceptions, if any, are clearly stated. Organisation of the argument is completely and clearly outlined and implemented.

Pass:
Partially addresses the five parts of the abstract and/or addresses all five parts but the writing is not clear in places.
Position is clearly stated. Organisation of argument is clear in parts or only partially described and mostly implemented.

Fail:
Does not provide an overview of the report, or the writing is poor overall and mostly unclear.
Position is vague. Organisation of argument is missing, vague or not consistently maintained.
Dimension: Data (10%)

High distinction:
Data are suitable, the report explains how the data were obtained.
Provides a detailed, accurate description of the data and data methods to be employed within the project.
Exploratory data analysis and verification are detailed and provide critical insight with clear overt links to model developments. Data insights are concisely presented and visualised.

Pass:
Data are suitable, the report explains how the data were obtained.

Answered 4 days After Aug 14, 2021

Solution

Neha answered on Aug 18 2021
89621 - R and report/code.
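# Load the customer data. The directory part of the original path is garbled
# in the source, so point this at your local copy of the file.
customer_data=read.csv("Mall_Customers.csv")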
str(customer_data)
names(customer_data)
head(customer_data)
summary(customer_data$Age)
sd(customer_data$Age)
summary(customer_data$Annual.Income..k..)
sd(customer_data$Annual.Income..k..)
summary(customer_data$Spending.Score..1.100.)
sd(customer_data$Spending.Score..1.100.)
a=table(customer_data$Gender)
barplot(a,main="Using BarPlot to display Gender Comparison",
ylab="Count",
xlab="Gender",
col=rainbow(2),
legend=rownames(a))
pct=round(a/sum(a)*100)
lbs=paste(c("Female","Male")," ",pct,"%",sep=" ")
library(plotrix) # provides pie3D()
pie3D(a,labels=lbs,
main="Pie Chart Depicting Ratio of Female and Male")
summary(customer_data$Age)
hist(customer_data$Age,
col="blue",
main="Histogram to Show Count of Age Class",
xlab="Age Class",
ylab="Frequency",
labels=TRUE)
boxplot(customer_data$Age,
col="#ff0066",
main="Boxplot for Descriptive Analysis of Age")
summary(customer_data$Annual.Income..k..)
hist(customer_data$Annual.Income..k..,
col="#660033",
main="Histogram for Annual Income",
xlab="Annual Income Class",
ylab="Frequency",
labels=TRUE)
plot(density(customer_data$Annual.Income..k..),
col="yellow",
main="Density Plot for Annual Income",
xlab="Annual Income Class",
ylab="Density")
polygon(density(customer_data$Annual.Income..k..),
col="#ccff66")
summary(customer_data$Spending.Score..1.100.)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 34.75 50.00 50.20 73.00 99.00
boxplot(customer_data$Spending.Score..1.100.,
horizontal=TRUE,
col="#990000",
main="BoxPlot for Descriptive Analysis of Spending Score")
hist(customer_data$Spending.Score..1.100.,
main="HistoGram for Spending Score",
xlab="Spending Score Class",
ylab="Frequency",
col="#6600cc",
labels=TRUE)
library(purrr) # provides map_dbl()
set.seed(123)
# function to calculate total intra-cluster sum of square
iss <- function(k) {
kmeans(customer_data[,3:5],k,iter.max=100,nstart=100,algorithm="Lloyd" )$tot.withinss
}
k.values <- 1:10
iss_values <- map_dbl(k.values, iss)
plot(k.values, iss_values,
type="b", pch = 19, frame = FALSE,
xlab="Number of clusters K",
ylab="Total intra-clusters sum of squares")
library(cluster) # provides silhouette() and clusGap()
library(gridExtra)
library(grid)
k2<-kmeans(customer_data[,3:5],2,iter.max=100,nstart=50,algorithm="Lloyd")
s2<-plot(silhouette(k2$cluster,dist(customer_data[,3:5],"euclidean")))
k3<-kmeans(customer_data[,3:5],3,iter.max=100,nstart=50,algorithm="Lloyd")
s3<-plot(silhouette(k3$cluster,dist(customer_data[,3:5],"euclidean")))
k4<-kmeans(customer_data[,3:5],4,iter.max=100,nstart=50,algorithm="Lloyd")
s4<-plot(silhouette(k4$cluster,dist(customer_data[,3:5],"euclidean")))
k5<-kmeans(customer_data[,3:5],5,iter.max=100,nstart=50,algorithm="Lloyd")
s5<-plot(silhouette(k5$cluster,dist(customer_data[,3:5],"euclidean")))
k6<-kmeans(customer_data[,3:5],6,iter.max=100,nstart=50,algorithm="Lloyd")
s6<-plot(silhouette(k6$cluster,dist(customer_data[,3:5],"euclidean")))
k7<-kmeans(customer_data[,3:5],7,iter.max=100,nstart=50,algorithm="Lloyd")
s7<-plot(silhouette(k7$cluster,dist(customer_data[,3:5],"euclidean")))
k8<-kmeans(customer_data[,3:5],8,iter.max=100,nstart=50,algorithm="Lloyd")
s8<-plot(silhouette(k8$cluster,dist(customer_data[,3:5],"euclidean")))
k9<-kmeans(customer_data[,3:5],9,iter.max=100,nstart=50,algorithm="Lloyd")
s9<-plot(silhouette(k9$cluster,dist(customer_data[,3:5],"euclidean")))
k10<-kmeans(customer_data[,3:5],10,iter.max=100,nstart=50,algorithm="Lloyd")
s10<-plot(silhouette(k10$cluster,dist(customer_data[,3:5],"euclidean")))
library(NbClust)
library(factoextra) # provides fviz_nbclust() and fviz_gap_stat()
fviz_nbclust(customer_data[,3:5], kmeans, method = "silhouette")
set.seed(125)
stat_gap <- clusGap(customer_data[,3:5], FUN = kmeans, nstart = 25,
K.max = 10, B = 50)
fviz_gap_stat(stat_gap)
k6<-kmeans(customer_data[,3:5],6,iter.max=100,nstart=50,algorithm="Lloyd")
k6
pcclust=prcomp(customer_data[,3:5],scale=FALSE) #principal component analysis
summary(pcclust)
pcclust$rotation[,1:2]
set.seed(1)
ggplot(customer_data, aes(x =Annual.Income..k.., y = Spending.Score..1.100.)) +
geom_point(stat = "identity", aes(color = as.factor(k6$cluster))) +
scale_color_discrete(name=" ",
breaks=c("1", "2", "3", "4", "5","6"),
labels=c("Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4", "Cluster 5","Cluster 6")) +
ggtitle("Segments of Mall Customers", subtitle = "Using K-means Clustering")
ggplot(customer_data, aes(x =Spending.Score..1.100., y =Age)) +
geom_point(stat = "identity", aes(color = as.factor(k6$cluster))) +
scale_color_discrete(name=" ",
breaks=c("1", "2", "3", "4", "5","6"),
labels=c("Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4", "Cluster 5","Cluster 6")) +
ggtitle("Segments of Mall Customers", subtitle = "Using K-means Clustering")
# Map each cluster label to a distinct rainbow colour
kCols=function(vec){cols=rainbow (length (unique (vec)))
return (cols[as.numeric(as.factor(vec))])}
digCluster<-k6$cluster; dignm<-as.character(digCluster); # K-means clusters
plot(pcclust$x[,1:2], col =kCols(digCluster),pch =19,xlab ="K-means",ylab="classes")
legend("bottomleft",unique(dignm),fill=unique(kCols(digCluster)))
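
The nine near-identical silhouette calls for k = 2 to 10 in the script above could be condensed into a loop that returns the average silhouette width for each k. The sketch below is one way to do that and is not the answer's own code. It runs on synthetic data so that it is self-contained, and the names `toy_data`, `avg_sil_width` and `sil_by_k` are illustrative.

```r
library(cluster)  # provides silhouette()

# Synthetic stand-in for the three numeric columns used above:
# three well-separated groups of 50 points in three dimensions
set.seed(123)
toy_data <- rbind(
  matrix(rnorm(150, mean = 0,  sd = 1), ncol = 3),
  matrix(rnorm(150, mean = 6,  sd = 1), ncol = 3),
  matrix(rnorm(150, mean = 12, sd = 1), ncol = 3)
)

# Average silhouette width of a k-means solution with k clusters
avg_sil_width <- function(data, k) {
  km  <- kmeans(data, centers = k, iter.max = 100, nstart = 50,
                algorithm = "Lloyd")
  sil <- silhouette(km$cluster, dist(data, method = "euclidean"))
  mean(sil[, "sil_width"])
}

k_values <- 2:10
sil_by_k <- sapply(k_values, function(k) avg_sil_width(toy_data, k))
names(sil_by_k) <- k_values

# The k with the largest average silhouette width is one candidate choice
best_k <- k_values[which.max(sil_by_k)]
plot(k_values, sil_by_k, type = "b", pch = 19,
     xlab = "Number of clusters K", ylab = "Average silhouette width")
```

Applied to the script's data, `toy_data` would be replaced by `customer_data[,3:5]`. The silhouette criterion is only one of the three diagnostics the script uses, alongside the elbow plot and the gap statistic.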
89621 - R and report/report.docx
Student Name
Student Number
Subject
Title
Contents
Abstract
Introduction
Data
Methods
K-means Algorithm
Determining Optimal Clusters
Results and discussion
Conclusions
References
Appendices
Abstract
This report describes a data science project that implements customer segmentation, one of the most important applications of unsupervised machine learning, in the R language. Customer segmentation divides a customer base into groups with similar characteristics so that a company can identify its best customers. Clustering techniques help companies that want to identify several customer segments and target their potential user base. This project uses the K-means algorithm. This algorithm is the most suitable algorithm for clustering the...
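
To make the idea of customer segmentation with K-means concrete, here is a minimal sketch on synthetic data. It is not taken from the report; the data frame `customers`, its column names and the choice of two segments are assumptions made for illustration only.

```r
# Synthetic customers (illustrative only): two spending profiles
set.seed(42)
customers <- data.frame(
  annual_income  = c(rnorm(100, mean = 30, sd = 5), rnorm(100, mean = 80, sd = 5)),
  spending_score = c(rnorm(100, mean = 70, sd = 8), rnorm(100, mean = 25, sd = 8))
)

# Fit k-means with two segments; nstart restarts guard against poor local optima
segments <- kmeans(customers, centers = 2, nstart = 25)

# Attach the segment label and summarise each segment by its mean profile
customers$segment <- factor(segments$cluster)
segment_profile <- aggregate(cbind(annual_income, spending_score) ~ segment,
                             data = customers, FUN = mean)
print(segment_profile)
print(table(customers$segment))
```

The per-segment means are what would typically be interpreted in a segmentation report, for example a lower-income, high-spending group versus a higher-income, low-spending group.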