[Title of your report]
Introduction
The introduction should be understandable to a layperson and should state the purpose of the analysis.
As a guideline, one paragraph will be sufficient.
[Delete instruction text before submitting]
[Type your introduction here]
Motivation and Methodology
Describe the motivation for the analysis methods that you have used. This section must answer three questions: what you did, why you did it and how you did it.
As a guideline, a maximum of two paragraphs will be sufficient.
[Delete instruction text before submitting]
[Type your description of methods here]
Results & Discussion
Summarise the main results of your analyses in questions 2 and 3. You may use subsections, tables etc. as you see fit. Present and discuss results in a clear and simple way:
Present findings of statistical analyses in a logical sequence.
Do not include code or dumps of R output. Results should either be incorporated into sentences or formatted appropriately to be neatly presented.
Interpret your findings by discussing their practical significance.
Discuss shortcomings, if any.
As a guideline, a maximum of three paragraphs will be sufficient.
[Delete instruction text before submitting]
[Type your results and discussion here]
Recommendations & Conclusions
Type your recommendations and conclusions here
What do you conclude overall about the analysis?
As a guideline, one paragraph will be sufficient. Do not introduce any new information in this section, and do not simply repeat statements made elsewhere in your report!
[Delete instruction text before submitting]
[Type your recommendations and conclusions here]
MATH 1081 UO Mathematical Methods
for Data Analytics 2
Assessment 2.1: Project Part A
Instructions:
• Structure of the assessment: This assessment is worth 25% of your final
grade and is due no later than 12 pm on Monday, Week 7. This assessment
consists of 3 main questions to answer and a written report. Your submission will
be marked out of 100.
• Use of R: This project is a guided case study. It is important that you follow any
instructions or guidance in the questions, such as “Use R” where required. You
must provide your R code to get full marks wherever you use R to answer
the questions. Upload your R script and include screenshots of the R code in your
answer sheet.
• Save your work: Save your answer sheet as a pdf named “your student
ID Assessment 2.1 MATH1081.pdf”.
• Show your work: Show all necessary steps so that the reader can follow your
solution procedure.
• Submit your work: Create a folder with
1. your answer sheet
2. your R script and
3. the final dataset you used for the analysis in “.csv” format.
Name your folder with your student ID and upload it as a zip file.
• Acknowledgement of work: When submitting online, you acknowledge that
the submitted assignment is your own work unless otherwise stated.
• Academic integrity: The University’s policy on academic misconduct will be
strictly applied. Here are some tips to avoid academic misconduct:
– Do not copy from any printed or electronic source or from any person.
– Write your own solutions. You may discuss your work with others, but
you must write up your solutions yourself. You are not allowed to use some-
one else’s written work when writing up your submission.
– Do not give inappropriate help. Giving inappropriate help is just as
serious as receiving it and will have the same consequences. Do not show
your completed exercise to others. Dispose of drafts so that no one can access
them.
– Acknowledge help and joint work. If you receive any help from another
source (for example, students, tutors, friends, internet), you must make a
note of it on your submission.
• Late submission: Any late submission will attract a penalty of 5 of the available
marks per day, for up to five days. The cut-off time is 12 pm each day. After five
days from the assessment due date, no submissions will be marked, and zero
marks will be granted.
Assessment Task Overview
This assessment is based on the data in the Melbourne housing.csv file. It contains
residential building data, including construction cost, sales prices, some project
variables, and some economic variables corresponding to real estate in Melbourne,
Australia. The objective is to understand, analyse and develop a model to predict the sales
price (Price). A brief description of the variables is provided below.
Data dictionary
Variable       Description
Suburb         Suburb
Address        Street address
Rooms          Number of rooms
Type           Type of housing
Price          Actual sales price (local currency)
Method         S - property sold; SP - property sold prior; PI - property passed in;
               PN - sold prior not disclosed; SN - sold not disclosed; NB - no bid;
               VB - vendor bid; W - withdrawn prior to auction;
               SA - sold after auction; SS - sold after auction price not disclosed; N/A - price NA.
Type           br - bedroom(s); h - house, cottage, villa, semi, terrace; u - unit, duplex;
               t - townhouse; dev site - development site; o res - other residential.
SellerG        Real estate agent
Date           Date sold
Distance       Distance from CBD in kilometres
Regionname     General region (West, North West, North, North East, etc.)
Propertycount  Number of properties that exist in the suburb
Bedroom2       Scraped number of bedrooms (from a different source)
Bathroom       Number of bathrooms
Car            Number of car spots
Landsize       Land size in metres
BuildingArea   Building size in metres
YearBuilt      Year the house was built
CouncilArea    Governing council for the area
Lattitude      Self-explanatory
Longtitude     Self-explanatory
Table 1: Data dictionary Melbourne Housing.csv
Assessment Task Details
You have to complete this assessment in two sections.
1. A list of questions to answer, comprising 70% of the total grade (70 marks).
Write your answers clearly in a well-organised manner with accurate notation.
Label the questions and sub-questions.
2. A report summarising your analysis in Section 1, comprising 30% of the
total grade (30 marks). A guide for the project report is provided in learnonline.
Section 1: Questions
1. The data is not always cleaned and presented in a working manner. There are
some unnecessary columns, and some variables do not have complete entries.
In addition, there might be errors in this dataset, and you have to fix them before
you start analysing. You can do data cleansing in R or Excel.
(a). Choose & filter a single house ‘Type’. Use this for the remainder of the
assignment. Provide a dot point summary of corrections made to the dataset,
and create a subset dataset with the continuous variables and ‘Postcode’.
Hint: Use the na.omit function. For full marks, provide a screenshot of the first
30 row entries of the cleaned dataset in R. [7 marks]
(b). Find the covariance matrix and include it as a screenshot. Exclude the
response variable and ‘Number of Rooms’ & ‘Postcode’ when finding the
covariance matrix. [4 marks]
(c). Find the eigenvalues and eigenvectors of the covariance matrix in part (b).
Provide the R output and code as your solution. [5 marks]
(d). Provide the diagonalized form of the covariance matrix in part (b), using the
results in part (c). [4 marks]
[20 marks]
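As a rough illustration of the steps in parts (b) to (d), the sketch below computes a covariance matrix, its eigenvalues and eigenvectors, and the diagonalised form S = P D P^T. It deliberately uses R's built-in mtcars data and an arbitrary choice of four columns rather than the assignment dataset, and the object names (X, S, P, D) are placeholders chosen for this example only.

```r
# Illustrative only: built-in mtcars data, not the assignment dataset.
X <- mtcars[, c("mpg", "disp", "hp", "wt")]   # four continuous columns (arbitrary choice)

S <- cov(X)            # sample covariance matrix (symmetric)
e <- eigen(S)          # eigenvalues in decreasing order, orthonormal eigenvectors

P <- e$vectors         # columns of P are the eigenvectors
D <- diag(e$values)    # diagonal matrix holding the eigenvalues

# Diagonalised form: S = P D P^T, equivalently D = P^T S P
S_rebuilt <- P %*% D %*% t(P)
max_error <- max(abs(S - S_rebuilt))   # should be numerically negligible

print(round(e$values, 3))
print(max_error)
```

Because a covariance matrix is symmetric, the eigenvector matrix returned by eigen is orthogonal, which is why the transpose of P can stand in for its inverse here.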
2. Conduct a Principal Component Analysis (PCA) and develop a model that predicts
the response variable, the sales price of a property (Price). This is an open question
and you need to write up your results. Answer the following guided questions to
finish this task.
(e). Use R to compute the correlation matrix between the variables. Present
a scatterplot between any two strongly correlated variables. For full marks,
provide an interpretation of the observed relationship between these variables
that incorporates the correlation coefficient. [3 marks]
(f). Split the subset of the Melbourne housing dataset into training and testing
datasets, where 80% of the dataset is in the training set. Hint: Use the sample_n function.
[4 marks]
(g). Conduct PCA on the training dataset, and present your findings.
Discuss the principal components and explained variance. How many principal
components (PCs) are you going to keep? [4 marks]
(h). Provide a visualization with ggbiplot of the first two PCs. [4 marks]
(i). Form a dataset with your training set in terms of the principal components and
the response variable. [3 marks]
(j). Use the lm function in R to develop a linear regression model to predict the
response variable, sales price. Present your R output and a summary of the
model. Are the coefficients significant? [5 marks]
(k). Is the model in part (j) a good model? Why? [2 marks]
(l). Run your model in part (j) for the testing dataset and compare the output
to the original sales price in the testing dataset. Does the model tend to under-
or overestimate the sales price? [3 marks]
(m). Run the model and predict the value of sales price for postcode 3000. Comment
on the prediction. [2 marks]
[30 marks]
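The workflow in parts (f) to (l) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a model answer: it uses the built-in mtcars data with mpg standing in for the response, base R's sample in place of the hinted sample_n, an arbitrary seed, and an arbitrary choice of k = 2 retained components.

```r
# Illustrative only: built-in mtcars data, with mpg standing in for the response.
set.seed(1)                                    # arbitrary seed for a reproducible split
predictors <- c("disp", "hp", "wt", "qsec")
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.8 * n))  # 80% of rows go to the training set
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# PCA on the training predictors only, centred and scaled
pca <- prcomp(train[, predictors], center = TRUE, scale. = TRUE)
explained <- pca$sdev^2 / sum(pca$sdev^2)      # proportion of variance per PC
print(round(cumsum(explained), 3))

# Training set expressed as PC scores plus the response
k <- 2                                         # number of PCs kept (a judgment call)
train_pc <- data.frame(mpg = train$mpg, pca$x[, 1:k])
fit <- lm(mpg ~ ., data = train_pc)
print(summary(fit)$coefficients)

# Project the test predictors with the training rotation, then predict
test_pc <- as.data.frame(predict(pca, newdata = test[, predictors])[, 1:k])
pred <- predict(fit, newdata = test_pc)
mean_error <- mean(pred - test$mpg)            # > 0 suggests overestimation on average
print(mean_error)
```

Fitting the PCA on the training rows alone and reusing its rotation for the test rows keeps the test set out of the model-building step, which is the reason for the predict call on the prcomp object.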
3. Assume that the ‘project-based pricing strategy’ is used for pricing, and it has
been determined that the following function C(x, y) is proportional to the sales
price and economic cost.
(n). Run the gradient descent algorithm in R to find the minimum of the cost
function C(x, y) and the values for x and y that produce the minimum cost
function starting with (x, y) = (121, 70) given the learning rate is 0.01, and
the convergence threshold is 0.005.
$$C(x, y) = 200 - (x - 100)\,y\,\exp\left(-\left((x - 120)^2 + \left(\frac{y}{100}\right)^2\right)\right)$$
Provide your R code, output and minimum of the cost function C(x, y)
and the values for x and y that produce the minimum cost function as the
answer. [12 marks]
(o). Assume that the variables x and y in the function C(x, y) are the variables
Landsize (x) and BuildingArea (y). Predict the sales price from the obtained
values for Landsize (x) and BuildingArea (y) in part (n), along with the
averages of any other needed variables, in your model developed in Question
2 part (j). [7 marks]
(p). The sales price of which postcode is the closest to the sales price you obtained
in part (o)?
Hint: Use the which function in R to get the sales prices within ±x, where x is a
user-defined margin from the value obtained in part (o). [3 marks]
[20 marks]
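For readers new to the algorithm named in part (n), a generic gradient descent loop might look like the sketch below. It is applied to a hypothetical quadratic test function f, not to C(x, y), and it makes two assumptions that the task does not specify: the gradient is approximated numerically by central differences, and convergence is declared when the gradient norm drops below the threshold. All function and argument names are illustrative.

```r
# Illustrative only: f is a hypothetical test function, not the assignment's C(x, y).
f <- function(p) (p[1] - 3)^2 + (p[2] + 1)^2

# Central-difference numerical gradient, so no hand-derived derivatives are needed
num_grad <- function(fn, p, h = 1e-6) {
  sapply(seq_along(p), function(i) {
    step <- replace(numeric(length(p)), i, h)
    (fn(p + step) - fn(p - step)) / (2 * h)
  })
}

gradient_descent <- function(fn, start, rate = 0.01, threshold = 0.005,
                             max_iter = 100000) {
  p <- start
  for (iter in seq_len(max_iter)) {
    g <- num_grad(fn, p)
    # Assumed stopping rule: gradient norm falls below the threshold
    if (sqrt(sum(g^2)) < threshold) break
    p <- p - rate * g                  # step against the gradient
  }
  list(par = p, value = fn(p), iterations = iter)
}

res <- gradient_descent(f, start = c(0, 0))
print(res)
```

Other stopping rules, such as the change in the function value or in the parameters between iterations, are equally common; whichever is used should be stated alongside the result.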
Section 2: Report
This is a written section to present your results in a report form. This includes
the following components:
• Introduction [5 marks]
• Motivation and Methodology [5 marks]
• Results and presentation of main results [10 marks]
• Discussion and Conclusions [10 marks]
[30 marks]