[Title of your report] Introduction Provides clear and concise context for the report, introducing the...

Question

[Title of your report]

Introduction
Provides clear and concise context for the report, introducing the purpose of the analyses that follow. As a guideline, one paragraph will be sufficient. [Delete instruction text before submitting]
[Type your introduction here]

Motivation and Methodology
Describe the motivation for the analysis methods and tools that you have used in each section. This section must answer the questions what you did, why you did that and how you did it. As a guideline, maximum two paragraphs will be sufficient. [Delete instruction text before submitting]
[Type your description of methods here]

Results & Discussion
Summarise the main results of your analyses in each section I to IV. You may use subsections, tables etc. as you see fit. Present and discuss results in a clear and simple way:
• Present findings of statistical analyses in a logical sequence. Do not include code or dumps of R output. Results should either be incorporated into sentences or formatted appropriately to be neatly presented.
• Interpret your findings by discussing their practical significance.
• Discuss shortcomings, if any.
As a guideline, maximum three paragraphs will be sufficient. [Delete instruction text before submitting]
[Type your results and discussion here]

Recommendations & Conclusions
Based on your analysis, provide a brief overall discussion summarising/interpreting the results of the analyses you performed and final conclusions based on the hypothesis tested. As a guideline, one paragraph will be sufficient. Do not introduce any new information in this section, and do not simply repeat statements made elsewhere in your report! [Delete instruction text before submitting]
[Type your recommendations and conclusions here]

MATH 1081 UO Mathematical Methods for Data Analytics 2
Assessment 2.2: Project Part B

Instructions:
• Structure of the assessment: This assessment is worth 35% of your final grade and is due no later than 5 pm on Friday, Week 10. This assessment consists of 20 questions under 4 sections to answer and a report writing. Your submission will be marked out of 100.
• Use of R: This project is a guided case study. It is important that you follow any instructions or guidance in the questions, such as “Use R” where required. You must provide your R codes to get full marks wherever you use R to answer the questions. Upload your R script and screenshot the R codes and outputs in your answer sheet.
• Save your work: Save your answer sheet as a pdf named “your studentID Assessment 2.2 MATH1081.pdf”.
• Show your work: Show all necessary steps so that the reader can follow your solution procedure.
• Submit your work: Create a folder with
  1. your answer sheet
  2. your R script and
  3. the final dataset you used for the analysis in “.csv” format.
  Name your folder with your student ID and upload it as a zip file.
• Acknowledgement of work: When submitting online, you acknowledge that the submitted assignment is your own work unless otherwise stated.
• Academic integrity: The University’s policy on academic misconduct will be strictly applied. Here are some tips to avoid academic misconduct:
  – Do not copy from any printed or electronic source or from any person.
  – Write your own solutions. You may discuss your work with others, but you must write up your solutions yourself. You are not allowed to use someone else’s written work when writing up your submission.
  – Do not give inappropriate help. Giving inappropriate help is just as serious as receiving it and will have the same consequences. Do not show your completed exercise to others. Dispose of drafts so that no one can access them.
  – Acknowledge help and joint work. If you receive any help from another source (for example, students, tutors, friends, internet), you must make a note of it on your submission.
• Late submission: Any late submission will attract a penalty of 5 marks available per day for five days. The cut-off time is 5 pm each day.
  After five days from the assessment due date, no submissions will be marked, and zero marks will be granted.

Assessment Task Overview

Photo by Luke van Zyl on Unsplash (https://unsplash.com)

This assessment is based on the data in Melbourne housing.csv file. It contains residential building data, including construction cost, sales prices, some project variables, and some economic variables corresponding to real estate in Melbourne, Australia. The objective is to understand, analyse and develop a model to predict the sales price (Price). A brief description of variables is provided below.

Data dictionary

Variable        Description
Suburb          Suburb
Address         Street address
Rooms           Number of rooms
Type            Type of housing
Price           Actual sales price (local currency)
Method          S - property sold; SP - property sold prior; PI - property passed in;
                PN - sold prior not disclosed; SN - sold not disclosed; NB - no bid;
                VB - vendor bid; W - withdrawn prior to auction;
                SA - sold after auction; SS - sold after auction price not disclosed; N/A - price NA.
Type            br - bedroom(s); h - house, cottage, villa, semi, terrace; u - unit, duplex;
                t - townhouse; dev site - development site; o res - other residential.
SellerG         Real Estate Agent
Date            Date sold
Distance        Distance from CBD in Kilometres
Regionname      General Region (West, North West, North, North east . . . etc)
Propertycount   Number of properties that exist in the suburb.
Bedroom2        Scraped # of Bedrooms (from different source)
Bathroom        Number of Bathrooms
Car             Number of carspots
Landsize        Land Size in Metres
BuildingArea    Building Size in Metres
YearBuilt       Year the house was built
CouncilArea     Governing council for the area
Lattitude       Self explanatory
Longtitude      Self explanatory

Table 1: Data dictionary Melbourne Housing.csv

Assessment Task Details

You have to complete this assessment in two sections.
1. A list of questions to answer, comprising 72% of the total grade (72 marks). Write your answers clearly in a well-organised manner with accurate notations. Label the questions and sub-questions.
2. A report summarising your analysis in Section 1, comprising 28% of the total grade (28 marks). A guide for the project report is provided in learnonline.

Section 1: Questions

[I] Descriptive Statistics & Exploratory Analysis:

The data is not always cleaned and presented in a working manner. There are some unnecessary columns and variables which do not have full completed entries. In addition, you might have errors in this dataset, and you have to fix them before you start analysing. You can do data cleansing in R or Excel.

(a). Choose & filter a single house ‘Type’. Use this for the remainder of the assignment as completed in Project Part A. Create a subset dataset of size at least 250 with the continuous variables and ‘Postcode’ and ‘Year = 2018’. Hint: Use na.omit function. For full marks, provide a screenshot of the first 30 row entries of the cleaned dataset in R. [2 marks]
(b). Use R to produce histograms of all the possible continuous variables. [4 marks]
(c). Use R to produce descriptive statistics for all the variables in part (a). [4 marks]
(d). Use R to produce boxplots describing the continuous variables side by side. This should be a picture of one plot. [2 marks]
(e). Using your outputs from (a) to (d), comment on the shape of the distribution for each variable. In particular, briefly describe in a table form:
  • Whether there is one peak, or multiple peaks, in the distribution;
  • The shape of the distribution (skewed or symmetric);
  • Whether there appear to be any outliers. [5 marks]
  Example table layout:
  Variable | Number of peaks in the distribution (One/multiple) | Shape of the distribution (Left-skewed/Right-skewed/Symmetric) | Outliers present (Yes/No)
(f). Which central tendency (mean/median) and dispersion (standard deviation/interquartile range) measures are the most appropriate to summarise the variables numerically? Justify your choice of measures. Provide your answers in a table form. For full marks, provide the general interpretation for the listed summary measures. [4 marks]
  Example table layout:
  Variable | Measure of central tendency (mean/median) | Measure of dispersion (SD/IQR) | Justification
(g). Use R to test the variables for Normality. Briefly describe whether the data follows a Normal distribution. Tabulate your answer. [4 marks]
  Example table layout:
  Variable | P-value | Reject H0 (Yes/No) | Normally distributed (Yes/No)
[25 marks]

[II] Normal Distribution & Central Limit Theorem:

(h). Use R to calculate the probability that the average house (unit) Price will be more than $1,000,000 ($600,000) using the provided data. For full marks, clearly state the distribution of average sales price and the correct probability statement. Interpret your final answer. [5 marks]
(i). Use R to calculate the probability that the average house (unit) Price will be less than $1,000,000 ($600,000). Clearly state the correct probability statement. Provide an interpretation to your final answer. [2 marks]
(j). What is the cutoff such that the probability of an average Price higher than the cutoff would be 5%? For full marks, provide a correct probability statement. [3 marks]
(l). Using R, produce a random sample of size 30 for variable BuildingArea by randomly selecting 30 values without replacement from the BuildingArea variable in the provided data. Repeat the same for Landsize variable. For full marks, provide a screenshot of your samples in a table format. Hint: Use data.frame() to tabulate samples [4 marks]
(k). Use R to produce the descriptive statistics for each sample in part (l), and store the information in another table. Please ensure you state the mean and standard deviation of each sample. [2 marks]
(m). Determine the sampling distribution of means for BuildingArea and Landsize and state the parameters based off your samples in part (l). Justify your answer, quoting any theorems you used. [3 marks]
(n). Calculate the probability that the average Landsize is greater than 650 based on your sampling distribution of the means from part (m). For full marks, provide a correct probability statement and interpret the final answer. [3 marks]
[22 marks]

[III] Estimating & determining the population mean:

(o). Manually construct 95% confidence interval for the population mean for BuildingArea and Landsize based on the sampled data in part (l). Use R to verify the results. Interpret your confidence interval. Hint: t_{29, 0.025} = 2.045 [3 marks]
(p). Repeat the previous question for a 99% confidence interval for the population mean of the same variables based on the sampled data in part (l). Hint: t_{29, 0.005} = 2.756 [3 marks]
(q). Compare and contrast the 99% confidence intervals for the two variables in part (p), and comment whether the means of the original dataset Melbourne Housing for BuildingArea and Landsize are included in these interval estimates. Justify your answer. [2 marks]
[8 marks]

[IV] Testing claims & Hypothesis Tests:

Hint: Use the whole dataset to answer the question in this section. For full marks, define the parameters of interest appropriately, set up the null and alternative hypotheses, and clearly state the decision and the conclusion of the test.

(r). The project management team of these housing projects is debating that there is no difference between the variables BuildingArea and Landsize. Use R to statistically test at a 5% level of significance if there is a difference in the average of BuildingArea and Landsize. Give a verdict and conclusion to your analysis. [5 marks]
(s). They further claim that there is a difference between the variables BuildingArea and Landsize. Use R to statistically test at a 1% level of significance if there is a difference in the average of BuildingArea and Landsize. [5 marks]
(t). Another claim the project management team is making is that ideally the average house (unit) Price should be greater than $1,000,000 ($600,000). Using R, statistically test at a 10% level of significance whether the average house (unit) price is greater than $1,000,000 ($600,000). Include a diagram for the hypothesis test. [5 marks]
(u). What does it mean by

Mohd · Accepted Answer

2023-03-20
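library(readr)
library(magrittr)
library(dplyr)
# finaldata is assumed to already hold the housing data read into R
# (for example with readr::read_csv()); the read step is not shown here.
# Keep only houses (Type "h").
finaldata1 <- finaldata %>%
  filter(Type == "h")
head(finaldata1, 30)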
## # A tibble: 30 × 21
##    Suburb   Address Rooms Type   Price Method SellerG Date       Dista…¹ Postc…²
##    <chr>    <chr>   <dbl> <chr>  <dbl> <chr>  <chr>   <date>       <dbl>   <dbl>
##  1 Abbotsf… 25 Blo…     2 h     1.03e6 S      Biggin  2016-02-04     2.5    3067
##  2 Abbotsf… 5 Char…     3 h     1.46e6 SP     Biggin  2017-03-04     2.5    3067
##  3 Abbotsf… 55a Pa…     4 h     1.6 e6 VB     Nelson  2016-06-04     2.5    3067
##  4 Abbotsf… 124 Ya…     3 h     1.88e6 S      Nelson  2016-05-07     2.5    3067
##  5 Abbotsf… 98 Cha…     2 h     1.64e6 S      Nelson  2016-10-08     2.5    3067
##  6 Abbotsf… 10 Val…     2 h     1.10e6 S      Biggin  2016-10-08     2.5    3067
##  7 Abbotsf… 40 Nic…     3 h     1.35e6 VB     Nelson  2016-11-12     2.5    3067
##  8 Abbotsf… 16 Wil…     2 h     1.31e6 S      Jellis  2016-10-15     2.5    3067
##  9 Abbotsf… 42 Hen…     3 h     1.2 e6 S      Jellis  2016-07-16     2.5    3067
## 10 Abbotsf… 78 Yar…     3 h     1.18e6 S      LITTLE  2016-07-16     2.5    3067
## # … with 20 more rows, 11 more variables: Bedroom2 <dbl>, Bathroom <dbl>,
## #   Car <dbl>, Landsize <dbl>, BuildingArea <dbl>, YearBuilt <dbl>,
## #   CouncilArea <chr>, Lattitude <dbl>, Longtitude <dbl>, Regionname <chr>,
## #   Propertycount <dbl>, and abbreviated variable names ¹Distance, ²Postcode
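Part (a) also asks for a subset containing only the continuous variables, with incomplete rows dropped via na.omit. The code above only filters on Type, so here is a minimal sketch of the remaining steps. It runs on a small made-up data frame (`toy`, with invented values), not on the real file, and uses base R only; the same three steps could be applied to finaldata1.

```r
# Toy stand-in for the housing data: all values are invented for illustration.
toy <- data.frame(
  Type         = c("h", "h", "u", "h", "t", "h"),
  Price        = c(1030000, 1460000, 600000, NA, 900000, 1180000),
  Landsize     = c(202, 156, 0, 134, 120, NA),
  BuildingArea = c(79, 150, 60, 142, 110, 105),
  Suburb       = c("A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Step 1: keep a single housing type.
h_only <- toy[toy$Type == "h", ]

# Step 2: keep only the numeric columns.
numeric_cols <- vapply(h_only, is.numeric, logical(1))

# Step 3: drop every row that has at least one missing value.
cleaned <- na.omit(h_only[, numeric_cols, drop = FALSE])

nrow(cleaned)    # number of complete rows that remain
names(cleaned)   # columns that survived the numeric filter
```

On the real data you would then check that nrow() is at least 250, as the question requires.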
(b). Use R to produce histograms of all the possible continuous variables. [4 marks]
# Arrange the plots in a 3 x 4 grid so all histograms fit on one page.
par(mfrow = c(3, 4))
hist(finaldata1$Rooms, col = "blue", main = "Rooms")
hist(finaldata1$Price, col = "blue", main = "Price")
hist(finaldata1$Distance, col = "blue", main = "Distance")
hist(finaldata1$Postcode, col = "blue", main = "Postcode")
hist(finaldata1$Bedroom2, col = "green", main = "Bedroom2")
hist(finaldata1$Bathroom, col = "green", main = "Bathroom")
hist(finaldata1$Car, col = "green", main = "Car")
hist(finaldata1$Landsize, col = "green", main = "Landsize")
hist(finaldata1$BuildingArea, col = "red", main = "BuildingArea")
hist(finaldata1$YearBuilt, col = "red", main = "YearBuilt")
hist(finaldata1$Propertycount, col = "red", main = "Propertycount")
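If you would rather not write one hist() call per column, the same plots can be drawn in a loop over the numeric columns. This is only a sketch on simulated data (`toy` and its three columns are made up), so swap in your own cleaned data frame.

```r
set.seed(1)

# Simulated stand-in for a few continuous housing variables (not real data).
toy <- data.frame(
  Price    = rlnorm(200, meanlog = 13.8, sdlog = 0.4),
  Distance = runif(200, min = 1, max = 40),
  Landsize = rexp(200, rate = 1 / 500)
)

# Names of all numeric columns in the data frame.
num_vars <- names(toy)[vapply(toy, is.numeric, logical(1))]

# One panel per variable; par() returns the old settings so they can be restored.
old_par <- par(mfrow = c(1, length(num_vars)))
for (v in num_vars) {
  hist(toy[[v]], main = v, xlab = v, col = "steelblue")
}
par(old_par)
```

The loop keeps the plot titles tied to the column names, which avoids typos in hand-written titles.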
(c). Use R to produce descriptive statistics for all the variables in part (a). [4 marks]
skimr::skim(finaldata1)
Data summary
  Name                 finaldata1
  Number of rows       4088
  Number of columns    21

  Column type frequency:
    character          7
    Date               1
    numeric            13

  Group variables      None

Variable type: character
  skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
  Suburb                 0              1    3   18      0       280           0
  Address                0              1    8   22      0      4037           0
  Type                   0              1    1    1      0         1           0
  Method                 0              1    1    2      0         5           0
  SellerG                0              1    1   17      0       172           0
  CouncilArea            0              1    4   17      0        31           0
  Regionname             0              1   16   26      0         8           0

Variable type: Date
  skim_variable  n_missing  complete_rate  min         max         median      n_unique
  Date                   0              1  2016-02-04  2017-08-12  2016-11-27        51

Variable type: numeric
  skim_variable  n_missing  complete_rate  mean    sd    p0   p25   p50   p75  p100  hist
  Rooms                  0              1  3.31  0.85  1.00  3.00  3.00  4.
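For the sampling and confidence-interval parts of the question, (l), (n) and (o), the steps could look roughly like the sketch below. It is not run on the real data: `landsize` is a simulated vector standing in for the Landsize column, so every number it produces is illustrative only. The sampling distribution in part (n) uses the sample mean and standard error as plug-in parameters, which is one reasonable reading of "based off your samples".

```r
set.seed(42)

# Hypothetical stand-in for the Landsize column (simulated, not the real data).
landsize <- rgamma(1000, shape = 4, rate = 4 / 600)

# Part (l): random sample of 30 values drawn without replacement.
samp <- sample(landsize, size = 30, replace = FALSE)

n    <- length(samp)
xbar <- mean(samp)        # sample mean
s    <- sd(samp)          # sample standard deviation
se   <- s / sqrt(n)       # standard error of the mean

# Part (n): by the Central Limit Theorem the sample mean is approximately
# Normal(xbar, se), so P(mean Landsize > 650) is an upper-tail probability.
p_gt_650 <- pnorm(650, mean = xbar, sd = se, lower.tail = FALSE)

# Part (o): manual 95% confidence interval with the t critical value on
# n - 1 = 29 degrees of freedom (about 2.045, matching the hint).
t_crit    <- qt(0.975, df = n - 1)
ci_manual <- c(xbar - t_crit * se, xbar + t_crit * se)

# Verification with R's built-in one-sample t interval.
ci_r <- t.test(samp, conf.level = 0.95)$conf.int

ci_manual
ci_r
```

For the 99% interval in part (p), change 0.975 to 0.995 in qt() and set conf.level = 0.99 in t.test(). The same steps apply to BuildingArea.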
