Answered Same Day Jul 21, 2021

## Solution

Atreye answered on Jul 24 2021
Data Analysis Project
Part D: Project summary and findings
(a) The data consists of the annual average temperatures of the US over a span of 118 years, from 1895 to 2013, listed in tabular form. The data were collected to analyse how US temperature changes over the years. In this assignment, exploratory data analysis has been performed to examine the shape of the distribution and to obtain summary statistics, a hypothesis test has been carried out on the population mean, and regression analysis has been used to fit a trend line to the dataset and to predict the temperature for the years 2016 and 2500.
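The trend-line fit and the test of the population mean described above can be sketched in Python as follows. This is a minimal illustration, not necessarily the exact procedure or software used in the assignment: the function names are hypothetical, the trend line is assumed to be an ordinary least-squares fit of temperature on year, and the hypothesised mean `mu0` is a placeholder parameter because its value is not given in this section.

```python
import math
import statistics


def fit_trend(years, temps):
    """Least-squares trend line: temps ≈ intercept + slope * year."""
    x_bar = statistics.fmean(years)
    y_bar = statistics.fmean(temps)
    # Sums of cross-products and squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, temps))
    sxx = sum((x - x_bar) ** 2 for x in years)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, year):
    """Point prediction from the fitted line (extrapolation if year lies outside the data)."""
    return intercept + slope * year


def one_sample_t(data, mu0):
    """t statistic for H0: population mean = mu0 (compare with a t distribution, n - 1 df)."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return (statistics.fmean(data) - mu0) / se
```

Calling `fit_trend(years, temps)` on the 1895 to 2013 table and then `predict(intercept, slope, 2016)` would give the point forecast. A prediction for 2500 lies far outside the observed years, so it depends entirely on the linear trend continuing.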
(b) The shape of the distribution can be determined in several ways:
· From the stem-and-leaf display, the plot is stretched out vertically downwards, towards the larger stems, which implies that the data is positively skewed. After the outlier is trimmed, the data remains right-skewed.
· From the histogram, the distribution has a long tail towards the right side of the graph, which means that the data is positively skewed. After the outlier is trimmed, the data remains right-skewed.
· From the box-and-whisker plot, the distance between the third quartile and the median is larger than the distance between the median and the first quartile, which implies that the data is right-skewed (positively skewed). After the outlier is trimmed, the data remains right-skewed.
In conclusion, the data is positively skewed.
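The box-plot comparison of the quartile gaps can also be made numerical. The Python sketch below (function names are hypothetical) returns the two gaps together with Bowley's quartile skewness coefficient, and separately computes the moment coefficient of skewness, which a long right tail in the histogram would typically push above zero. Note that `statistics.quantiles` uses its "exclusive" method by default, so its quartiles may differ slightly from those drawn by other software.

```python
import statistics


def quartile_skew(data):
    """Return (Q3 - median, median - Q1, Bowley coefficient); positive Bowley suggests right skew."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    bowley = (upper_gap - lower_gap) / (q3 - q1)
    return upper_gap, lower_gap, bowley


def moment_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5
```

Both statistics are zero for perfectly symmetric data and positive when the upper tail is longer, matching the three graphical checks above.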
(c) The box-and-whisker plot shows one outlier at the upper end of the dataset: the 118th observation, 55.33 degrees Fahrenheit.
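This section does not say which rule the box plot used to flag the outlier. A common convention, assumed here, is Tukey's rule: any value more than 1.5 times the interquartile range (IQR) beyond the first or third quartile is marked as an outlier. A minimal sketch with a hypothetical function name:

```python
import statistics


def tukey_outliers(data, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]
```

Passing the 118 annual values would list any observation beyond the upper fence; the exact fences depend on the quartile method the plotting software uses.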
(d) The mean of the dataset is 52.18605 degrees Fahrenheit which implies that the average temperature of US is almost 52.18605 degrees Fahrenheit.
The mode of the dataset is 51.99 degrees Fahrenheit which implies half of the dataset is greater than or equal to 51.99 degrees Fahrenheit and half of the dataset is less than or equal to 51.99 degrees Fahrenheit.
The dataset has more than 1 mode.
The mean is taken as the best measure of location here because it uses all the data points, whereas the median does not take every observation into account (although the median is less affected by the presence of outliers). In addition, the dataset has no single mode. Hence, the mean is preferred over the median and the mode.
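The three measures of location can be computed with Python's standard `statistics` module, as sketched below (the function name is hypothetical). `statistics.multimode` returns every most-frequent value, which is how a dataset with more than one mode shows up. The sample values in the final line are made up for illustration and are not the temperature data.

```python
import statistics


def location_summary(data):
    """Mean, median, and every mode of a dataset."""
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # list of all values tied for highest frequency
    }


# Illustrative values only (not the US temperature data):
print(location_summary([51.0, 51.5, 51.5, 52.0, 52.0, 55.0]))
```

In this toy example the mean exceeds the median because of the single large value, the same pattern expected for right-skewed data.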
(e) The range of the dataset is 5.17 degrees Fahrenheit and the standard deviation is approximately 0.916 degrees Fahrenheit.
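Both measures of spread are short computations. The sketch below assumes the reported standard deviation is the sample version (divisor n - 1); the section does not say which was used, and `statistics.pstdev` would give the population version instead. The function name is hypothetical.

```python
import statistics


def spread_summary(data):
    """Range (max - min) and sample standard deviation (n - 1 divisor)."""
    return {
        "range": max(data) - min(data),
        "stdev": statistics.stdev(data),
    }
```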
(f) 90% confidence interval...
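Only the heading of part (f) survives here. As a hedged illustration of how a 90% confidence interval for the mean is formed (mean ± critical value × s/√n), the sketch below uses a normal critical value from `statistics.NormalDist`. This is a swapped-in large-sample approximation, not necessarily the method used in the assignment; with SciPy available, `scipy.stats.t.ppf(0.95, n - 1)` gives the t critical value to use instead. The function name is hypothetical.

```python
import math
import statistics


def mean_ci_normal(data, level=0.90):
    """Approximate CI for the mean using a normal (z) critical value."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for 90%
    return mean - z * se, mean + z * se
```

For a sample of roughly 118 observations the z and t critical values differ only slightly, which is why the normal approximation is a reasonable sketch here.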
