Assignment 3

Due Date: 09 October, 2017 Weighting: 25% Full Marks: 100

Answering the questions in this assignment should not be your first attempt at these types of questions. It is essential that you first work through practice exercises from the tutorial sheets and textbook. This assignment is important for assessing your understanding of the material and for providing feedback that should help you establish competency in essential skills.

Answer all the questions. The questions are not of equal weight; some questions are worth much more than others. The questions relate to material up to and including Module 10. Before starting this assignment, read Notes Concerning Assignments under the Introductory Material link on the StudyDesk.

When you are asked to comment on a finding, a short paragraph is usually required. Do not copy/paste SPSS output into your assignment unless specifically asked to do so. In many cases the SPSS output contains much more information than is required for a correct and complete answer, and just reproducing the output may not attract any marks. Write the relevant information from the SPSS output within the text of your answer, and report only the information that is relevant to your answer.

Unless instructed otherwise, show all working and formulae used in calculating confidence intervals and performing hypothesis tests. (Answers may of course be checked using SPSS where possible.) To obtain full marks for any question you must show all working, rather than writing only the final answer.

This assessment item consists of 6 questions.

Requirements for a passing grade: As you may have seen in the Course Specifications, to receive a passing grade you must achieve at least 40% of the marks available in the final examination (i.e., 20/50), and at least 50% of the total weighted marks available for the course. If you achieve at least 50% of the weighted marks across all assessments but do not achieve at least 40% of the marks in the final exam, you will not pass the course. Likewise, if you achieve at least 40% of the marks in the final exam but do not achieve at least 50% of the weighted marks in the course, you will not pass the course.

Note on Assignment 3 Solutions, Marks and Late Submissions: Because of the timing of Assignment 3, marks for this assignment will not be available until after the exam. However, feedback for this assignment will be available in the form of comprehensive worked solutions before the exam via the StudyDesk. As a result, any Assignment 3 submitted after 5pm AEST on the Friday before the exam period will not receive any marks.

STA2300 Data Analysis S2, 2017

Question 1 (26 marks)

The data on the revenue of a reputed company are recorded in the file sales.sav. We are interested in the revenue (y) of the company.

a) [4 Marks] Using SPSS, compute an estimate of the mean and standard deviation of the revenue of the company.
b) [6 Marks] Find a 90% confidence interval for the true mean revenue of the company. (You may use the relevant summary statistics computed by SPSS in the previous part of this question.)
c) [4 Marks] Justify the use of the confidence interval formula in part (b) by checking the appropriate conditions and assumptions (include an appropriate graph to support your answer).
d) [2 Marks] Give an appropriate interpretation of the 90% confidence interval.
e) [6 Marks] Setting up appropriate hypotheses, calculate the value of a suitable test statistic to check if the true mean revenue is greater than $2450.
f) [4 Marks] Find the P-value of the test in part (e), and comment on whether you would reject the null hypothesis at the 5% level of significance.
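The hand calculations behind parts (b) and (e) follow the usual one-sample t formulas. The Python sketch below is only an illustration of that arithmetic, not a prescribed method: the revenue values are hypothetical (they are not the contents of sales.sav), the function names are made up for this example, and the critical value t* is assumed to be read from a t table.

```python
import math
import statistics


def t_interval(values, t_crit):
    """Confidence interval for a mean: ybar +/- t* x s / sqrt(n)."""
    n = len(values)
    ybar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / math.sqrt(n)
    return ybar - margin, ybar + margin


def t_statistic(values, mu0):
    """One-sample t statistic: (ybar - mu0) / (s / sqrt(n)), with n - 1 df."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se


# Hypothetical revenue figures, NOT the contents of sales.sav.
revenue = [2400.0, 2550.0, 2480.0, 2610.0, 2390.0, 2520.0, 2470.0, 2580.0]

# t* for 90% confidence with n - 1 = 7 degrees of freedom, from a t table.
T_CRIT_90_DF7 = 1.895

lower, upper = t_interval(revenue, T_CRIT_90_DF7)
t_value = t_statistic(revenue, 2450)

print(f"90% CI for the mean: ({lower:.1f}, {upper:.1f})")
print(f"t = {t_value:.3f} on {len(revenue) - 1} df")
```

With the real data, the degrees of freedom and therefore t* will differ, and the P-value in part (f) would come from the t distribution with n - 1 degrees of freedom.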

Question 2 (16 marks)

Researchers investigating crime believe that at least 40% of all arsonists are under 21 years of age. Checking local crime statistics, they found that 44 out of 80 arson suspects were under 21.

a) [2 Marks] What is the variable of interest to the researchers?
b) [8 Marks] Does the sample data support their belief? Perform an appropriate hypothesis test.
c) [6 Marks] If the researchers want to be 95% confident that the error of the estimate is within 0.05, what sample size is required for the study?
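The arithmetic for a one-proportion z-test and for the sample-size formula can be sketched in Python as follows. This is a sketch under stated assumptions rather than a model answer: it assumes an upper-tailed alternative (p > 0.40), which is one reading of the researchers' belief, and it shows the sample size for two possible planning values of p (the sample proportion and the conservative 0.5). The function names are illustrative only.

```python
import math
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """z statistic and upper-tail P-value for H0: p = p0 vs HA: p > p0."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard deviation under H0
    z = (p_hat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)   # assumed upper-tailed alternative
    return p_hat, z, p_value


def sample_size_for_proportion(margin, confidence, p_guess):
    """Smallest n with z* x sqrt(p(1 - p) / n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z_star**2 * p_guess * (1 - p_guess) / margin**2)


# Figures taken from the question: 44 of 80 suspects under 21, p0 = 0.40.
p_hat, z, p_value = one_proportion_z(44, 80, 0.40)
print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, P-value = {p_value:.4f}")

# Sample size for a margin of 0.05 at 95% confidence.
n_from_sample = sample_size_for_proportion(0.05, 0.95, p_hat)
n_conservative = sample_size_for_proportion(0.05, 0.95, 0.5)
print(n_from_sample, n_conservative)
```

Before relying on the normal model, the success/failure condition (n x p0 and n x (1 - p0) both at least 10) and the independence of the sampled suspects should be checked by hand.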

Question 3 (22 marks)

Use the information in the employee.sav data to answer the following questions. The variable of interest in this question is the beginning salary (salbegin).

a) [4 Marks] Do you suspect any significant difference between the beginning salary of male and female employees? Using an appropriate graph produced by SPSS, check the suspicion of a difference in the beginning salary for male and female employees.
b) [8 Marks] Stating appropriate hypotheses, find the value of a suitable test statistic to check if the mean beginning salary for male employees is significantly greater than that of female employees. (Be sure to define any notation used.)
c) [4 Marks] At the 1% level of significance, describe the outcome of the test.
d) [6 Marks] Without using SPSS, compute a 99% confidence interval for the difference in the mean beginning salary for male and female employees. (You may use the relevant summary statistics computed by SPSS in the previous parts of this question, but you must do the calculations using the appropriate formulae without using SPSS.)
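For the hand calculation in parts (b) and (d), a two-sample t procedure can be worked from summary statistics alone. The Python sketch below assumes the unpooled (unequal variances) version with the Welch approximation for the degrees of freedom; the course may instead expect a different degrees-of-freedom rule. The summary values are hypothetical and are not taken from employee.sav, and the function names are illustrative.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic, standard error, and Welch df."""
    v1 = sd1**2 / n1
    v2 = sd2**2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, se, df


def difference_interval(mean1, mean2, se, t_crit):
    """Confidence interval for mu1 - mu2: (ybar1 - ybar2) +/- t* x SE."""
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical summary statistics (thousands of dollars), NOT from employee.sav.
mean_m, sd_m, n_m = 12.0, 2.0, 10
mean_f, sd_f, n_f = 10.0, 2.0, 10

t, se, df = two_sample_t(mean_m, sd_m, n_m, mean_f, sd_f, n_f)

# t* for 99% confidence with 18 degrees of freedom, from a t table.
T_CRIT_99_DF18 = 2.878
lower, upper = difference_interval(mean_m, mean_f, se, T_CRIT_99_DF18)

print(f"t = {t:.3f}, SE = {se:.3f}, df = {df:.1f}")
print(f"99% CI for the difference: ({lower:.3f}, {upper:.3f})")
```

Taking the summary statistics as inputs mirrors the instruction in part (d): SPSS supplies the means, standard deviations, and group sizes, and the interval itself is computed by formula.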

Question 4 (14 marks)

The average cholesterol content of a certain brand of eggs is 215 milligrams and the standard deviation is 15 milligrams. Assume that the cholesterol content in eggs is normally distributed.

a) [5 Marks] If a single egg is selected, find the probability that the cholesterol content of this egg will be more than 220 milligrams.
b) [4 Marks] For a random sample of size 25 eggs, state the sampling distribution of the sample mean. (Specify the name of the distribution and the parameters of the distribution.)
c) [5 Marks] If a random sample of 25 eggs is selected, find the probability that the mean cholesterol content will be more than 220 milligrams.
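The two probabilities differ only in the standard deviation used: sigma for a single egg, and sigma / sqrt(n) for the mean of n eggs. A minimal Python sketch of that comparison, using the figures given in the question and the standard library's `statistics.NormalDist`, might look like this (variable names are illustrative):

```python
import math
from statistics import NormalDist

MU = 215.0      # mean cholesterol content (mg), from the question
SIGMA = 15.0    # standard deviation (mg), from the question
N = 25          # sample size for parts (b) and (c)
CUTOFF = 220.0  # threshold of interest (mg)

# Part (a): a single egg follows N(215, 15).
single_egg = NormalDist(MU, SIGMA)
z_single = (CUTOFF - MU) / SIGMA
p_single = 1 - single_egg.cdf(CUTOFF)

# Parts (b) and (c): the sample mean follows N(215, 15 / sqrt(25)).
se_mean = SIGMA / math.sqrt(N)
sample_mean = NormalDist(MU, se_mean)
z_mean = (CUTOFF - MU) / se_mean
p_mean = 1 - sample_mean.cdf(CUTOFF)

print(f"single egg:  z = {z_single:.3f}, P(X > 220) = {p_single:.4f}")
print(f"sample mean: SE = {se_mean:.1f}, z = {z_mean:.3f}, "
      f"P(ybar > 220) = {p_mean:.4f}")
```

A written answer would still show the z-scores and the table lookup explicitly; the code is only a way of checking the arithmetic.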

Question 5 (10 marks)

Answer the following questions:
a) [2 Marks] State the difference between a parameter and a statistic.
b) [2 Marks] State the law of large numbers.
c) [2 Marks] Why is the Central Limit Theorem important in statistics?
d) [2 Marks] What is meant by the sampling distribution of the sample mean?
e) [2 Marks] Telephone surveys can record very high non-response rates through people refusing to participate and people not being home. In about 30 words, identify the major problem associated with the validity of the results of a survey of this type.

Question 6 (12 marks)

In order to test a theory that alcohol consumption can have an effect on test scores, a researcher conducted a study on 10 randomly selected adults. Each subject was given a test. Then, for one week, each subject was required to consume a specific amount of alcohol, after which they were tested again. Data on the test scores are given below.

Subject    Before Consumption    After Consumption
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX

Is there evidence that alcohol consumption has reduced the test score?

a) [7 Marks] Stating appropriate hypotheses, perform a parametric test and make conclusions in the context of the study.
b) [5 Marks] Stating appropriate hypotheses, perform a non-parametric test and make conclusions in the context of the study.
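Because each subject is measured twice, both tests work on the paired differences (before minus after). The Python sketch below illustrates a paired t statistic for part (a) and, as one possible non-parametric choice for part (b), an exact sign test; the course may prescribe a different non-parametric test. The scores are hypothetical stand-ins, since the data table is not reproduced here, and the function names are illustrative.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic on d = before - after, with n - 1 df."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample SD of the differences
    return d_bar / (s_d / math.sqrt(n)), n - 1


def sign_test_upper(before, after):
    """Exact upper-tail sign test: P(at least this many positive differences).

    Ties (zero differences) are dropped; under H0 the count of positive
    differences is Binomial(n, 0.5).
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    p_value = sum(math.comb(n, k) for k in range(positives, n + 1)) / 2**n
    return positives, n, p_value


# Hypothetical test scores for 10 subjects, NOT the assignment data.
before = [78, 85, 69, 90, 74, 81, 88, 72, 95, 80]
after = [75, 80, 70, 86, 70, 79, 88, 68, 91, 77]

t_value, df = paired_t(before, after)
positives, n_used, p_sign = sign_test_upper(before, after)

print(f"paired t = {t_value:.3f} on {df} df")
print(f"sign test: {positives} of {n_used} positive, P-value = {p_sign:.4f}")
```

A positive mean difference here corresponds to scores falling after consumption, which matches the one-sided question of whether alcohol consumption has reduced the test score.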

