STA2300 Data Analysis S1, 18
Assignment 3
Due Date: 29 May, 2018
Weighting: 25%
Full Marks: 100
Answering the questions in this assignment should not be your first attempt at these types of
questions. It is essential that you work through practice exercises from the tutorial sheets and
Text Book first.
This assignment is important in checking your knowledge, providing feedback and helping to
establish competency in essential skills.
Answer all the questions. The questions are not of equal weight; some questions are worth much
more than others.
The questions relate to material up to and including Module 10.
Before starting this assignment read Notes Concerning Assignments under the Introductory
Material link on the StudyDesk.
When you are asked to comment on a finding, usually a short paragraph is required.
Do not copy/paste SPSS output into your assignment unless specifically asked to do so. In many
cases the SPSS output contains much more information than is required for a correct and
complete answer. In those cases just reproducing the output may not attract any marks. Make
sure you report only the information from the SPSS output relevant to your answer.
Unless instructed otherwise, show all working and formulae used in calculating confidence
intervals and performing hypothesis tests. (Answers may of course be checked where possible
using SPSS).
In order to obtain full marks for any question you must show all working.
Submission is via the link on the StudyDesk.
This assessment item consists of 5 questions.
Requirements for a passing grade:
As you may have seen in the Course Specification, to receive a passing grade you must achieve at least 40% (i.e., 20/50) of the
marks available in the final examination, and at least 50% of the total weighted marks available for the course.
If you get over 50% of the weighted marks across all assessments but do not get at least 40% of the marks in the final exam, you
will not pass the course.
Also, if you get at least 40% of the marks in the final exam but do not get at least 50% of the weighted marks in the course, you
will not pass the course.
Note on Assignment 3 Solutions, Marks and Late Submissions:
Because of the timing of Assignment 3, marks for this assignment will not be available until after the exam. However, feedback
for this assignment will be available in the form of comprehensive worked solutions before the exam via the StudyDesk. As a
result, any Assignment 3 submitted after 5pm AEST on the Friday before the exam period will not receive any marks.
Question 1 (25 marks)
Use the information in the dataset DHS18.sav to answer the following questions. You should use SPSS
to calculate the sample statistics you will need to do this question, but for parts (a) and (d) you are
required to do the rest of the calculations by hand, using a calculator.
(a) (7 marks) Estimate the population mean weight of women with no education in 2011, using a
99% confidence interval (show all working). Make sure you ONLY select women who have no
education.
(b) (6 marks) Check the appropriate conditions and assumptions needed for the validity of the
confidence interval or hypothesis test for the population mean weight of women with no
education (include an appropriate graph to support your answer).
(c) (3 marks) From historical data, a researcher knows that the average weight of women in
developing countries who have no education is 52.5 kg. State appropriate hypotheses (define
any symbols used) to perform a hypothesis test to see if there is evidence to support her
suspicion, based on the data in this study, that the average weight of women in developing
countries in 2011 who have no education is greater than the historical value (regardless of
whether the conditions in part (b) are satisfied).
(d) (2 marks) Calculate the value of a suitable test statistic for the test in part (c).
(e) (4 marks) Find the P-value of the test, based on the test statistic calculated in part (d), and write
a meaningful conclusion at the 1% level of significance.
(f) (3 marks) Now, check your answers for parts (d) and (e) by finding the value of the test statistic
and the P-value using SPSS. Include SPSS output in your answer and comment on the comparison
with the hand calculated values. Explain any differences.
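The hand calculations behind parts (a) and (d) can be sketched as follows. This is only an illustrative sketch: the summary statistics below are hypothetical and are NOT taken from DHS18.sav, the function names are invented for the example, and the critical value t* is assumed to have been read from statistical tables. Only the historical value of 52.5 kg comes from the question itself.

```python
import math


def t_interval(mean, sd, n, t_crit):
    """Confidence interval for a population mean: mean +/- t* x sd/sqrt(n)."""
    se = sd / math.sqrt(n)          # standard error of the sample mean
    margin = t_crit * se            # margin of error
    return mean - margin, mean + margin


def t_statistic(mean, sd, n, mu0):
    """One-sample t statistic for H0: mu = mu0 (df = n - 1)."""
    se = sd / math.sqrt(n)
    return (mean - mu0) / se


# Hypothetical summary statistics (assumed for illustration only).
n, xbar, s = 25, 54.0, 5.0
t_star = 2.797                      # t table, df = 24, 99% confidence

low, high = t_interval(xbar, s, n, t_star)
t = t_statistic(xbar, s, n, mu0=52.5)   # 52.5 kg is the historical value

print(f"99% CI: ({low:.3f}, {high:.3f}) kg")
print(f"t = {t:.3f} with df = {n - 1}")
# The one-sided P-value is then bracketed from the t table for df = n - 1.
```

With real data, the sample mean, standard deviation and sample size would come from SPSS after selecting only the women with no education.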
Question 2 (27 marks)
Use the information in the dataset DHS18.sav to answer the following questions. You should use SPSS
to calculate any sample statistics you will need to do this question, but for parts (d)-(g) you are required
to do the rest of the calculations by hand, using a calculator and statistical tables.
According to the Bureau of Statistics in the developing country being surveyed, 5% of women were
‘higher educated’ before 2011. The researcher believes that the proportion of all women in the
developing country with such qualifications was no longer 5% in 2011.
(a) (1 mark) What is the variable of interest to the researcher?
(b) (3 marks) State the appropriate hypotheses (define any symbols used) to test the researcher’s
claim that the proportion of women who are ‘higher educated’ in 2011 was no longer 5%.
(c) (4 marks) Check the conditions and assumptions for the test in part (b).
(d) (4 marks) Calculate the test statistic for the test in part (b).
(e) (8 marks) Find the P-value for the test in part (d) and write a meaningful conclusion in the context
of this situation.
(f) (4 marks) If the researcher wants to be 99% confident that the margin of error of the estimate
of the true proportion of women who are ‘higher educated’ is within 0.06, what minimum
sample size is required? Use a conservative method in determining the sample size.
(g) (3 marks) The researcher decides that instead of using a conservative method (as required in
part (f)), she will use information obtained from the DHS18.sav data to decide how many women
she would need to survey (keeping the same level of confidence and margin of error). What is
the impact of this decision? (Include evidence to support your answer).
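The one-proportion z-test and the sample-size calculation used in this question can be sketched as below. The sample proportion, sample size, confidence level and margin of error in the example are hypothetical (they are deliberately not the values asked for in the question or found in DHS18.sav), and the function names are invented for illustration. Only the null value of 5% comes from the question.

```python
import math
from statistics import NormalDist


def one_prop_z(p_hat, p0, n):
    """z statistic for H0: p = p0, using the null value in the standard error."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


def two_sided_p(z):
    """Two-sided P-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def sample_size(margin, conf, p_guess=0.5):
    """Minimum n so that z* x sqrt(p(1-p)/n) <= margin.

    p_guess = 0.5 is the conservative choice because it maximises p(1-p).
    """
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Hypothetical sample: 8% 'higher educated' among 400 women.
z = one_prop_z(p_hat=0.08, p0=0.05, n=400)
print(f"z = {z:.3f}, two-sided P-value = {two_sided_p(z):.4f}")

# Hypothetical design: 95% confidence, margin of error 0.03.
print("conservative n:", sample_size(0.03, 0.95))
print("n using p = 0.05:", sample_size(0.03, 0.95, p_guess=0.05))
```

Comparing the last two results shows the general effect asked about in part (g): a proportion estimate far from 0.5 gives a smaller required sample than the conservative method, at the cost of relying on that estimate being accurate.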
Question 3 (16 marks)
Use the information in the dataset DHS18.sav again. The systolic blood pressure (BP) of the women was
measured in 2011 and in a follow-up in 2014. The researcher wants to know if, on average, the systolic
BP of the poorest women in 2014 is significantly greater than the systolic BP of the same cohort in 2011.
Make sure you select ONLY poorest women.
(a) (3 marks) State appropriate hypotheses (define any symbols used).
(b) (2 marks) State (but do not check) the assumptions for carrying out this test. Describe the
assumptions in the context of this question.
(c) (2 marks) Without using SPSS, calculate the value of a suitable test statistic for this test. You can
use SPSS for calculating appropriate sample statistics.
(d) (3 marks) Without using SPSS, calculate the P-value of this test.
(e) (3 marks) Interpret the P-value and describe the outcome of the test in the context of this
question.
(f) (3 marks) Now use SPSS to carry out the analysis. Copy and paste the relevant SPSS output to
your assignment solution. Do these results agree with those found in part (e)? (Hint: comment
on the P-value).
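Because the same women are measured in both years, a paired analysis of the differences is one natural approach to this question. The sketch below assumes that approach and uses five made-up pairs of readings (NOT values from DHS18.sav); the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic for H0: mean difference = 0, with d = after - before.

    Returns the statistic and its degrees of freedom (n - 1).
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)             # mean of the differences
    s_d = stdev(diffs)              # sample standard deviation of the differences
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Hypothetical systolic BP readings (mmHg) for the same five women.
bp_2011 = [120, 118, 130, 125, 110]
bp_2014 = [124, 121, 129, 131, 113]

t, df = paired_t(bp_2011, bp_2014)
print(f"t = {t:.3f} with df = {df}")
# The one-sided P-value is then bracketed from the t table for this df.
```

In practice only the mean and standard deviation of the differences are needed from SPSS; the rest follows the same arithmetic.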
Question 4 (20 marks)
Use the information in the dataset DHS18.sav to answer the following questions. You should use SPSS to
calculate any sample statistics you will need to do this question, but for part (e) you are required to do
the rest of the calculations by hand, using a calculator.
The researcher is concerned that the weight of women in 2014 depends on their wealth. She believes
that the average weight of ‘poorer’ women is greater than that of the ‘richer’ women in this developing
country.
(a) (4 marks) Use an appropriate graph to compare the distribution of weight of ‘poorer’ women
with that of ‘richer’ women. Label the axes correctly, include a unit of measure and provide an
appropriate title. Make sure you select ONLY ‘richer’ and ‘poorer’ women.
(b) (2 marks) Using the graph produced in part (a), briefly describe the distribution of weight for the
two groups of women (poorer and richer).
(c) (3 marks) State appropriate hypotheses (defining all symbols) to answer the question: ‘Is the
average weight of women greater for all ‘poorer’ women compared to all ‘richer’ women in this
developing country in 2011?’
(d) (2 marks) Check the assumptions for carrying out the test in part (c).
(e) (2 marks) Without using SPSS, calculate a suitable test statistic for the test in part (c).
(f) (4 marks) Without using SPSS, find the P-value of the test. Interpret the P-value and describe
the outcome of the original question.
(g) (1 mark) Now use SPSS to check your results for this hypothesis test. Copy and paste the relevant
output from SPSS for this test into your assignment.
(h) (2 marks) Briefly comment on how the test statistic and P-value from SPSS output are similar to
or differ from your hand calculations.
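A two-sample t statistic for comparing two independent group means can be sketched from summary statistics as follows. This sketch assumes the unpooled (Welch) form with the Welch-Satterthwaite degrees of freedom, which may differ from the convention expected in the course; the summary statistics are hypothetical (NOT taken from DHS18.sav) and the function name is invented for illustration.

```python
import math


def two_sample_t(m1, s1, n1, m2, s2, n2):
    """Unpooled two-sample t statistic for H0: mu1 - mu2 = 0.

    Returns the statistic and the Welch-Satterthwaite degrees of freedom.
    """
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2     # per-group variance of the mean
    se = math.sqrt(v1 + v2)                 # standard error of the difference
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics: group 1 = 'poorer', group 2 = 'richer'.
t, df = two_sample_t(m1=55.0, s1=6.0, n1=36, m2=52.0, s2=8.0, n2=64)
print(f"t = {t:.3f}, df = {df:.1f}")
# A more conservative hand-calculation choice is df = min(n1 - 1, n2 - 1).
```

Small differences between a hand calculation and software output often come from the choice of degrees of freedom and from rounding intermediate values.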
Question 5 (12 marks)
Give a brief answer to each of the following six (6) questions:
(a) (2 marks) State the differences between convenience sampling and cluster sampling.
(b) (2 marks) Explain the difference between a Type 1 and