Introduction to Biostatistics 2020
Assignment 4
[5 questions in total]
This assignment is due to be submitted by 8pm Saturday 27th June XXXXXXXXXX No late submissions will
be accepted. The assignment must be submitted via MyUni as a single PDF or Microsoft Word
document. Legible hand-writing is acceptable as part of your submission.
Please include your student ID in the Header or Footer on each page and number each page in
your assignment.
This assignment is worth 40% of the total credit for this course. [There is a total of 55 marks in this
assignment, which will be rescaled in the final overall course mark calculations.]
Answer all questions. You may use a computer or calculator to assist with summarising data and
doing intermediate calculations, but you may lose points if your answer is incorrect and you have not
provided evidence of your working.
Question 1 [5 marks]
A study on the effectiveness of an internet-based walking program for adults was conducted on
two groups of participants (the intervention and control groups) randomly selected from the
Australian Electoral Roll. The two groups had similar activity levels at the start of the study. All
participants received a pedometer (i.e. a step counter) and the intervention group also received
access to an internet-based walking program.
Results summarising the daily number of steps observed for both the control and intervention
group after the six week study period can be seen below.
                                 Sample Size   Sample Mean   Sample Std Deviation
Control                          XXXXXXXXXX
Intervention (walking program)   XXXXXXXXXX
(a) Investigators are keen to determine whether there is a difference between the daily number of steps
(on average) in the population of eligible adults receiving a pedometer only (control) compared with
the population of eligible adults receiving a pedometer and the walking program (intervention) after six
weeks. State which one of the following test statistics is appropriate here.
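(i) $z_{stat} = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

(ii) $z_{stat} = \dfrac{x - \mu}{\sigma}$

(iii) $t_{stat} = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$, where $s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$

(iv) $t_{stat} = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

(v) $t_{stat} = \dfrac{\bar{x}_D - 0}{\sqrt{s_D^2 / n}}$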
[1 mark]
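For checking hand calculations, the two-sample and paired statistics listed as (iii), (iv) and (v) above can be computed directly from summary statistics. The sketch below assumes the standard pooled-variance, unpooled-variance and paired-difference forms. The function names and the numbers in the example calls are hypothetical and are not taken from this assignment's data.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two independent samples, equal population variances assumed (form iii)."""
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2 - 0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def unpooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two independent samples, variances not assumed equal (form iv)."""
    return (mean1 - mean2 - 0) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)


def paired_t(mean_diff, sd_diff, n):
    """Paired data: one-sample t statistic on the within-pair differences (form v)."""
    return (mean_diff - 0) / math.sqrt(sd_diff**2 / n)


# Hypothetical summary statistics, for illustration only.
t_pooled = pooled_t(10.0, 2.0, 25, 9.0, 2.0, 25)
t_unpooled = unpooled_t(10.0, 2.0, 25, 9.0, 2.0, 25)
t_paired = paired_t(1.0, 2.0, 16)
print(t_pooled, t_unpooled, t_paired)
```

When the two samples have equal sizes and equal standard deviations, as in the hypothetical call above, the pooled and unpooled statistics coincide. They differ otherwise, which is why the choice between them depends on the equal-variance assumption.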
(b) A specialist exercise scientist says that the observed difference between the two sample means is not
meaningful in this context as she believes it is unlikely to coincide with any obvious health benefits.
A 95% confidence interval for the difference in mean number of steps in the population (that is,
Control – Intervention) is (-1212, XXXXXXXXXX Without calculation, use this information to explain, from a
statistical perspective, whether you agree with the specialist.
[4 marks]
Question 2 [14 marks]
In a small clinical trial to assess the effectiveness of a new tranquilizer drug for psychoneurotic
patients (relative to placebo), each of 10 adult patients was given one week of treatment with
the drug, and one week of treatment with the placebo. The order of drug and placebo was
determined randomly for each patient. At the end of each week (of treatment or placebo
medication), each patient had to complete a questionnaire, on the basis of which he or she was
given an ‘anxiety score’ (with possible values ranging from 0 to 30 and higher scores
suggesting higher states of anxiety). Relevant summary statistics are shown in the table
below.
                    Sample Size XXXXXXXXXX Sample Mean XXXXXXXXXX Sample Std Deviation
With drug (D)       XXXXXXXXXX
With placebo (P)    XXXXXXXXXX
Difference (D-P)    XXXXXXXXXX
(a) State the appropriate Null and Alternative hypotheses to be tested. Define any parameters that
you mention.
[2 marks]
(b) Which one of the test statistics (listed in Question 1(a)) should be used to test the Null
hypothesis stated in (a) above?
[1 mark]
(c) What is the distribution of the test statistic selected in (b) above if the Null hypothesis is true?
[1 mark]
(d) Calculate the value of the test statistic for these data and report the approximate P-value.
[3 marks]
(e) With the aid of a diagram, explain what this P-value means and explain your conclusion within
the context of this study.
[3 marks]
(f) After making your conclusion:
(i) Is it possible that a Type 1 error has occurred here? Explain your answer.
(ii) Is it possible that a Type 2 error has occurred here? Explain your answer.
[4 marks]
Question 3 [12 marks]
People with O-negative blood are called “universal donors” because O-negative blood can be
given to anyone else, regardless of the recipient’s blood type. Around 9% of the Australian
population have O-negative blood. If 10 individuals were to be randomly selected from the
population and X is considered to be the number of individuals who have O-negative blood,
answer the following questions.
(a) List the assumptions that must be satisfied for X to follow a Binomial distribution and briefly
assess the validity of each assumption in the context of this example.
[3 marks]
(b) Assume that X can be modelled as a binomial variable, i.e. the probability of x successes in n
trials can be calculated by

$P(X = x) = \dfrac{n!}{x!\,(n - x)!} \, \pi^x (1 - \pi)^{n - x}$

where π = probability of success (i.e. of having O-negative blood).
Showing all working, calculate the probability that:
(i) No individuals (in the sample of 10) will be O-negative.
(ii) At least one individual (in the sample of 10) will be O-negative.
[4 marks]
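A hand calculation for part (b) can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the binomial model holds, with n = 10 and a success probability of 0.09 taken from the question. The function name `binomial_pmf` and the variable name `pi` are chosen for this sketch only. It does not replace the written working that the question asks for.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


n = 10      # number of individuals sampled (from the question)
pi = 0.09   # probability of O-negative blood (from the question)

p_none = binomial_pmf(0, n, pi)
# "At least one" is the complement of "none".
p_at_least_one = 1 - p_none

print(f"P(X = 0)  = {p_none:.4f}")
print(f"P(X >= 1) = {p_at_least_one:.4f}")
```

Using the complement avoids summing the ten separate probabilities for X = 1, ..., 10.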
(c) Blood donors with low iron levels (indicated by a serum ferritin level < 20 µg/L) are typically
stopped from donating blood for at least 6 months.
(i) The serum ferritin levels of blood donors in the Australian adult population are thought to
be Normally distributed with mean = μ and variance = σ². Explain briefly how this
assumption (of Normality) could be checked if researchers had access to a random
sample of serum ferritin levels from 100 donors.
[2 marks]
(ii) If you are told that μ = 60 µg/L and that σ² = 400 µg²/L², determine the probability of
randomly selecting a donor with a low iron level (i.e. a serum ferritin level < 20 µg/L) from
this population.
[Hint: a diagram might help]
[3 marks]
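The Normal probability in part (c)(ii) can be checked numerically by standardising the cut-off and evaluating the Normal cumulative distribution function. The sketch below assumes the stated mean of 60 and variance of 400 (so a standard deviation of 20), and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are chosen for this sketch only.

```python
from statistics import NormalDist

mu = 60.0             # population mean (from the question)
sigma = 400.0 ** 0.5  # standard deviation = square root of the variance
cutoff = 20.0         # low-iron threshold (from the question)

# Standardise: how many standard deviations the cut-off lies from the mean.
z = (cutoff - mu) / sigma

# Area under the Normal curve to the left of the cut-off.
p_low = NormalDist(mu=mu, sigma=sigma).cdf(cutoff)

print(f"z = {z:.2f}, P(X < {cutoff:g}) = {p_low:.4f}")
```

The same probability is the area to the left of `z` under the standard Normal curve, which is the region a diagram for this question would shade.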
Question 4 [9 marks]
In an influenza vaccine trial carried out during an epidemic of influenza, 240 adults received a
newly developed influenza vaccination and 220 adults received a placebo vaccination. Overall,
100 people subsequently contracted influenza, of whom 20 were in the Influenza vaccine group
and 80 were in the Placebo vaccine group. The observed data are summarized in the table below.
                      Contracted Influenza
Type of vaccine       Yes      No
Influenza vaccine      20     220
Placebo vaccine        80     140
(a) When assessing the relationship between contraction of influenza and type of vaccine, briefly
comment on what would be wrong with looking only at adults who received the influenza vaccine and
comparing the number who contracted influenza to the number who did not contract influenza.
[1 mark]
(b) State appropriate Null and Alternative hypotheses for this trial, and the test statistic which will allow
you to test the Null hypothesis. Also report the distribution of this test statistic if the Null hypothesis is
true.
[2 marks]
(c) Calculate the expected value for each cell of the 2x2 table.
[2 marks]
(d) Calculate the value of the appropriate test statistic using these data and report the corresponding
P-value.
[2 marks]
(e) What do you conclude about the relationship between type of vaccination in adults and contraction of
influenza in the population? Please assume α = 0.05 for this question.
[2 marks]
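The arithmetic for a 2x2 table of this kind can be checked with a short script. The sketch below assumes that Pearson's chi-square test of independence without a continuity correction is the intended analysis, which the assignment does not state. It uses the observed counts from the table above, and the variable names are chosen for this sketch only.

```python
import math

# Observed counts: rows = vaccine type, columns = contracted influenza (Yes, No).
observed = [[20, 220],   # Influenza vaccine
            [80, 140]]   # Placebo vaccine

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: (row total x column total) / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(expected)
print(f"chi-square = {chi2:.2f}, P-value = {p_value:.3g}")
```

The `erfc` identity holds only for 1 degree of freedom, which is the case for a 2x2 table. Larger tables need a general chi-square distribution function.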
Question 5 [15 marks]
This question concerns a subset of data from a study concerned with patients with chronic
obstructive airways disease (COPD). Participants in the study were recruited from an outpatient
clinic held at the Taipei Tzu Chi Hospital. In part of this study, the relationship between peak
oxygen pulse and anaerobic threshold was investigated. Results pertaining to 68 patients are
summarised in the output presented in this question, starting with Figure 1.
Figure 1: Scatterplot of peak oxygen pulse (PO2) versus anaerobic threshold (AT).
(a) Look at Figure 1 and decide which of the following statements would best describe this relationship.
Justify your answer.
(i) weak negative linear relationship
(ii) weak positive linear relationship
(iii) strong negative linear relationship
(iv) strong positive linear relationship
[1 mark]
A simple linear regression model was fitted to these data. Output from Microsoft Excel and plots from
Stata are shown below:
(b) For the simple linear regression model of outcome variable Y on predictor variable X