Biostatistics
Answer the following questions.
Copy and paste any required data charts or summaries into this Word document.
Include the file naming convention.
I. Descriptive Statistics:
Download the data set Final_1.sav.
Complete the following:
1) List the level of measurement for the variables AGE, SEX, AGEGRP, and SBP1 in the data set and describe the appropriate numerical and descriptive statistics based on these.
Record Number | AGE
1 | 3
2 | 11
3 | 15
4 | 46
5 | 14
6 | 35
7 | 46
8 | 35
9 | 40
10 | 29
11 | 22
12 | 16
2) Calculate (by hand) the mean and standard deviation for the first 12 records for age in the data set.
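A hand calculation like the one in question 2 can be cross-checked with a short script. The sketch below uses the 12 AGE values listed in the table above and assumes the sample (n - 1) form of the standard deviation; the variable names are illustrative only.

```python
import math

# AGE values for records 1-12, copied from the table in this section.
ages = [3, 11, 15, 46, 14, 35, 46, 35, 40, 29, 22, 16]

n = len(ages)
mean_age = sum(ages) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean_age) ** 2 for x in ages)

# Sample variance divides by n - 1 (assumption: sample, not population, SD).
sample_var = ss / (n - 1)
sample_sd = math.sqrt(sample_var)

print(f"n = {n}, mean = {mean_age:.2f}, SD = {sample_sd:.2f}")
```

Dividing by n instead of n - 1 would give the population standard deviation, which is slightly smaller.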
3) Generate numerical and graphical descriptive statistics for each of the variables, namely AGE, SEX, AGEGRP, and SBP1.
4) Interpret the output you generated in part 3 for each of the variables in the data set.
II. Paired and Independent t Tests:
Download the data set Final_2.sav and use SPSS to complete the following calculations:
1) Use the 5-step approach to hypothesis testing and the calculation of the 95% confidence intervals to answer the following research question: Was a significant difference in Systolic Blood Pressure (SBP) observed over the course of the study?
2) Use the 5-step approach to hypothesis testing and the calculation of the 95% confidence intervals to answer the following research question: Is there a difference in SBP1 based on HIV status? (Hint: Assign Y as group 1 and N as group 2.)
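The arithmetic behind a paired comparison such as question 1 can be sketched as follows. This is a minimal illustration, not the SPSS procedure: the SBP readings are made up (they are not from Final_2.sav), the function name `paired_t` is hypothetical, and the critical t value is assumed to come from a t table because the Python standard library has no t distribution.

```python
import math
import statistics

def paired_t(before, after, t_crit):
    """Return (mean difference, t statistic, df, CI) for paired samples.

    t_crit is the two-sided critical t value for df = n - 1, looked up
    in a t table (assumption: supplied by the caller).
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference.
    se = statistics.stdev(diffs) / math.sqrt(n)
    t_stat = mean_diff / se
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    return mean_diff, t_stat, n - 1, ci

# Hypothetical SBP readings for five participants (NOT from Final_2.sav).
sbp_before = [120, 130, 125, 140, 135]
sbp_after = [118, 128, 121, 135, 133]

# 2.776 is the two-sided 95% critical value for df = 4.
mean_diff, t_stat, df, ci = paired_t(sbp_before, sbp_after, t_crit=2.776)
print(f"mean difference = {mean_diff:.2f}, t({df}) = {t_stat:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

If the 95% confidence interval for the mean difference excludes zero, the paired t test rejects the null hypothesis of no change at the 5% level.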
III. Cross-Tabulation:
Download the data set Final_3.sav and use SPSS to complete the following calculations.
1) Use the 5-step approach to hypothesis testing to answer the following research question: In the sample provided in Final_3.sav, are the variables income and Bladder Cancer independent of each other? (Note: The question could also be asked as "Is there an association between the variables?" because a lack of independence implies an association.)
2) Answer the following based on the cross-tabulation of alcohol consumption and Bladder Cancer:
Alcohol consumption * Bladder Cancer Crosstabulation (Count)
Alcohol consumption | Bladder Cancer: No | Bladder Cancer: Yes | Total
"Less than 1 drink per week" | 30 | 54 | 84
4 or more drinks per month | 22 | 115 | 137
Total | 52 | 169 | 221
- Calculate the odds ratio.
- Describe how the odds ratio differs from the relative risk or risk ratio and why you would choose it here.
- Interpret the odds ratio and how it might impact the practice of public health practitioners.
- If you wanted to know whether this relationship was statistically significant, what test(s) could you use?
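The calculations these bullets ask for can be sketched from the cell counts in the crosstabulation above. The sketch assumes the lower-consumption row is the reference group (the assignment does not name one), uses a Wald interval on the log-odds scale, and uses the Pearson chi-square without continuity correction as one possible significance test.

```python
import math

# Cell counts copied from the crosstabulation above.
# Rows: alcohol consumption group; columns: bladder cancer (no, yes).
low_no, low_yes = 30, 54      # "Less than 1 drink per week"
high_no, high_yes = 22, 115   # "4 or more drinks per month"

# Odds of bladder cancer within each consumption group.
odds_high = high_yes / high_no
odds_low = low_yes / low_no

# Odds ratio with the lower-consumption group as reference (assumption).
odds_ratio = odds_high / odds_low

# Wald 95% confidence interval, built on the log-odds scale.
se_log_or = math.sqrt(1 / low_no + 1 / low_yes + 1 / high_no + 1 / high_yes)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

# Pearson chi-square for a 2x2 table (no continuity correction).
n = low_no + low_yes + high_no + high_yes
chi2 = (n * (low_no * high_yes - low_yes * high_no) ** 2) / (
    (low_no + low_yes) * (high_no + high_yes)
    * (low_no + high_no) * (low_yes + high_yes)
)
# With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"OR = {odds_ratio:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

Swapping the reference group inverts the odds ratio (1 / OR) without changing the chi-square statistic or its p-value.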
IV. ANOVA:
Download the data set Final_4.sav and use SPSS to complete the following calculations.
1) Produce box plots of income for each region of the US in the data set and interpret them. Based on the box plots, do you expect to find a difference between any of the groups?
2) Create descriptive statistics for each region, using the variable income. Include skewness and kurtosis in the output. Create a histogram for each group.
3) Run the ANOVA for income based on region. Include the ANOVA table and the test for Homogeneity of Variance. Interpret the results.
4) Conduct post hoc analysis using Bonferroni and LSD methods to control for multiple testing.
- Provide the output.
- Interpret your results.
- Why do you need to use methods like Bonferroni and LSD with the ANOVA?
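The reasoning behind the last bullet can be illustrated numerically. The sketch below assumes four regions (a hypothetical count; check Final_4.sav for the real number) and treats the pairwise comparisons as independent, which is only an approximation because they share groups. The helper name `familywise_error` is illustrative.

```python
from math import comb

def familywise_error(alpha, n_comparisons):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_comparisons

alpha = 0.05
n_groups = 4  # hypothetical number of regions, NOT taken from Final_4.sav
n_pairs = comb(n_groups, 2)  # number of pairwise comparisons

# Error rate if every pair is tested at the unadjusted alpha.
unadjusted = familywise_error(alpha, n_pairs)

# Bonferroni tests each pair at alpha divided by the number of comparisons.
bonferroni_alpha = alpha / n_pairs
adjusted = familywise_error(bonferroni_alpha, n_pairs)

print(f"{n_pairs} comparisons")
print(f"familywise error at alpha = {alpha}: {unadjusted:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")
print(f"familywise error after Bonferroni: {adjusted:.3f}")
```

Under these assumptions the chance of at least one false positive rises well above 5% when each pair is tested at the unadjusted level, and the Bonferroni adjustment brings it back to roughly 5%.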
V. Regression:
Download the data set Final_5.sav and use SPSS to complete the following calculations.
1) Use an independent t test and simple linear regression to identify whether a relationship exists between gender and BMI.
- Run the appropriate t test in SPSS, report the significance of the difference in means and the confidence interval, and interpret the results.
- Run the simple linear regression in SPSS, report the significance of the variable gender and the overall fit of the model (using r²). Interpret the results.
- How are these two approaches different?
- Are your conclusions the same using both tests?
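One way to see how the two approaches relate: when the predictor is a two-level variable coded 0/1, the least-squares slope equals the difference between the two group means, and the intercept equals the mean of the group coded 0. The sketch below demonstrates this with made-up values (not from Final_5.sav); the 0/1 coding is an assumption, since the data set's actual coding is not shown here.

```python
import statistics

# Hypothetical data (NOT from Final_5.sav): gender coded 0/1, BMI in kg/m^2.
gender = [0, 0, 0, 0, 1, 1, 1, 1]
bmi = [22.0, 24.0, 26.0, 28.0, 25.0, 27.0, 29.0, 31.0]

# Difference in group means, the quantity an independent t test evaluates.
mean_0 = statistics.mean(b for g, b in zip(gender, bmi) if g == 0)
mean_1 = statistics.mean(b for g, b in zip(gender, bmi) if g == 1)
mean_diff = mean_1 - mean_0

# Least-squares fit of BMI = intercept + slope * gender.
x_bar = statistics.mean(gender)
y_bar = statistics.mean(bmi)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gender, bmi))
sxx = sum((x - x_bar) ** 2 for x in gender)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

print(f"difference in means = {mean_diff:.2f}")
print(f"regression slope    = {slope:.2f}")
print(f"intercept           = {intercept:.2f} (mean of group 0 = {mean_0:.2f})")
```

The test of the slope in this regression matches the equal-variances version of the independent t test; with a different coding (for example 1/2) the slope is unchanged but the intercept shifts.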
2) Answer the questions using the provided output:
Multiple Linear Regression
Researchers looked at the Emergency Department records of 60 adults ages 22 to 46 years who arrived in the ED complaining of chest pain during a 6-month period. They did not use a random sample because they wanted 30 males and 30 females in the study. They collected information on BMI (a measure of overweight/obesity), Age, SBP (Systolic Blood Pressure), and the diagnosis of Diabetes. Their first (alternative) hypothesis was that the dependent variable SBP is associated with BMI, Age, Diabetes, and Gender. They conducted a multiple linear regression to test their hypothesis. Here are the results (note that they had two models and chose to use the second one):
Model Summary(c)
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .796(a) | .634 | .608 | 5.443
2 | .792(b) | .627 | .607 | 5.445
a. Predictors: (Constant), Diabetes, Age, Gender, BMI
b. Predictors: (Constant), Age, Gender, BMI
c. Dependent Variable: SBP
ANOVA(c)
Model | | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | XXXXXXXXXX | 4 | 706.242 | 23.839 | .000(a)
1 | Residual | XXXXXXXXXX | 55 | 29.626 | |
1 | Total | XXXXXXXXXX | 59 | | |
2 | Regression | XXXXXXXXXX | 3 | 931.407 | 31.418 | .000(b)
2 | Residual | XXXXXXXXXX | 56 | 29.646 | |
2 | Total | XXXXXXXXXX | 59 | | |
a. Predictors: (Constant), Diabetes, Age, Gender, BMI
b. Predictors: (Constant), Age, Gender, BMI
c. Dependent Variable: SBP
Coefficients(a)
Model | | Standardized Coefficients: Beta | t | Sig. | 95.0% Confidence Interval for B: Lower Bound | 95.0% Confidence Interval for B: Upper Bound
1 | (Constant) | | 8.092 | .000 | 57.471 | 95.309
1 | Gender | -.189 | -2.100 | .040 | -6.381 | -.149
1 | BMI | .557 | 6.130 | .000 | 1.213 | 2.392
1 | Age | .507 | 6.067 | .000 | .426 | .847
1 | Diabetes | -.089 | -1.019 | .313 | -4.752 | 1.549
2 | (Constant) | | 8.885 | .000 | 55.243 | 87.407
2 | Gender | -.173 | -1.950 | .056 | -6.054 | .081
2 | BMI | .574 | 6.413 | .000 | 1.276 | 2.436
2 | Age | .517 | 6.243 | .000 | .441 | .859
a. Dependent Variable: SBP
1) Which variables in model 1 are significant?
2) Which variables in model 2 are significant?
3) Why did they choose model 2?
4) What is the “fit” of model 2 (the one they chose to use)?
5) Is this a good model? Why or why not?
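When reading the output above, it can help to see how the printed fit statistics relate to one another. The sketch below recomputes adjusted R Square from R Square and the F ratio from the two mean squares, using only values printed in the Model Summary and ANOVA tables and n = 60 from the study description. The helper name `adjusted_r2` is illustrative, and small discrepancies are expected because the printed inputs are rounded.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 60  # adults in the study

# Values copied from the Model Summary and ANOVA tables above.
models = {
    1: {"r2": 0.634, "k": 4, "ms_reg": 706.242, "ms_res": 29.626},
    2: {"r2": 0.627, "k": 3, "ms_reg": 931.407, "ms_res": 29.646},
}

for label, m in models.items():
    adj = adjusted_r2(m["r2"], n, m["k"])
    # F is the regression mean square divided by the residual mean square.
    f_stat = m["ms_reg"] / m["ms_res"]
    print(f"Model {label}: adjusted R^2 = {adj:.3f}, F = {f_stat:.3f}")
```

Adjusted R Square penalizes each added predictor, which is why it is the more suitable figure for comparing models with different numbers of predictors.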
Multiple Logistic Regression
The Emergency Department researchers selected another 60 adults and again looked at Age, SBP, BMI, Gender, and Diabetes. This time, however, they also collected information on whether the chest pain was diagnosed as an MI (also known as a heart attack) or something else. Now their alternative hypothesis was that gender was related to the diagnosis of an MI, after controlling for Age, SBP, BMI, and Diabetes. They used multiple logistic regression to test their hypothesis, and these are their results (note that there are multiple models and they chose to use the final one):
Model Fitting Information
Model | Model Fitting Criteria: -2 Log Likelihood | Likelihood Ratio Tests: Chi-Square | df | Sig.
Intercept Only | 74.995 | | |
Final | 16.398 | 58.598 | 5 | .000

Pseudo R-Square
Cox and Snell | .623
Nagelkerke | .866
McFadden | .767
Parameter Estimates
Heart Attack(a) | | B | Std. Error | Wald | df | Sig. | Exp(B)
No | Intercept | 115.037 | 43.679 | 6.936 | 1 | .008 |
No | BMI | -1.400 | .572 | 5.995 | 1 | .014 | .247
No | Age | .037 | .116 | .099 | 1 | .753 | 1.037
No | Diabetes | .811 | 1.471 | .304 | 1 | .581 | 2.251
No | SBP | -.469 | .213 | 4.849 | 1 | .028 | .626
No | [Gender=1] | -11.866 | 4.695 | 6.389 | 1 | .011 | 7.025E-6
No | [Gender=2] | 0(b) | . | . | 0 | . | .
Parameter Estimates (continued)
Heart Attack(a) | | 95% Confidence Interval for Exp(B): Lower Bound | Upper Bound
No | Intercept | |
No | BMI | .080 | .756
No | Age | .826 | 1.303
No | Diabetes | .126 | 40.193
No | SBP | .412 | .950
No | [Gender=1] | 7.088E-10 | .070
No | [Gender=2] | . | .
1) Is the final model significant?
2) What are the odds ratios for each of the significant variables, and what do they mean?
3) Will this model help the researchers? Why or why not?
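The columns of the Parameter Estimates table are linked by simple formulas, which the sketch below reproduces from the printed B and Std. Error values: Exp(B) is the exponentiated coefficient, the confidence limits are exp(B ± 1.96 × SE), and the Wald statistic is (B / SE) squared. The helper name `odds_ratio_with_ci` is illustrative, and results can differ from the table in the last digit because the printed inputs are rounded. The row label in the table suggests the modeled category is Heart Attack = No, so the odds ratios would describe the odds of not being diagnosed with an MI; treat that reading as an assumption to confirm against the SPSS footnotes.

```python
import math

# (B, standard error) pairs copied from the Parameter Estimates table above.
estimates = {
    "BMI": (-1.400, 0.572),
    "Age": (0.037, 0.116),
    "Diabetes": (0.811, 1.471),
    "SBP": (-0.469, 0.213),
    "Gender=1": (-11.866, 4.695),
}

def odds_ratio_with_ci(b, se, z=1.96):
    """Return Exp(B) and its Wald 95% confidence limits."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

for name, (b, se) in estimates.items():
    or_, lo, hi = odds_ratio_with_ci(b, se)
    wald = (b / se) ** 2  # Wald chi-square statistic on 1 df
    print(f"{name:10s} Exp(B) = {or_:.3g}, 95% CI ({lo:.3g}, {hi:.3g}), Wald = {wald:.3f}")

# Likelihood ratio chi-square: drop in -2 log likelihood from the
# intercept-only model to the final model.
lr_chi2 = 74.995 - 16.398
print(f"likelihood ratio chi-square = {lr_chi2:.3f}")
```

A confidence interval for Exp(B) that excludes 1 corresponds to a Wald test that is significant at the 5% level.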