Biostatistics


Question

Biostatistics

Answer the following questions.
Copy and paste any required data charts or summaries into this Word document.

Include the file naming convention.

I. Descriptive Statistics:

Download the data set Final_1.sav.
Complete the following:

1) List the level of measurement for the variables AGE, SEX, AGEGRP, and SBP1 in the data set, and describe the appropriate numerical and descriptive statistics based on these.

Record Number	AGE
1	3
2	11
3	15
4	46
5	14
6	35
7	46
8	35
9	40
10	29
11	22
12	16

2) Calculate (by hand) the mean and standard deviation for the first 12 records for age in the data set.

3) Generate numerical and graphical descriptive statistics for each of the variables, namely, AGE, SEX, AGEGRP and SBP1.

4) Interpret the output you generated in part 2 for each of the variables in the data set.
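The by-hand calculation requested in question 2 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; it uses the sample standard deviation (n - 1 in the denominator), and the ages are the first 12 records listed above.

```python
import math

# First 12 AGE values from Final_1.sav, as listed in the table above
ages = [3, 11, 15, 46, 14, 35, 46, 35, 40, 29, 22, 16]

def mean_and_sd(values):
    """Return (mean, sum of squared deviations, sample SD with n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    sd = math.sqrt(ss / (n - 1))
    return mean, ss, sd

mean, ss, sd = mean_and_sd(ages)
print(f"mean = {mean:.2f}, sum of squares = {ss:.0f}, SD = {sd:.3f}")
# prints: mean = 26.00, sum of squares = 2302, SD = 14.466
```

Dividing by n instead of n - 1 would give the population standard deviation (about 13.85), which is smaller; the sample version is the usual choice when the 12 records are treated as a sample.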

II. Paired and Independent t Tests:

Download the data set Final_2.sav and use SPSS to complete the following calculations:

1) Use the 5-step approach to hypothesis testing and the calculation of the 95% confidence intervals to answer the following research question: Was a significant difference in Systolic Blood Pressure (SBP) observed over the course of the study?

2) Use the 5-step approach to hypothesis testing and the calculation of the 95% confidence intervals to answer the following research question: Is there a difference in SBP1 based on HIV status? (Hint: Assign Y as group 1 and N as group 2)
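Both research questions in this section reduce to a t statistic plus a confidence interval. The sketch below shows the arithmetic behind a paired t test (question 1: the same subjects measured twice) and a pooled-variance independent-samples t test (question 2: two separate groups, which corresponds to the "Equal variances assumed" row of the SPSS output). It is a minimal illustration, not SPSS's implementation: the critical value `t_crit` is assumed to be looked up from a t table for the relevant degrees of freedom (the standard library has no t distribution), and the values in the usage lines are toy numbers, not data from Final_2.sav.

```python
import math
from statistics import mean, stdev, variance

def paired_t(before, after, t_crit):
    """Paired t test on the within-subject differences after - before.

    Returns (mean difference, t, df, (ci_low, ci_high)).
    t_crit is the two-sided critical t value for df = n - 1 (from a t table).
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    return d_bar, d_bar / se, n - 1, (d_bar - t_crit * se, d_bar + t_crit * se)

def pooled_t(group1, group2, t_crit):
    """Independent-samples t test assuming equal variances (pooled variance).

    Returns (mean difference group1 - group2, t, df, (ci_low, ci_high)).
    t_crit is the two-sided critical t value for df = n1 + n2 - 2.
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(group1) - mean(group2)
    return diff, diff / se, df, (diff - t_crit * se, diff + t_crit * se)

# Toy numbers only, to show the calls (not data from Final_2.sav)
print(paired_t([1, 2, 3, 4], [2, 4, 5, 5], t_crit=3.182))  # df = 3
print(pooled_t([1, 2, 3], [4, 5, 6], t_crit=2.776))        # df = 4
```

For question 2, passing the HIV = Y values as `group1` and the HIV = N values as `group2` follows the hint's group ordering, so a positive mean difference means higher SBP1 in the Y group.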

III. Cross-Tabulation:

Download the data set Final_3.sav and use SPSS to complete the following calculations.

1) Use the 5-step approach to hypothesis testing to answer the following research question: In the sample provided in Final_3.sav, are the variables income and Bladder Cancer independent of each other? (Note: The question could also be asked as "Is there an association between the variables?", because the lack of independence implies an association.)

2) Answer the following based on the cross-tabulation of alcohol consumption and Bladder Cancer:

Alcohol consumption * Bladder Cancer Crosstabulation
Count
		Bladder Cancer		Total
		No	Yes
Alcohol consumption	"Less than 1 drink per week"	30	54	84
	4 or more drinks per month	22	115	137
Total		52	169	221

Calculate the odds ratio.

Describe how the odds ratio differs from the relative risk or risk ratio and why you would choose it here.

Interpret the odds ratio and how it might impact the practice of public health practitioners.

If you wanted to know whether this relationship was statistically significant, what test(s) could you use?
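The calculations asked for above can be sketched in Python for the 2 x 2 table of alcohol consumption and bladder cancer. This is a minimal illustration under stated assumptions: the chi-square statistic is the Pearson statistic without continuity correction, and the 95% interval for the odds ratio uses Woolf's log-scale method with z = 1.96, which is one common choice but not necessarily what a given SPSS procedure prints. The same `pearson_chi_square` function works for any r x c table, such as income by bladder cancer in question 1.

```python
import math

# Rows: exposure group; columns: (No bladder cancer, Yes bladder cancer), from the crosstab above
table = [
    [30, 54],   # less than 1 drink per week
    [22, 115],  # 4 or more drinks per month
]

def pearson_chi_square(counts):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, smallest expected count).
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    stat, min_expected = 0.0, float("inf")
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(counts) - 1) * (len(col_totals) - 1)
    return stat, df, min_expected

def odds_ratio_and_risk_ratio(counts, z=1.96):
    """OR and RR of the outcome (second column) for row 1 vs row 2,
    plus a Woolf (log-scale) confidence interval for the OR."""
    (a_no, a_yes), (b_no, b_yes) = counts
    odds_ratio = (a_yes / a_no) / (b_yes / b_no)
    risk_ratio = (a_yes / (a_yes + a_no)) / (b_yes / (b_yes + b_no))
    se_log_or = math.sqrt(1 / a_no + 1 / a_yes + 1 / b_no + 1 / b_yes)
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))
    return odds_ratio, risk_ratio, ci

stat, df, min_expected = pearson_chi_square(table)
odds_ratio, risk_ratio, ci = odds_ratio_and_risk_ratio(table)
print(f"chi-square = {stat:.2f} (df = {df}), smallest expected count = {min_expected:.1f}")
print(f"OR = {odds_ratio:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}; RR = {risk_ratio:.2f}")
```

The odds ratio and relative risk it returns match the hand calculations (54/30)/(115/22) and (54/84)/(115/137); the smallest expected count is also reported because the chi-square approximation is usually considered reliable only when expected counts are at least 5.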

IV. ANOVA:

Download the data set Final_4.sav and use SPSS to complete the following calculations.

1) Produce box plots of income for each region of the US in the data set and interpret them. Based on the box plots, do you expect to find a difference between any of the groups?

2) Create descriptive statistics for each region, using the variable income. Include skewness and kurtosis in the output.

Create a histogram for each group.

3) Run the ANOVA for income based on region. Include the ANOVA table and the test for Homogeneity of Variance. Interpret the results.

4) Conduct post hoc analysis using Bonferroni and LSD methods to control for multiple testing.

Provide the output.
Interpret your results.
Why do you need to use methods like Bonferroni and LSD with the ANOVA?
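The F statistic in the ANOVA table and the Bonferroni adjustment can both be sketched from first principles. The code below is a minimal illustration, not SPSS's implementation: it computes the between- and within-group sums of squares and F for any number of groups, and the Bonferroni-adjusted p value for one of the m = k(k - 1)/2 pairwise comparisons (multiply by m, cap at 1). LSD, by contrast, runs unadjusted pairwise t tests after a significant F, so it gives much weaker protection against false positives. The group values in the usage lines are toy numbers, not income data from Final_4.sav.

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between groups: spread of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: spread of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

def bonferroni(p_value, n_groups):
    """Bonferroni-adjusted p value for one of the k(k - 1)/2 pairwise comparisons."""
    m = n_groups * (n_groups - 1) // 2
    return min(1.0, p_value * m)

# Toy groups only (not Final_4.sav data)
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(bonferroni(0.02, n_groups=4))  # 4 groups -> 6 pairwise comparisons
```

With k groups there are k(k - 1)/2 pairwise tests, so running each at alpha = 0.05 inflates the chance of at least one false positive; multiplying each p value by that count is the Bonferroni remedy.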

V. Regression:

Download the data set Final_5.sav and use SPSS to complete the following calculations.

1) Use an independent t test and simple linear regression to identify whether a relationship exists between gender and BMI.

Run the appropriate t test in SPSS, report the significance of the difference in means and the confidence interval, and interpret the results.

Run the simple linear regression in SPSS, report the significance of the variable gender and the overall fit of the model (using r²). Interpret the results.

How are these two approaches different?

Are your conclusions the same using both tests?
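One way to see how the two approaches relate: when gender is coded as a 0/1 indicator, the least-squares slope equals the difference in mean BMI between the groups, the slope's t statistic equals the pooled-variance t statistic, and r² = t²/(t² + df). The sketch below checks this numerically. The 0/1 coding and the toy values are assumptions for illustration, not data from Final_5.sav; if gender is coded 1/2 instead, the slope is the same and only the intercept shifts.

```python
import math

def simple_regression(x, y):
    """OLS fit y = b0 + b1*x. Returns (b1, t statistic for b1, r_squared, residual df)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total sum of squares
    df = n - 2
    se_b1 = math.sqrt(sse / df / sxx)
    return b1, b1 / se_b1, 1 - sse / sst, df

# Toy data: gender coded 0/1 (hypothetical coding) and toy outcome values
gender = [0, 0, 0, 1, 1, 1]
bmi = [1, 2, 3, 4, 5, 6]
b1, t, r2, df = simple_regression(gender, bmi)

group_means = [sum(b for g, b in zip(gender, bmi) if g == k) / gender.count(k) for k in (0, 1)]
print(b1, group_means[1] - group_means[0])  # slope equals the difference in group means
print(r2, t ** 2 / (t ** 2 + df))           # r^2 recovered from the t statistic
```

So the two approaches test the same null hypothesis and give the same p value; the regression framing adds r² as a measure of how much of the variation in BMI gender explains.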

2) Answer the questions using the provided output:

Multiple Linear Regression

Researchers looked at the Emergency Department records of 60 adults aged 22 to 46 years who arrived in the ED complaining of chest pain during a 6-month period. They did not use a random sample, as they wanted 30 males and 30 females in the study. They collected information on BMI (a measure of overweight/obesity), Age, SBP (Systolic Blood Pressure), and the diagnosis of Diabetes. Their first (alternative) hypothesis was that the dependent variable SBP is associated with BMI, Age, Diabetes, and Gender. They conducted a multiple linear regression to test their hypothesis. Here are the results (note that they had two models and chose to use the second one):

Model Summary^c
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.796^a	.634	.608	5.443
2	.792^b	.627	.607	5.445
a. Predictors: (Constant), Diabetes, Age, Gender, BMI
b. Predictors: (Constant), Age, Gender, BMI
c. Dependent Variable: SBP

ANOVA^c
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	XXXXXXXXXX	4	706.242	23.839	.000^a
	Residual	XXXXXXXXXX	55	29.626
	Total	XXXXXXXXXX	59
2	Regression	XXXXXXXXXX	3	931.407	31.418	.000^b
	Residual	XXXXXXXXXX	56	29.646
	Total	XXXXXXXXXX	59
a. Predictors: (Constant), Diabetes, Age, Gender, BMI
b. Predictors: (Constant), Age, Gender, BMI
c. Dependent Variable: SBP

Coefficients^a
Model		Standardized Coefficients	t	Sig.	95.0% Confidence Interval for B
		Beta			Lower Bound	Upper Bound
1	(Constant)		8.092	.000	57.471	95.309
	Gender	-.189	-2.100	.040	-6.381	-.149
	BMI	.557	6.130	.000	1.213	2.392
	Age	.507	6.067	.000	.426	.847
	Diabetes	-.089	-1.019	.313	-4.752	1.549
2	(Constant)		8.885	.000	55.243	87.407
	Gender	-.173	-1.950	.056	-6.054	.081
	BMI	.574	6.413	.000	1.276	2.436
	Age	.517	6.243	.000	.441	.859
a. Dependent Variable: SBP

1) Which variables in model 1 are significant?
2) Which variables in model 2 are significant?
3) Why did they choose model 2?
4) What is the “fit” of model 2 (the one they chose to use)?
5) Is this a good model? Why or why not?
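For questions 3 to 5, the adjusted R² and the overall F in the output can be recomputed from R², the sample size n = 60, and the number of predictors k. This is a minimal sketch; because the printed R² values are rounded to three decimals, the results agree with the tables only to about two decimal places. It shows why dropping Diabetes costs almost nothing: adjusted R² barely changes (.608 vs .607) while Model 2 uses one fewer predictor.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """Overall F statistic for H0: all k slopes are zero, with (k, n - k - 1) df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

n = 60
models = {"Model 1": (0.634, 4), "Model 2": (0.627, 3)}  # (R Square, number of predictors)
for name, (r2, k) in models.items():
    print(f"{name}: adjusted R^2 = {adjusted_r2(r2, n, k):.3f}, F = {overall_f(r2, n, k):.2f}")
```

Adjusted R² penalizes each added predictor, so a predictor that adds little explained variance (here Diabetes, p = .313) can leave adjusted R² unchanged or lower it.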

Multiple Logistic Regression

The Emergency Department researchers selected another 60 adults and again looked at Age, SBP, BMI, Gender, and Diabetes. This time, however, they also collected information on whether the chest pain was diagnosed as an MI (aka heart attack) or something else. Now their alternative hypothesis was that gender was related to the diagnosis of an MI, after controlling for Age, SBP, BMI, and Diabetes. They used multiple logistic regression to test their hypothesis, and these are their results (note that there are multiple models and they chose to use the final one):

Model Fitting Information
Model	Model Fitting Criteria	Likelihood Ratio Tests
	-2 Log Likelihood	Chi-Square	df	Sig.
Intercept Only	74.995
Final	16.398	58.598	5	.000

Pseudo R-Square
Cox and Snell	.623
Nagelkerke	.866
McFadden	.767

Parameter Estimates
Heart Attack^a		B	Std. Error	Wald	df	Sig.	Exp(B)
No	Intercept	115.037	43.679	6.936	1	.008
	BMI	-1.400	.572	5.995	1	.014	.247
	Age	.037	.116	.099	1	.753	1.037
	Diabetes	.811	1.471	.304	1	.581	2.251
	SBP	-.469	.213	4.849	1	.028	.626
	[Gender=1]	-11.866	4.695	6.389	1	.011	7.025E-6
	[Gender=2]	0^b	.	.	0	.	.

Parameter Estimates
Heart Attack^a		95% Confidence Interval for Exp(B)
		Lower Bound	Upper Bound
No	Intercept
	BMI	.080	.756
	Age	.826	1.303
	Diabetes	.126	40.193
	SBP	.412	.950
	[Gender=1]	7.088E-10	.070
	[Gender=2]	.

1) Is the final model significant?
2) What are the odds ratios for each of the significant variables, and what do they mean?
3) Will this model help the researchers? Why or why not?
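The odds ratios and intervals in the Parameter Estimates table follow from B and its standard error: Exp(B), with a Wald 95% interval exp(B ± 1.96 × SE). The likelihood-ratio chi-square for the final model is the drop in -2 Log Likelihood from the intercept-only model. The sketch below recomputes these from the printed values, so small rounding differences are expected. The coefficients are reported for Heart Attack = No; assuming Yes is the reference category (footnote a, which would normally state this, is not shown), an Exp(B) below 1 means lower odds of not having an MI, and 1/Exp(B) gives the odds ratio for having an MI.

```python
import math

# (B, Std. Error) copied from the Parameter Estimates table for Heart Attack = No
estimates = {
    "BMI": (-1.400, 0.572),
    "SBP": (-0.469, 0.213),
    "[Gender=1]": (-11.866, 4.695),
}

def odds_ratio_with_ci(b, se, z=1.96):
    """Exp(B) and its Wald confidence interval exp(B - z*SE) to exp(B + z*SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

for name, (b, se) in estimates.items():
    or_no_mi, low, high = odds_ratio_with_ci(b, se)
    # 1/Exp(B) assumes 'Yes' is the reference category of Heart Attack
    print(f"{name}: Exp(B) = {or_no_mi:.4g} (95% CI {low:.4g} to {high:.4g}); "
          f"OR for MI = 1/Exp(B) = {1 / or_no_mi:.4g}")

# Likelihood-ratio test of the final model: difference in -2 Log Likelihood, df = 5
lr_chi_square = 74.995 - 16.398
print(f"LR chi-square = {lr_chi_square:.3f}")
```

A Wald interval for Exp(B) that excludes 1 corresponds to a significant coefficient at the 5% level, which is why BMI, SBP, and [Gender=1] have intervals entirely below 1 while Age and Diabetes do not.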

Robert · Accepted Answer

I. Descriptive Statistics:
Download the data set Final_1.sav. Complete the following:
1) List the level of measurement for the variables AGE, SEX, AGEGRP, and SBP1 in the data set, and describe the appropriate numerical and descriptive statistics based on these.
Age: This variable is numerical and its level of measurement is ratio: its values are ordered, the intervals between them are equal, and there is a true zero. The mean, standard deviation, median, and range are all appropriate, and a histogram or box plot summarizes it graphically.
Sex: This variable is nominal, since males (M) and females (F) are just names of categories with no intrinsic ordering between them. It is summarized with frequencies, percentages, and the mode, and graphically with a bar or pie chart.
AgeGrp: This is an ordinal variable, as it assigns numbers to rank-ordered categories ranging from low to high. In this data set, age group is coded from 1 to 6. It is summarized with frequencies, the median, and the mode, and graphically with a bar chart.
SBP1: This is also a numerical, continuous variable, since it records blood pressure on a continuous scale with a true zero; its level of measurement is ratio, so the same statistics as for age apply.
2) Calculate (by hand) the mean and standard deviation for the first 12 records for age in the data set.
Record Number	AGE	AGE - MEAN	(AGE - MEAN)^2
1	3	-23	529
2	11	-15	225
3	15	-11	121
4	46	20	400
5	14	-12	144
6	35	9	81
7	46	20	400
8	35	9	81
9	40	14	196
10	29	3	9
11	22	-4	16
12	16	-10	100
		SUM	2302

$$\text{Mean} = \bar{x} = \frac{\sum x}{n} = \frac{312}{12} = 26$$
$$\text{Standard deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{2302}{11}} = \sqrt{209.27} \approx 14.466$$

3) Generate numerical and graphical descriptive statistics for each of the variables, namely, AGE, SEX, AGEGRP and SBP1.
NUMERICAL AND GRAPHICAL REPRESENTATION
AGE
Statistics
age
N	Valid	78
	Missing	0
Mean		27.93
Median		27.56
Std. Deviation		14.487
Range		63
Minimum		1
Maximum		64
SEX
Statistics
sex
N	Valid	78
	Missing	0

sex
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	F	38	48.7	48.7	48.7
	M	40	51.3	51.3	100
	Total	78	100	100

AGEGRP
Statistics
agegrp
N	Valid	78
	Missing	0
Mean		2.81
Median		3
Mode		2^a
Range		5
Minimum		1
Maximum		6
SBP1
Statistics
sbp1
N	Valid	78
	Missing	0
Mean		108.94
Median		111.29
Std. Deviation		16.308
Range		82
Minimum		61
Maximum		143

4) Interpret the output you generated in part 2 for each of the variables in the data set.
From the results generated in part 2, the average age of the first 12 individuals is 26 years, with a standard deviation of about 14.47 years. Because the standard deviation is more than half of the mean (coefficient of variation ≈ 14.47/26 ≈ 56%), there is a lot of variation in age among these individuals. In the full data set (N = 78), age has a mean of 27.93 and a median of 27.56 years (SD 14.487, range 1 to 64), so its distribution is roughly symmetric; the sample is nearly balanced by sex (38 females, 48.7%; 40 males, 51.3%); the typical age group is 2 to 3 (mode 2, median 3); and SBP1 has a mean of 108.94 and a median of 111.29 (SD 16.308, range 61 to 143), the mean being slightly below the median, which suggests a mild left skew.
II. Paired and Independent t Tests:
Download the data set Final_2.sav and use SPSS to complete the following calculations:
1) Use the 5-step approach to hypothesis testing and the calculation of the 95% confidence intervals to answer the following research question: Was a significant difference in Systolic Blood Pressure (SBP) observed over the course of the study?
One-Sample Statistics
	N	Mean	Std. Deviation	Std. Error Mean
sbp1	72	110.50	16.014	1.887

One-Sample Test
	Test Value = 0
	t	df	Sig. (2-tailed)	Mean Difference	95% Confidence Interval of the Difference
					Lower	Upper
sbp1	58.551	71	.000	110.499	106.74	114.26
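The one-sample output above can be reproduced from its summary statistics: SE = SD/√n, t = (mean - test value)/SE, and the 95% CI is the mean difference ± t* × SE. In this minimal sketch the critical value t* ≈ 1.994 for df = 71 is taken from a t table (an assumption; the standard library has no t distribution). With a test value of 0, this test only shows that mean SBP1 differs from 0; the research question about change over the course of the study compares two SBP measurements on the same subjects, which is what a paired t test (the same arithmetic applied to the within-subject differences) addresses.

```python
import math

def one_sample_t_from_summary(mean, sd, n, t_crit, test_value=0.0):
    """t statistic, df, and confidence interval for (mean - test_value) from summary statistics."""
    se = sd / math.sqrt(n)  # standard error of the mean
    diff = mean - test_value
    return diff / se, n - 1, (diff - t_crit * se, diff + t_crit * se)

# Summary statistics from the One-Sample Statistics table above; t_crit from a t table for df = 71
t, df, (low, high) = one_sample_t_from_summary(mean=110.50, sd=16.014, n=72, t_crit=1.994)
print(f"t = {t:.2f}, df = {df}, 95% CI {low:.2f} to {high:.2f}")
```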
2) Use the 5-step approach to hypothesis testing and the calculation of the 95% confidence intervals to answer the following research question: Is there a difference in SBP1 based on HIV status? (Hint: Assign Y as group 1 and N as group 2)
III. Cross-Tabulation:
Download the data set Final_3.sav and use SPSS to complete the following calculations.
1) Use the 5-step approach to hypothesis testing to answer the following research question: In the sample provided in Final_3.sav, are the variables income and Bladder Cancer independent of each other? (Note: The question could also be asked as "Is there an association between the variables?", because the lack of independence implies an association.)
2) Answer the following based on the cross-tabulation of alcohol consumption and Bladder Cancer: 
Alcohol consumption * Bladder Cancer Crosstabulation
Count
		Bladder Cancer		Total
		No	Yes
Alcohol consumption	"Less than 1 drink per week"	30	54	84
	4 or more drinks per month	22	115	137
Total		52	169	221
Calculate the odds ratio.
Odds ratio (bladder cancer, less than 1 drink per week vs. 4 or more drinks per month) = (54/30)/(115/22) = 1.80/5.23 = 0.34
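Describe how the odds ratio differs from the relative risk or risk ratio and why you would choose it here.
Relative risk (bladder cancer, less than 1 drink per week vs. 4 or more drinks per month) = (54/84)/(115/137) = 0.643/0.839 = 0.77
The relative risk compares the probability (risk) of bladder cancer in the two exposure groups, whereas the odds ratio compares the odds (cases divided by non-cases). A relative risk of 0.77 means that people who drink less than 1 drink per week have 0.77 times the risk of bladder cancer of those who drink 4 or more drinks per month. The relative risk is easier for a layperson to interpret, but it can only be estimated when participants are sampled without regard to the outcome, as in a cohort study. Here the outcome is very common (169 of 221 participants have bladder cancer), which suggests that participants were selected on disease status, as in a case-control study; in that design risk cannot be estimated, so the odds ratio is the appropriate measure. Because the outcome is common, the odds ratio (0.34) is also much further from 1 than the relative risk (0.77), so it should not be read as a relative risk.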
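Interpret the odds ratio and how it might impact the practice of public health practitioners.
An odds ratio (OR) is a measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. Here, the odds of bladder cancer among people who drink less than 1 drink per week are 0.34 times the odds among people who drink 4 or more drinks per month (about 66% lower odds). For public health practitioners, this suggests that lower alcohol consumption is associated with a lower likelihood of bladder cancer, which could support messaging that encourages reduced drinking, provided the association proves statistically significant; an association alone does not establish that alcohol causes bladder cancer.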
If you wanted to know whether this relationship was statistically significant, what test(s) could you use?
We would conduct a chi-square test of independence. For contingency tables like this one, the chi-square statistic compares the observed counts with the counts expected if the two variables were independent, to decide whether an association is present. If any expected count were below 5, Fisher's exact test would be used instead.
Chi-Square Tests
	Value	df	Asymp. Sig. (2-sided)	Exact Sig. (2-sided)	Exact Sig.
