Introduction to Data Analytics - CJUS 4370
Final Exam
Name: ________________________________________
Each short answer question below can be answered with a few sentences. I strongly encourage you to think more than you write. Show ALL math whenever computations are being made or you will not receive full credit.
Computational Questions
Table 1 below provides slopes, standard errors, and betas for several variables used to predict levels of self-control (higher scores = lower self-control) for respondents living in “good” and “bad” neighborhoods.
Table 1. OLS Regression Predicting Low Self-Control Across Neighborhood Type

                                                    Good Neighborhoods (n = 356) Bad Neighborhoods (n = 186)
Measure                                             B        SE      Beta        B        SE      Beta
Age (in Years)                                      -.215*   .094    -.117       .107     .160    .050
Sex (1 = Males)                                     .699**   .224    .160        .129     .361    .026
Race (1 = Whites)                                   .059     .261    .012        -.499    .382    -.101
Parental Supervision (Higher = More Supervision)    -.277**  .023    -.063       -.026    .364    -.005
Parental Responsiveness (Higher = More Responsive)  -.098**  .032    .161        .096     .058    .228
School Socialization (Higher = More Socialization)  -.249**  .085    -.154       -.042    .112    -.027
Maternal Smoking (1 = Smoked During Pregnancy)      .580*    .277    .109        1.02**   .388    .193
Constant                                            6.105    1.750               .460     2.620
R²                                                  .313                         .273

* p < .05; ** p < .01
1. Calculate the predicted self-control score for a person living in a good neighborhood who is 18 years old, male, who scores a 12 on parental supervision, a 15 on parental responsiveness, a 10 on school socialization, and whose mother smoked during pregnancy. Then, calculate the self-control score for a person living in a good neighborhood who is 21 years old, female, who scores an 8 on parental supervision, a 10 on parental responsiveness, a 6 on school socialization, and whose mother smoked during pregnancy. If having low self-control is a significant predictor of delinquency, which individual (#1 or #2) would be more at risk of engaging in delinquency?
2. What is the strongest independent variable for the model predicting self-control for respondents living in good neighborhoods? What is the strongest independent variable for the model predicting self-control for respondents living in bad neighborhoods?
3. In every case in Table 1, the standard errors for the model predicting self-control for respondents in good neighborhoods are lower than the standard errors for the model predicting self-control for respondents in bad neighborhoods. What is one reason this might occur?
4. A coefficient is significant when the t value associated with the coefficient exceeds the critical t value at the appropriate degrees of freedom. In Table 1, calculate the t value for the school socialization and maternal smoking variables (2 separate t values).
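For checking hand calculations: the t value described in question 4 is the coefficient divided by its standard error. The following is a minimal sketch, assuming Python is available. The function name `t_ratio` and the example numbers are hypothetical and are not taken from Table 1.

```python
def t_ratio(b, se):
    """Return the t value for a coefficient: the slope (B) divided by its standard error (SE)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return b / se


# Hypothetical values for illustration only (not from Table 1).
t = t_ratio(0.50, 0.20)
print(round(t, 2))
```

The absolute value of the result is then compared with the critical t value at the appropriate degrees of freedom, which still has to be looked up in a t table.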
5. In the table below, let X be the number of delinquent peers a respondent has and Y be the number of arrests they have. Calculate the bivariate correlation between X and Y and provide an interpretation of the correlation coefficient documenting whether it is significant.
X     Y
2     1
3     1
3     3
5     3
6     3
6     5
7     6
8     8
8     9
10    9
6. Using the data from question #5, calculate the equation for an ordinary least squares regression (slope, Y-intercept, and r²). Interpret each of these 3 values.
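The quantities asked for in questions 5 and 6 all come from the same sums of squares and cross-products. The following is a minimal sketch for checking hand calculations, assuming Python is available. The function name `ols_summary` and the example data are hypothetical and are not the data from question 5.

```python
from math import sqrt


def ols_summary(x, y):
    """Return Pearson's r, the OLS slope and intercept, r squared, and the t value for r."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with at least 3 cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and the sum of cross-products around the means.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("x and y must each vary")
    r = sp_xy / sqrt(ss_x * ss_y)
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    # t value for testing r against zero, with n - 2 degrees of freedom.
    # Undefined when r is exactly 1 or -1, so it is reported as None in that case.
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2) if abs(r) < 1 else None
    return {"r": r, "slope": slope, "intercept": intercept, "r2": r ** 2, "t": t}


# Hypothetical data for illustration only (not the data from question 5).
result = ols_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(name, round(value, 3))
```

Working from deviations around the means keeps each intermediate value (the two sums of squares and the sum of cross-products) visible, which matches the steps shown in a hand calculation.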
7. Using the following data, calculate the partial correlation between X and Z controlling for Y.
     X      Y      Z
X    1
Y    .21    1
Z    .18    .09    1
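A first-order partial correlation can be computed directly from the three bivariate correlations in a matrix like the one above. The following is a minimal sketch for checking hand calculations, assuming Python is available. The function name `partial_corr` and the example correlations are hypothetical and are not the values from question 7.

```python
from math import sqrt


def partial_corr(r_xz, r_xy, r_zy):
    """Return the partial correlation between X and Z, controlling for Y."""
    # Denominator: the variance in X and in Z left unexplained by Y.
    denom = sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))
    if denom == 0:
        raise ValueError("X or Z is perfectly correlated with Y")
    return (r_xz - r_xy * r_zy) / denom


# Hypothetical correlations for illustration only (not from question 7).
print(round(partial_corr(0.50, 0.30, 0.40), 3))
```

When both correlations with the control variable are zero, the function returns the original bivariate correlation unchanged, which is a quick sanity check on the formula.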
Conceptual Questions
8. A colleague of yours is completing a final report on the causes of the frequency of cyberbullying. In this report, she is asked to identify the causes that most strongly impacted the frequency of cyberbullying. She conducts an OLS regression. What statistic do you advise her to use in her discussion? Why?
9. You are in the process of completing a study of the causes of cyberbullying. You ran two OLS regression models to guide your analysis. In model 1, your adjusted R² was .46 with 7 predictor variables. In model 2, your adjusted R² was .33 with 9 predictor variables. Knowing nothing else, which model do you use when discussing your results? Why?
10. Your colleague produces the following two scatterplots. When comparing the two, what conclusion do you draw about the R² for the scatterplot on the left compared to the R² for the scatterplot on the right? Why did you come to the conclusion that you did?
11. What information is provided by calculations of a Pearson’s Correlation Coefficient?
12. What information is provided by calculations of a Multivariate Ordinary Least Squares Regression?
13. Why would researchers believe that an ordinary least squares regression is superior to a partial correlation coefficient?
14. Referring back to Table 1, write one sentence that encapsulates the major finding of the data that are reported.