Assignment 4
Your name and ID here
Fall 2022
Instructions
Please read each question and answer where appropriate. The assignment is graded on a scale from 1 to 5. I grade effort as well as content. That
means that to obtain a 5, every question must be attempted, and I am a kind grader if the effort was high but the result was not quite right.
After you answer the questions, knit the document to HTML and submit it on eclass. I will only grade HTML. If you submit the md file instead, you
will receive a zero. You have been warned, so there will be no exceptions.
Groups of up to four are allowed, but every student must submit their own assignment.
If an interpretation of output is asked for, but only output or code is given, the question will get zero points.
Question 1: polynomials and interactions
This question uses the Wooldridge data set wage1. I have loaded it below to a data frame called wage1. The purpose of this question is to get
used to interpreting coefficients with different functional form assumptions.
wage1 <- wooldridge::wage1 %>%
  filter(complete.cases(.))
Consider the generic regression:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 (x_2 \cdot x_3) + u$$
Here, the variable $x_1$ is entered into the regression twice: once as a 'main' effect and once as a squared term. Consider the partial effect of
increasing $x_1$ by 1 unit:

$$\frac{\partial y}{\partial x_1} = \beta_1 + 2\beta_2 x_1$$
This says that the marginal impact on $y$ from a one unit increase in $x_1$ is not a constant, but a function. The impact depends on the value of $x_1$. For
example, suppose that $\beta_1$ is positive and $\beta_2$ is negative. This means that the relationship between $y$ and $x_1$ exhibits decreasing marginal returns.
That is, when $x_1$ increases from 0 to 1, $\partial y / \partial x_1$ is higher than when going from 2 to 3, and so on. If we graph this, it would look like an inverted 'U'. For
example, it might look like:
fig.data <- data.frame(x = 1:10) %>%
  mutate(y = XXXXXXXXXX * x XXXXXXXXXX)

ggplot(fig.data, aes(y = y, x = x)) +
  geom_line() +
  labs(title = "Typical diminishing marginal returns profile")

[Figure: "Typical diminishing marginal returns profile", a line plot of y against x]

We can find the point at which the effect of $x_1$ turns from positive to negative (the turning point) by setting $\partial y / \partial x_1 = 0$ and solving for $x_1$. This yields

$$x_1^* = -\frac{\beta_1}{2\beta_2}$$
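As a rough illustration of the turning-point calculation, the sketch below fits a quadratic on simulated data and solves $\beta_1 + 2\beta_2 x_1 = 0$ for $x_1$. The data, the true coefficients (12 and -1) and the names `fit`, `marginal_effect` and `turning_point` are assumptions made for this example only; they do not come from the assignment data.

```r
# Simulated data (an assumption for illustration): true turning point is 12 / (2 * 1) = 6
set.seed(1)
n  <- 500
x1 <- runif(n, 0, 10)
y  <- 100 + 12 * x1 - 1 * x1^2 + rnorm(n, sd = 2)

# Quadratic regression: y = b0 + b1 * x1 + b2 * x1^2 + u
fit <- lm(y ~ x1 + I(x1^2))
b   <- coef(fit)

# Partial effect of x1 evaluated at a given value of x1: b1 + 2 * b2 * x1
marginal_effect <- function(x) unname(b["x1"] + 2 * b["I(x1^2)"] * x)

# Turning point: the value of x1 where the partial effect equals zero
turning_point <- unname(-b["x1"] / (2 * b["I(x1^2)"]))

marginal_effect(c(0, 3))   # the effect shrinks as x1 grows
turning_point
```

Because the estimated coefficient on the squared term is negative here, the marginal effect is positive below `turning_point` and negative above it.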

The variables $x_2$ and $x_3$ also appear twice, each as a main effect and then as an interaction. Consider the partial effect of $x_2$ on $y$:

$$\frac{\partial y}{\partial x_2} = \beta_3 + \beta_5 x_3$$
Again, this says that the impact of $x_2$ on $y$ is not a constant. It allows the impact to depend on the value of $x_3$. The treatment of $x_3$ is symmetric. We
most often use these types of interactions when one term is a dummy variable. Suppose $x_3$ only takes on two values, 1 and 0. Then

$$\frac{\partial y}{\partial x_2} = \beta_3 + \beta_5 \text{ when } x_3 = 1 \quad \text{and} \quad \frac{\partial y}{\partial x_2} = \beta_3 \text{ when } x_3 = 0$$
Since dummy variables denote groups (i.e., 2 groups), this allows each group to have its own intercept (shifted by $\beta_4$) and slope. Graphically, it looks like:
ggplot(wage1 %>% filter(educ > 5), aes(y = wage, x = educ, color = factor(west))) +
  geom_smooth(method = 'lm', se = F)

## `geom_smooth()` using formula 'y ~ x'
[Figure: fitted regression lines against educ, one line per value of factor(west)]
Where each line is a regression of log wages on education, with an interaction for living in the west. The return to education for workers in the western
United States in this data is higher than the return for those in the rest of the country.
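To make the "own intercept and own slope" point concrete, here is a small sketch on simulated data. The variables `x` (continuous) and `d` (a 0/1 dummy), as well as the coefficient values, are hypothetical and chosen only for illustration. It recovers the group-specific intercepts and slopes from a single interacted regression and checks them against a regression run on one group alone.

```r
# Simulated data (an assumption for illustration)
set.seed(2)
n <- 400
x <- runif(n, 6, 18)       # continuous regressor
d <- rbinom(n, 1, 0.5)     # dummy regressor: 1 = group A, 0 = group B
y <- 1 + 0.5 * x + 2 * d + 0.3 * x * d + rnorm(n)

# x * d expands to x + d + x:d (both main effects plus the interaction)
fit <- lm(y ~ x * d)
b   <- coef(fit)

# Group with d = 0: intercept b0, slope b_x
intercept_d0 <- unname(b["(Intercept)"])
slope_d0     <- unname(b["x"])

# Group with d = 1: intercept b0 + b_d, slope b_x + b_{x:d}
intercept_d1 <- unname(b["(Intercept)"] + b["d"])
slope_d1     <- unname(b["x"] + b["x:d"])

# The same slope comes out of a regression on the d = 1 group alone
slope_d1_separate <- unname(coef(lm(y ~ x, subset = d == 1))["x"])
```

The coefficient on `x:d` is the difference in slopes between the two groups, so its t-statistic tests whether the slopes differ.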
In R, we can create variables "on the fly" to use in regressions. We use this mostly to create interaction terms and low order polynomial terms.
Consider the following code that would estimate the following equation:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 (x_2 \cdot x_3) + u$$
In the code below, each regression is exactly the same, just expressed in different ways:
# data_frame stands for whatever data frame holds y, x_1, x_2 and x_3
mod <- lm(y ~ x_1 + I(x_1^2) + x_2*x_3, data = data_frame)
mod <- lm(y ~ poly(x_1, 2, raw = T) + x_2*x_3, data = data_frame)
mod <- lm(y ~ x_1 + I(x_1^2) + x_2 + x_3 + x_2:x_3, data = data_frame)
mod <- lm(y ~ x_1 + I(x_1^2) + x_2 + x_3 + I(x_2*x_3), data = data_frame)
The term I() is an "insulator function". It tells R to evaluate the expression inside first, then run the regression. The notation x_2*x_3 says to
include main effects for each variable, plus an interaction. The notation x_2:x_3 includes just the interaction. Finally, poly() constructs low order
polynomials. The raw=T option is important: it makes poly() use the raw powers x_1 and x_1^2, whereas the default builds orthogonal polynomials whose coefficients are not the $\beta_1$ and $\beta_2$ above.
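The equivalence of the four formula styles can be checked directly. The sketch below uses simulated data (the data frame `dat` and its coefficient values are assumptions for illustration) and compares the coefficient vectors. It also fits a fifth model with the default `poly()` (orthogonal polynomials) to show why `raw = TRUE` matters: the fitted values match, but the coefficients no longer correspond to $\beta_1$ and $\beta_2$.

```r
# Simulated data (an assumption for illustration)
set.seed(3)
n   <- 300
dat <- data.frame(x_1 = rnorm(n), x_2 = rnorm(n), x_3 = rbinom(n, 1, 0.4))
dat$y <- 1 + 2 * dat$x_1 - 0.5 * dat$x_1^2 + dat$x_2 + dat$x_3 +
  0.7 * dat$x_2 * dat$x_3 + rnorm(n)

# Four ways of writing the same regression
m1 <- lm(y ~ x_1 + I(x_1^2) + x_2 * x_3, data = dat)
m2 <- lm(y ~ poly(x_1, 2, raw = TRUE) + x_2 * x_3, data = dat)
m3 <- lm(y ~ x_1 + I(x_1^2) + x_2 + x_3 + x_2:x_3, data = dat)
m4 <- lm(y ~ x_1 + I(x_1^2) + x_2 + x_3 + I(x_2 * x_3), data = dat)

# Orthogonal polynomial (the poly() default): same fit, different coefficients
m5 <- lm(y ~ poly(x_1, 2) + x_2 * x_3, data = dat)

# Coefficient names differ across styles, so compare the unnamed values
same_12 <- isTRUE(all.equal(unname(coef(m1)), unname(coef(m2))))
same_13 <- isTRUE(all.equal(unname(coef(m1)), unname(coef(m3))))
same_14 <- isTRUE(all.equal(unname(coef(m1)), unname(coef(m4))))

same_fit_15   <- isTRUE(all.equal(unname(fitted(m1)), unname(fitted(m5))))
same_coefs_15 <- isTRUE(all.equal(unname(coef(m1)), unname(coef(m5))))
```

Here `same_12`, `same_13`, `same_14` and `same_fit_15` should all be `TRUE`, while `same_coefs_15` should be `FALSE`.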

wage1 <- wooldridge::wage1 %>%
  filter(complete.cases(.))

models <- list(
  lm(lwage ~ educ + exper + I(exper^2) + nonwhite + female, data = wage1),
  lm(lwage ~ educ*female + exper + I(exper^2) + nonwhite, data = wage1)
)

# table with robust (sandwich) standard errors;
# recent modelsummary releases call the statistic_override argument vcov
modelsummary(models,
             fmt = 5,
             statistic_override = sandwich,
             stars = T,
             gof_omit = "[^R2|Adj. R2]") %>%
  kable_classic_2()

                Model 1        Model 2
(Intercept)     XXXXXXXXXX"    XXXXXXXXXX"
                (XXXXXXXXXX)
educ            XXXXXXXXXX"    XXXXXXXXXX"
                (XXXXXXXXXX)
exper           XXXXXXXXXX"    XXXXXXXXXX"
                (XXXXXXXXXX)
I(exper^2)      XXXXXXXXXX"    XXXXXXXXXX"
                (XXXXXXXXXX)
nonwhite        XXXXXXXXXX
                (XXXXXXXXXX)
female          XXXXXXXXXX"    XXXXXXXXXX+
                (XXXXXXXXXX)
educ x female   XXXXXXXXXX
                (XXXXXXXXXX)
R2              XXXXXXXXXX
R2 Adj.         XXXXXXXXXX

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

1. In the first column, interpret the return to experience (exper). After how many years of experience does the relationship turn negative?
Answer here
2. In column two, what is the return to education for men and for women? Are the returns to education significantly different for men and women?
Answer here
Question 2: Teaching evaluations
Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of
these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the
influence of non-teaching related characteristics, such as the physical appearance of the instructor. The article titled "Beauty in the classroom:
instructors' pulchritude and putative pedagogical productivity" (Hamermesh and Parker, 2005) found that instructors who are viewed to be better
looking receive higher instructional ratings.
Daniel S. Hamermesh, Amy Parker, Beauty in the classroom: instructors' pulchritude and putative
pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-
376, ISSN XXXXXXXXXX, XXXXXXXXXX/j.econedurev XXXXXXXXXX.
Paper link - not required to read
data("TeachingRatings") # load ratings data
df <- TeachingRatings # re-name as df for convenience

1. The data set df constructed in the above code chunk contains different types of variables. Use the command str() or glimpse() on the data
frame df to answer below:
a. What type of variable is credits? What fraction of the data are single credit courses?
b. What type of variable is allstudents? What is the largest class in the data set?
c. Construct a variable called frac that is the proportion of students in the class that filled out the evaluation. What is the average participation
rate?
Answer here
2. You can see the variable definitions by typing ?TeachingRatings in the console. Suppose we are interested in estimating a causal effect of
beauty on eval. That is,
$$\text{eval} = \beta_0 + \beta_1 \text{beauty} + u$$
Using the strategy discussed in class and in Chapter 7.6, construct a regression table evaluating the causal effect of beauty on teaching
evaluations. Your regression table should consider several specifications, starting with the bivariate regression above and then adding more
controls, possibly in groups. For each specification, state why you think it is important to include the controls you add. Your answer should
relate to the conditional independence assumption (CIA). Interpret your results: do you think that beauty has a causal impact on evaluations? If yes, defend your answer.
If not, state why not.
Answer here
# regression table here
3. Run a regression of eval on beauty, gender, minority, credits, division, tenure, native. Consider my data: I am male, non-minority, a
native English speaker, teaching multiple credit courses in an upper division, and I have tenure. While I don't have a beauty rating,
according to RateMyProfessor.com, I have an evaluation of 2.3. Use your regression and my information to infer what my beauty rating
would be if I were in this data set.
Answer here
# Regression here
4. In the regression you ran in part (3), the coefficient on gender shows that women have on average, after controlling for other characteristics,
lower evaluations than men. This has led to additional research on the topic, since evaluations are important for promotion and tenure decisions.
Add an interaction term between beauty and gender. Interpret your results: is the marginal impact of beauty the same for men and women?
Are good-looking men treated differently from good-looking women by students in terms of their evaluations? Can we reject that the return to
beauty for women, in terms of evaluations, is zero?
Answer here
# Regression here
5. Using the same controls in part (3), test that the return to beauty depends on the level of beauty. What do you find?
Answer here
# Regression here
6. Using your regression in part (5), allow the beauty profile to depend on gender. Can you reject that men and women have the same beauty
profile? Use the margins command to estimate the effect of moving from the 25th percentile to the 75th percentile of beauty for men and women.
What do you find?
Answer here
# Regression here
Question 3: Birth weight
Smoking during pregnancy has been shown to have significant adverse health effects for newborn babies. Smoking is thought to be a preventable
cause of low birth weight in infants who, in turn, need more resources at delivery and are more likely to have related health problems in infancy and
beyond. Despite these concerns, many women still smoke during pregnancy. In this section, we analyze the relationship between birth weight and
smoking behavior, with the emphasis on identifying a causal impact of smoking on the birth weight of newborns.
The relationship we examine is:
$$\log(\text{birth weight})_i = \beta_0 + \beta_1 \text{smoking}_i + \eta_i$$
where $\text{smoking}_i$ will be measured by average cigarettes per day. The term $\eta_i$ captures all of the other things that determine birth weight aside from
smoking.
Baseline analysis.
Investigate the birth weight-smoking relationship and present your results in a table format. Your investigation should be structured around the
discussion of section 7.6. For control variables, choose the ones you see fit and explain why you choose them. Your explanation should be
centered on our class discussion of the conditional independence assumption. You can see the help file for the data set by typing ?bwght in the
console. Remember, good controls are related to the treatment or target variable of interest and not affected by the treatment itself.
# loading birth weight data from the package wooldridge
bw <- wooldridge::bwght

Robustness of your results
Investigate any potential non-linearity in your results. First, test whether the relationship between smoking and birth weight is linear by including a
polynomial in cigarettes per day. Second, examine whether the impact of smoking is the same for girls and boys. Your results should be presented
in a table format and structured along the lines of the discussion in Chapter 8.4 of the text. The average number of cigarettes smoked per day for
smokers is about 14. Using the margins command, estimate the impact of reducing this by half and compare this effect to quitting altogether.
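When the regressor enters as a polynomial, the effect of a discrete change depends on where the change starts and ends, so it has to be computed from predicted values instead of being read off a single coefficient. The sketch below shows the idea on simulated data using base R's `predict()` as a stand-in for the margins command mentioned above. The data frame `sim`, the variable names `cigs` and `lbw`, and the coefficient values are all hypothetical.

```r
# Simulated data (an assumption for illustration, not the bwght data)
set.seed(4)
n   <- 1000
sim <- data.frame(cigs = sample(0:30, n, replace = TRUE))
sim$lbw <- 4.8 - 0.010 * sim$cigs + 0.0002 * sim$cigs^2 + rnorm(n, sd = 0.1)

# Quadratic in cigarettes per day
fit <- lm(lbw ~ cigs + I(cigs^2), data = sim)

# Predicted outcome at 14, 7 and 0 cigarettes per day
pred <- unname(predict(fit, newdata = data.frame(cigs = c(14, 7, 0))))

effect_half <- pred[2] - pred[1]   # cutting from 14 to 7
effect_quit <- pred[3] - pred[1]   # cutting from 14 to 0

# The same number by hand from the coefficients: b1 * (7 - 14) + b2 * (7^2 - 14^2)
b <- coef(fit)
effect_half_by_hand <- unname(b["cigs"] * (7 - 14) + b["I(cigs^2)"] * (7^2 - 14^2))
```

This gives point estimates only. Standard errors for these differences need the coefficient covariance matrix, which is what a margins-style command reports.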
# Regressions here
Assessing your results
Estimating the causal relationship between birth weight and smoking is made difficult by the fact that smoking might be correlated with other
behaviors that are harmful to newborn outcomes. In other words, there are threats to internal validity. There are various types of threats to
internal validity. For each one I list below, explain how this might affect the interpretation of your results above and whether or not you can address
the concern:
1. Omitted variables bias. Explain how this would affect the interpretation of the estimated coefficient on smoking. Give an example of a
potential omitted variable you would control for if it were available in the data.
2. Model misspecification. For example, the relationship between birth weight and smoking is not linear. Should we be worried about this in this
particular case?
3. Measurement error or errors-in-variables. If mothers in the survey did not accurately report their smoking behavior, how would this affect the
interpretation of your results? Should we be worried about this in this particular case?
4. Simultaneous causality.
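Two of these threats are easy to see in a simulation. The sketch below is an illustration under made-up assumptions: the variables `z`, `smoke` and `y`, and every coefficient value, are hypothetical and unrelated to the bwght data. It shows how omitting a confounder shifts the estimated coefficient, and how classical measurement error in the regressor pulls the estimate toward zero.

```r
# Simulated data (an assumption for illustration)
set.seed(5)
n     <- 5000
z     <- rnorm(n)                          # confounder, unobserved in the "short" model
smoke <- 0.8 * z + rnorm(n)                # treatment, correlated with z
y     <- -0.5 * smoke - 1.0 * z + rnorm(n) # true effect of smoke is -0.5

# Omitted variables bias: leaving out z loads its effect onto smoke
b_short <- unname(coef(lm(y ~ smoke))["smoke"])       # biased, roughly -1.0 here
b_long  <- unname(coef(lm(y ~ smoke + z))["smoke"])   # close to -0.5

# Classical measurement error: smoke observed with independent noise
smoke_noisy <- smoke + rnorm(n)
b_noisy <- unname(coef(lm(y ~ smoke_noisy + z))["smoke_noisy"])  # attenuated toward 0
```

In this setup the omitted variable makes smoking look more harmful than it is, while misreporting makes it look less harmful. In real data the direction of omitted variables bias depends on the signs of the two correlations involved.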
External validity.
Your analysis above provides some evidence on the relationship between birth weight and smoking behavior, but no one study is perfect. In this
case, there are two looming concerns: (1) omitted variables bias and (2) whether the results are generalizable. In the first case, there are usually
additional things we'd like to control for but can't because of data limitations. In the second case, we worry that our results might depend on a
particular sample. For these reasons, it is a good idea to examine other possible
