Assignment 2
EC655 – Fall 2021
Due: Monday, November 8, 9:00pm
Assignment Description
In this assignment you are asked to manipulate data, estimate statistical relationships, and
interpret the findings. The questions below guide you through the process of statistical estimation.
It will be useful for you to use the “help” function in Stata, to look up commands in the
Stata reference manuals (which are available within Stata as PDFs), to search online, or to go back
to the lecture notes. You are also, as always, welcome to ask me for help.
I strongly suggest that you start this assignment early because it will not be possible (in my
opinion) to do well if you start close to the due date. There are parts that you may find difficult;
you will want to identify them and leave enough time to ask questions if necessary.
Assignment Instructions
Data analysis
In MyLearningspace, you will find a dofile called “EC655 assign2 dofile.do” that contains some code
you need to complete the assignment. All students must use this file to generate their data and
write the code to answer the questions. Before using the dofile, you need to do the following:
- Rename the file from “EC655 assign2 dofile.do” to your family name followed by your
student number (no spaces)
- After log using, replace [INSERT YOUR LAST NAME AND STUDENT NUMBER HERE]
with your last name and student number, with no space between the two. Do not
remove the quotation marks.
- After set seed, replace [INSERT YOUR STUDENT NUMBER HERE] with your full student
number.
Leave all other commands that currently exist in the dofile untouched. Write your code to answer
the questions inside the dofile.
Note that this file generates random data depending on your student number, so each student’s
data will be different, and therefore answers will also be different.
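As an illustration only, the edited lines at the top of the dofile might look like the sketch below for a hypothetical student with last name Smith and student number 123456789. Both values are placeholders assumed for this example; use your own. Under the same assumption, the file itself would be renamed Smith123456789.do.

```stata
* Hypothetical values for illustration: last name Smith, student number 123456789
cap log close
* Last name and student number with no space; the quotation marks stay in place
log using "Smith123456789.log", replace
clear all
set obs 3000
* Full student number as the seed, so the generated data are specific to you
set seed 123456789
```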
Submission
You are required to submit three documents according to the following instructions:
a) A report containing your answers to all the questions. I outline below how I would like
your report to look. The overall goal is that the answers to each question must be easily
identifiable in a readable, professional-looking document. Hand in the report in
Gradescope and the MyLearningspace Dropbox;
b) Stata dofile. Hand in to the MyLearningspace Dropbox only;
c) Stata log file. Hand in to the MyLearningspace Dropbox only.
In the report described in (a) above, please answer all questions in the same order as they are
stated on the question sheet. For each question and sub-question, include the relevant Stata code
(if any) that you used, the output generated by that command if there was any, and an
interpretation if you are asked to provide it. For example, if you were answering the following
hypothetical question, it might look like this:
************************************************************************************************
1) Using the tab command, provide a frequency distribution for y
Stata commands:
tab y;
Output:
XXXXXXXXXXy | Freq. Percent XXXXXXXXXXCum.
------------+-----------------------------------
XXXXXXXXXX | 23, XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX | 138, XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX | 9, XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX | 63, XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX | 2, XXXXXXXXXX XXXXXXXXXX
------------+-----------------------------------
Total | 237, XXXXXXXXXX
************************************************************************************************
You may also format your own output tables rather than copying and pasting Stata output from
your log file, if you find that easier. As long as the questions are answered in order, and the
Stata commands and associated output for each question are clear, it will be fine.
A note on plagiarism: this is an independent assignment, which I expect you to complete on
your own. It is plagiarism to copy someone else’s work verbatim, which includes Stata
dofiles. Any work you submit should be yours only.
EC655 assign2 dofile.do creates simulated data on wages ($ per hour), years of completed
schooling, and years of experience, for 3000 observations. The data are fake, but are meant to
roughly match a real US dataset, and so the conclusions you reach here will be at least
somewhat connected to reality. Each question is worth 5 points, for a total of 40.
1) Explain the data generating process given in the code, highlighting in particular any implied
assumptions.
2) Create the simulated data using the provided code and draw a contour plot of your data for
wage, education, and experience (see the twoway contour command in Stata). Interpret your
findings.
3) Simulate the bias in the regression of wage on education (excluding experience). Use 1000
replications in your simulations, and report the kernel density estimate of the bias and the
mean of the bias. Explain your results fully.
4) Create a new instrumental variable called z that meets the two criteria for a valid instrument.
Verify that the two conditions are met.
5) Estimate the relationship between wages and education by two-stage least squares, using z as
an instrument for education. How does the estimated slope from this regression compare to
the true causal effect and to the biased estimate from (3)?
6) Estimate the first stage regression of education on z, and save the residuals. Then estimate the
regression of wages on education and those residuals. Explain intuitively why the slope on
education is the same as in (5).
7) Simulate the sampling distribution of the two-stage least squares estimator in (6). Use 1000
replications in your simulations, and report the kernel density estimate of the distribution of
the slope and the mean of the slopes. Explain your results fully.
8) Create 100 new instruments (call them w1 – w100) that are each a normally-distributed
random variable with mean zero and standard deviation 10. Estimate the slope on education
by two-stage least squares using all one-hundred instruments in the same first stage. Explain
your results.
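Questions 3 and 7 both involve repeating an estimation many times and summarizing the resulting estimates. A minimal sketch of that general workflow in Stata might look like the following. It uses a toy model that is not the assignment's data generating process; the program name toysim, the sample size of 200, the seed, and the coefficients are all assumptions chosen for illustration.

```stata
* Toy example only (not the assignment's DGP): y = 1 + 2*x + e
capture program drop toysim
program define toysim, rclass
    drop _all
    set obs 200
    gen x = rnormal()
    gen y = 1 + 2*x + rnormal()
    regress y x
    * Hand the estimated slope back to simulate
    return scalar b = _b[x]
end

set seed 12345
* Run the program 1000 times, storing the slope from each replication in b
simulate b = r(b), reps(1000) nodots: toysim
* Mean of the simulated slopes, then their kernel density estimate
summarize b
kdensity b
```

Note that simulate replaces the data in memory with one observation per replication, so anything you need afterwards has to be regenerated or saved beforehand.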
***EC655 Fall 2021
***Assignment 2 Dofile
cap log close
log using "[INSERT LAST NAME AND STUDENT NUMBER HERE].log", replace
clear all
set obs 3000
set seed [INSERT STUDENT NUMBER HERE]
gen educ = round(rnormal(13,3),1)
gen exper = 22 - 1*educ + rpoisson(3)
gen u = exp(rnormal(0,ln(2)))
egen mu = mean(u)
replace u = u-mu
gen wage = XXXXXXXXXX*educ + 0.25*exper + u
Last name/Family Name: Wang
Student Number: XXXXXXXXXX