

DEPARTMENT OF ECONOMICS
ECON 4041H – RESEARCH METHODOLOGY
Winter 2023, Peterborough
Assignment #1
Due date: January 31, 2023
Instructions: You must provide your own unique solution. You may work with others, but each
of you is responsible for submitting your own problem set solution. Question values
are listed for each question. Submit solution through SafeAssign. Ideally you will
submit your RMarkdown file, preferably in pdf format. Blackboard won’t accept
html files, so if submitting an html file, first zip it and submit the zipped version.
But if you don’t like using RMarkdown, you may submit two files: your command
file and a wordprocessor file containing results, comments and answers to questions,
as well as graphs. Please bind all output together in one document file rather than
submitting separate files for each question, or for each graph. Your command file
will be a separate file.
For questions 1–5 use the labour force survey file lfs7797.rds. For question 6, use the 2016 Census
PUMF cen16.rds.
1. Some basic data descriptions of datafile lfs7797.rds [15 marks]
a. number of observations in the dataframe
b. number of observations for variable cowmain (class of worker)
c. number of missing observations for variable cowmain
d. mean wage (hrlyearn) of workers of variable cowmain category:
i. “Public employee”
ii. “Private employee”
e. mean wage (hrlyearn) of workers of variable union category:
i. “Union + agreement”
ii. “Agreement,no union”
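The counts and group means in question 1 could be obtained along the following lines. This is only a sketch: the small `lfs` data frame is an invented stand-in for lfs7797.rds, and only the variable names and category labels come from the assignment text.

```r
# Toy stand-in for lfs7797.rds; all values are invented for illustration.
lfs <- data.frame(
  cowmain  = factor(c("Public employee", "Private employee", NA,
                      "Private employee", "Public employee")),
  hrlyearn = c(20, 12, NA, 16, 24)
)

n_obs     <- nrow(lfs)                 # rows in the data frame
n_cowmain <- sum(!is.na(lfs$cowmain))  # non-missing values of cowmain
n_missing <- sum(is.na(lfs$cowmain))   # missing values of cowmain

# Mean wage within each cowmain category, ignoring missing wages
mean_by_class <- tapply(lfs$hrlyearn, lfs$cowmain, mean, na.rm = TRUE)
print(mean_by_class)
```

The same `tapply()` call with `lfs$union` as the grouping variable would give the means for part e.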
2. Distribution of hrlyearn (wage rate), and uhrsmain (usual weekly hours) [15 marks]
a. summary statistics: find mean, median, maximum, minimum, standard deviation of
wage rate and weekly hours
b. plot the densities of
i. wage rate
ii. log of wage rate
iii. usual weekly hours
iv. log of usual weekly hours
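One possible way to produce the summary statistics and density plots for question 2 is sketched below. The data are simulated, so the numbers mean nothing; `summarise_var` is a hypothetical helper name, not something from the course materials.

```r
set.seed(42)
# Simulated stand-in for the two LFS variables (invented values)
lfs <- data.frame(
  hrlyearn = rlnorm(500, meanlog = 2.5, sdlog = 0.5),
  uhrsmain = pmax(1, round(rnorm(500, mean = 37, sd = 8)))
)

# Hypothetical helper: the five statistics requested in part a
summarise_var <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE))
}
stats <- sapply(lfs[c("hrlyearn", "uhrsmain")], summarise_var)
print(round(stats, 2))

# Kernel densities of each variable in levels and in logs
par(mfrow = c(2, 2))
plot(density(lfs$hrlyearn, na.rm = TRUE), main = "wage rate")
plot(density(log(lfs$hrlyearn), na.rm = TRUE), main = "log of wage rate")
plot(density(lfs$uhrsmain, na.rm = TRUE), main = "usual weekly hours")
plot(density(log(lfs$uhrsmain), na.rm = TRUE), main = "log of usual weekly hours")
```

With the real file, zero or negative values would need to be dropped before taking logs.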
3. Generate some 2x2 tables of several variables [15 marks]
a. first recode the variables for educational attainment: ed76to89 and educ90, the first is
for years prior to 1990, and the second is 1990 on. Recode to create one variable for
both years and call it educ
i. ed76to89
• “0 to 8 years” and “9-10 yrs schooling”: code as “less than high school”
• “11-13 years schooling” and “Some post secondary”: “high school”
• “Post secondary certificate of diploma”: “college” (note: keep spelling error)
• “University degree”: “university”
ii. educ90
• “0 to 8 years” and “Some secondary”: “less than high school”
• “Grade 11 to 13,grad” and “Some post secondary”: “high school”
• “College diploma”: “college”
• “Bachelors degree” or “Graduate degree”: “university”
b. now calculate the following conditional means
i. mean hourly earnings by sex
ii. mean hourly earnings by educational attainment
iii. mean weekly hours by sex
iv. mean weekly hours by educational attainment
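The recoding in part a can be sketched with two lookup vectors, one per source variable. The toy rows are invented, and the sketch assumes that each respondent has a value in only one of ed76to89 and educ90; the labels are copied from the lists above.

```r
# Toy rows (invented); each row has only one of the two variables filled in
lfs <- data.frame(
  ed76to89 = c("0 to 8 years", "University degree", NA, NA),
  educ90   = c(NA, NA, "College diploma", "Graduate degree"),
  hrlyearn = c(10, 30, 18, 34)
)

# Lookup vectors: names are the source labels, values the new categories
map_old <- c(
  "0 to 8 years"                          = "less than high school",
  "9-10 yrs schooling"                    = "less than high school",
  "11-13 years schooling"                 = "high school",
  "Some post secondary"                   = "high school",
  "Post secondary certificate of diploma" = "college",
  "University degree"                     = "university"
)
map_new <- c(
  "0 to 8 years"        = "less than high school",
  "Some secondary"      = "less than high school",
  "Grade 11 to 13,grad" = "high school",
  "Some post secondary" = "high school",
  "College diploma"     = "college",
  "Bachelors degree"    = "university",
  "Graduate degree"     = "university"
)

# as.character() makes the lookup work for factor columns as well
old <- unname(map_old[as.character(lfs$ed76to89)])
new <- unname(map_new[as.character(lfs$educ90)])
lfs$educ <- factor(
  ifelse(is.na(old), new, old),
  levels = c("less than high school", "high school", "college", "university")
)

# Conditional mean of hourly earnings by educational attainment
mean_by_educ <- tapply(lfs$hrlyearn, lfs$educ, mean, na.rm = TRUE)
print(mean_by_educ)
```

Categories with no observations (here "high school") show up as NA in the `tapply()` result.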
4. Composition of labour force by year: 1977 and 1997 [15 marks]
a. by sex (sex)
b. by educational attainment (use variable created in previous question)
c. by age (use variable age_12)
Use the variable lfsstat (labour force status) to subset the labour force. Remember from
macro that the labour force is composed of those employed plus those unemployed.
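A rough sketch of the subsetting and composition step follows. The lfsstat labels used here ("Employed", "Unemployed", "Not in labour force") are assumptions, since the assignment does not list them; check `levels(lfs$lfsstat)` in the real file. The rows are invented.

```r
# Toy data; lfsstat labels are assumed, not taken from the real file
lfs <- data.frame(
  year    = c(1977, 1977, 1977, 1997, 1997, 1997, 1997),
  sex     = c("Male", "Male", "Female", "Male", "Female", "Female", "Female"),
  lfsstat = c("Employed", "Unemployed", "Employed", "Employed",
              "Employed", "Not in labour force", "Unemployed")
)

# Labour force = employed + unemployed
in_lf <- lfs[lfs$lfsstat %in% c("Employed", "Unemployed"), ]

# Share of each sex within each year (rows sum to 1)
shares <- prop.table(table(in_lf$year, in_lf$sex), margin = 1)
print(round(shares, 3))
```

Replacing `in_lf$sex` with the education or age variable gives the other two breakdowns.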
5. Test the central limit theorem, as we did in our demo example. You will draw repeated samples
of two variables, hrlyearn (wages) and uhrsmain (usual weekly hours worked), saving the
mean value of each sample. Then compare the means, standard deviations and distribution
of the three samples to the “population” statistics.
Note, the data are in a dataframe, so you must either extract each variable as a vector, or
make sure you set your command for a dataframe. In order to replicate results, you will
need to set a seed value. The seed value determines a starting point for the random number
generator. To set your seed value, take your sid, drop the leading 0, then take the sum of the
next three digits and the last three. For example, if my sid is XXXXXXXXXX, I would calculate my
seed value as XXXXXXXXXX = 579. Then draw the random sample following the example in the
Sampling Distribution exercise. [20 marks]
a. Draw a sample of 1,000 observations of wages (hrlyearn). Save the mean value. Repeat
this for 2,000 repetitions. This yields 2,000 sample means. Then repeat for 5,000
observations, and again for 10,000 observations. This will give you three sets of 2,000
means. Report the mean, standard deviation, and graph the kernel density for each of
these three sets.
b. What do you see as you increase the sample size? Compare your results (mean, standard
deviation, density plot) with those of the aggregate sample.
c. Repeat parts a. and b. above, but use the weekly hours variable (uhrsmain).
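The repeated-sampling exercise in question 5 might look roughly like this. The "population" is simulated rather than read from lfs7797.rds, the seed 579 is the one from the assignment's worked example, `sample_means` is a hypothetical helper name, and sampling without replacement is an assumption, since the Sampling Distribution exercise itself is not reproduced here.

```r
set.seed(579)  # seed from the assignment's example; substitute your own

# Simulated stand-in for the hrlyearn vector (invented, right-skewed)
population <- rlnorm(100000, meanlog = 2.5, sdlog = 0.5)

# Hypothetical helper: draw `reps` samples of size n and keep each mean
sample_means <- function(x, n, reps = 2000) {
  replicate(reps, mean(sample(x, size = n)))
}

sizes <- c(1000, 5000, 10000)
means_by_size <- lapply(sizes, function(n) sample_means(population, n))
names(means_by_size) <- as.character(sizes)

# Mean and standard deviation of each set of 2,000 sample means
results <- data.frame(
  n             = sizes,
  mean_of_means = sapply(means_by_size, mean),
  sd_of_means   = sapply(means_by_size, sd)
)
print(results)
print(c(pop_mean = mean(population), pop_sd = sd(population)))

# Kernel densities of the three sets of means on one plot
plot(density(means_by_size[["10000"]]), main = "Sampling distribution of the mean")
lines(density(means_by_size[["5000"]]), lty = 2)
lines(density(means_by_size[["1000"]]), lty = 3)
```

With the real data, missing values would need to be removed first, for example `x <- na.omit(lfs$hrlyearn)`, and the same code can be rerun on uhrsmain for part c.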
6. Use the Census 2016 PUMF (cen16.rds) to test whether the relationship between age (factor
variable agegrp) and employment income (variable empin) is linear. Restrict your analysis to
those in the age range from 20 to 84 years old. The variable agegrp for this range consists of
5-year age groups. Generate a numeric version of this variable and use the numeric variable
rather than the factor variable where appropriate. [20 marks]
a. generate a scatter plot with employment income on the y-axis and (the numeric version
of) age on the x-axis. Use a subset of the census file including only 50,000 observations.
The generated plot will otherwise take up a lot of space in your output file.
b. generate a loess plot of employment income as a function of (the numeric version of)
age. Use a subset of the census file including only 50,000 observations. This command
is otherwise very slow. In specifying the loess plot command, make sure to include the
option “se = FALSE”, otherwise the estimation is very slow, even on the subset.
c. Run a regression of employment income on the numeric version of age. Report the
results and interpret. What do they mean?
d. Run a regression of employment income on original factor variable version of age.
i. Report the results and interpret. What do they mean? Do they tell you anything
about whether the relationship is linear?
ii. Using the output from the regression above, test the significance of power terms of
the age variable using the contrast() command.
iii. Generate a plot of the predicted values of employment income for each level of the
age factor variable. Interpret.
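The linearity check in parts c and d can be illustrated on simulated data. The sketch below uses base R only: an F-test comparing the linear and factor regressions, and an ordered factor with polynomial contrasts (`contr.poly`, R's default for ordered factors) in place of the contrast() command, whose package the assignment does not name. The hump-shaped income profile is invented and all object names are hypothetical.

```r
set.seed(1)
lower      <- seq(20, 80, by = 5)
age_levels <- paste(lower, "to", lower + 4, "years")

n       <- 2000
idx     <- sample(seq_along(age_levels), n, replace = TRUE)
age_num <- lower[idx] + 2                        # midpoint of each 5-year group
agegrp  <- factor(age_levels[idx], levels = age_levels)
# Invented hump-shaped income profile plus noise
empin   <- 60000 - 40 * (age_num - 47)^2 + rnorm(n, sd = 5000)

fit_linear <- lm(empin ~ age_num)   # part c: numeric age
fit_factor <- lm(empin ~ agegrp)    # part d: one dummy per age group

# The linear model is nested in the factor model, so an F-test compares them
lin_test <- anova(fit_linear, fit_factor)
print(lin_test)

# Polynomial contrasts: .L linear, .Q quadratic, .C cubic, then higher powers
agegrp_ord <- as.ordered(agegrp)
fit_poly   <- lm(empin ~ agegrp_ord)
poly_terms <- coef(summary(fit_poly))[c("agegrp_ord.L", "agegrp_ord.Q", "agegrp_ord.C"), ]
print(poly_terms)

# Predicted employment income for each age group
pred <- predict(fit_factor,
                newdata = data.frame(agegrp = factor(age_levels, levels = age_levels)))
plot(lower + 2, pred, type = "b", xlab = "age (group midpoint)",
     ylab = "predicted employment income")
```

A small p-value in the F-test, or significant quadratic and higher-order terms, would indicate that a straight line in age does not describe the data well.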
Answered Same Day Feb 07, 2023

Solution

Mukesh answered on Feb 07 2023
# Choose the cen16.rds file interactively, then load it
cen_path <- file.choose()
cen <- readRDS(cen_path)
summary(cen$agegrp)
## 0 to 4 years 5 to 6 years 7 to 9 years 10 to 11 years
## 51025 21349 32783 20674
## 12 to 14 years 15 to 17 years 18 to 19 years 20 to 24 years
## 30833 31576 21830 59601
## 25 to 29 years 30 to 34 years 35 to 39 years 40 to 44 years
## 60644 62180 60799 59706
## 45 to 49 years 50 to 54 years 55 to 59 years 60 to 64 years
## 62484 71589 69829 59991
## 65 to 69 years 70 to 74 years 75 to 79 years 80 to 84 years
## 51500 36379 25653 17329
## 85 years and over NA’s
## 13528 9139
table(cen$agegrp)
##
## 0 to 4 years 5 to 6 years 7 to 9 years 10 to 11 years
## 51025 21349 32783 20674
## 12 to 14 years 15 to 17 years 18 to 19 years 20 to 24 years
## 30833 31576 21830 59601
## 25 to 29 years 30 to 34 years 35 to 39 years 40 to 44 years
## 60644 62180 60799 59706
## 45 to 49 years 50 to 54 years 55 to 59 years 60 to 64 years
## 62484 71589 69829 59991
## 65 to 69 years 70 to 74 years 75 to 79 years 80 to 84 years
## 51500 36379 25653 17329
## 85 years and over
## 13528
The following code shows how to convert one categorical variable
in a data frame to a numeric variable:
library(dplyr)
##
## Attaching package: ’dplyr’
## The following objects are masked from ’package:stats’:
##
## filter, lag
## The following objects are masked from ’package:base’:
##
## intersect, setdiff, setequal, union
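# Integer level codes of the factor (1 = "0 to 4 years", ..., 21 = "85 years and over");
# these are level positions, not ages in years
cen$age_num <- as.integer(cen$agegrp)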
unique(cen$age_num)
## [1] 11 5 2 12 15 19 18 14 8 1 16 4 6 9 NA 20 10 3 13 7 17 21
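Note that the values printed above are factor level positions (1 to 21), not ages. If a numeric age in years is wanted instead, one option is to parse the lower bound out of each label and add 2 to get the midpoint of the 5-year group. This is a sketch on a toy factor, not part of the original answer; the midpoint convention is an assumption.

```r
# Toy factor using labels in the same format as agegrp
agegrp <- factor(
  c("20 to 24 years", "80 to 84 years", "25 to 29 years", NA),
  levels = c("20 to 24 years", "25 to 29 years", "80 to 84 years")
)

codes <- as.integer(agegrp)   # level positions: 1, 3, 2, NA

# Strip everything after the first space to keep the lower bound of the group
lower   <- as.numeric(sub(" .*$", "", as.character(agegrp)))
age_mid <- lower + 2          # midpoint of a 5-year group, e.g. 22 for "20 to 24 years"

print(data.frame(agegrp, codes, lower, age_mid))
```

Because the 5-year groups are equally spaced, the level codes and the midpoints give the same regression fit up to a rescaling of the slope and intercept; the midpoints are simply easier to interpret.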
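# Keep levels 8 to 20 ("20 to 24 years" through "80 to 84 years"), then draw 50,000 rows at random
subset <- cen %>% filter(age_num > 7, age_num <= 20) %>% sample_n(50000)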
summary(subset$agegrp)
## 0 to 4 years 5 to 6 years 7 to 9 years 10 to 11 years
## 0 0 0 0
## 12 to 14 years 15 to 17 years 18 to 19 years 20 to 24 years
## 0 0 0 4189
## 25 to 29 years 30 to 34 years 35 to 39 years 40 to 44 years
## 4376 4470 4376 4293
## 45 to 49 years 50 to 54 years 55 to 59 years 60 to 64 years
## 4468 5137 5055 4322
## 65 to 69 years 70 to 74 years 75 to 79 years 80 to 84 years
## 3649 2580 1895 1190
## 85 years and over
## 0
dim(subset)
## [1] 50000 90
dim(cen)
## [1] 930421 90
summary(subset$empin)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
## -50000 15000 37000 49236 65000...