DEPARTMENT OF ECONOMICS ECON 4041H – RESEARCH METHODOLOGY Winter 2023, Peterborough Assignment #1 Due...

Question

DEPARTMENT OF ECONOMICS
ECON 4041H – RESEARCH METHODOLOGY
Winter 2023, Peterborough
Assignment #1
Due date: January 31, 2023

Instructions: You must provide your own unique solution. You may work with others, but each of you is responsible for submitting your own problem set solution. Question values are listed for each question. Submit solution through SafeAssign. Ideally you will submit your RMarkdown file, preferably in pdf format. Blackboard won’t accept html files, so if submitting an html file, first zip it and submit the zipped version. But if you don’t like using RMarkdown, you may submit two files: your command file and a word processor file containing results, comments and answers to questions, as well as graphs. Please bind all output together in one document file rather than submitting separate files for each question, or for each graph. Your command file will be a separate file.

For questions 1–5 use the labour force survey file lfs7797.rds. For question 6, use the 2016 Census PUMF cen16.rds.

1. Some basic data descriptions of datafile lfs7797.rds [15 marks]
   a. number of observations in the dataframe
   b. number of observations for variable cowmain–class of worker
   c. number of missing observations for variable cowmain
   d. mean wage (hrlyearn) of workers of variable cowmain category:
      i. “Public employee”
      ii. “Private employee”
   e. mean wage (hrlyearn) of workers of variable union category:
      i. “Union + agreement”
      ii. “Agreement,no union”

2. Distribution of hrlyearn (wage rate), and uhrsmain (usual weekly hours) [15 marks]
   a. summary statistics: find mean, median, maximum, minimum, standard deviation of wage rate and weekly hours
   b. plot the densities of
      i. wage rate
      ii. log of wage rate
      iii. usual weekly hours
      iv. log of usual weekly hours

3. Generate some 2x2 tables of several variables [15 marks]
   a. first recode the variables for educational attainment: ed76to89 and educ90, the first is for years prior to 1990, and the second is 1990 on. Recode to create one variable for both years and call it educ
      i. ed76to89
         • “0 to 8 years” and “9-10 yrs schooling”: code as “less than high school”
         • “11-13 years schooling” and “Some post secondary”: “high school”
         • “Post secondary certificate of diploma”: “college” (note: keep spelling error)
         • “University degree”: “university”
      ii. educ90
         • “0 to 8 years” and “Some secondary”: “less than high school”
         • “Grade 11 to 13,grad” and “Some post secondary”: “high school”
         • “College diploma”: “college”
         • “Bachelors degree” or “Graduate degree”: “university”
   b. now calculate the following conditional means
      i. mean hourly earnings by sex
      ii. mean hourly earnings by educational attainment
      iii. mean weekly hours by sex
      iv. mean weekly hours by educational attainment

4. Composition of labour force by year: 1977 and 1997 [15 marks]
   a. by sex (sex)
   b. by educational attainment (use variable created in previous question)
   c. by age (use variable age_12)
   Use the variable lfsstat (labour force status) to subset the labour force. Remember from macro that the labour force is composed of those employed plus those unemployed.

5. Test the central limit theorem, as we did in our demo example. You will draw repeated samples of two variables hrlyearn–wages, and uhrsmain-usual weekly hours worked, saving the mean value of each sample. Then compare the means, standard deviations and distribution of the three samples to the “population” statistics.
   Note, the data are in a dataframe, so you must either extract each variable as a vector, or make sure you set your command for a dataframe. In order to replicate results, you will need to set a seed value. The seed value determines a starting point for the random number generator. To set your seed value, take your sid, drop the leading 0, then take the sum of the next three digits and the last three. For example, if my sid is XXXXXXXXXX, I would calculate my seed value as XXXXXXXXXX = 579. Then draw the random sample following the example in the Sampling Distribution exercise. [20 marks]
   a. Draw a sample of 1,000 observations of wages (hrlyearn). Save the mean value. Repeat this for 2,000 repetitions. This yields 2,000 sample means. Then repeat for 5,000 observations, and again for 10,000 observations. This will give you three sets of 2,000 means. Report the mean, standard deviation, and graph the kernel density for each of these three sets.
   b. What do you see as you increase the sample size? Compare your results (mean, standard deviation, density plot) with those of the aggregate sample.
   c. Repeat parts a. and b. above, but use the weekly hours variable (uhrsmain).

6. Use the Census 2016 PUMF (cen16.rds) to test whether the relationship between age (factor variable agegrp) and employment income (variable empin) is linear. Restrict your analysis to those in the age range from 20 to 84 years old. The variable agegrp for this range consists of 5-year age groups. Generate a numeric version of this variable and use the numeric variable rather than the factor variable where appropriate. [20 marks]
   a. generate a scatter plot with employment income on the y-axis and (the numeric version of) age on the x-axis. Use a subset of the census file including only 50,000 observations. The generated plot will otherwise take up a lot of space in your output file.
   b. generate a loess plot of employment income as a function of (the numeric version of) age. Use a subset of the census file including only 50,000 observations. This command is otherwise very slow. In specifying the loess plot command, make sure to include the option “se = FALSE”, otherwise the estimation is very slow, even on the subset.
   c. Run a regression of employment income on the numeric version of age. Report the results and interpret. What do they mean?
   d. Run a regression of employment income on original factor variable version of age.
      i. Report the results and interpret. What do they mean? Do they tell you anything about whether the relationship is linear?
      ii. Using the output from the regression above, test the significance of power terms of the age variable using the contrast() command.
      iii. Generate a plot of the predicted values of employment income for each level of the age factor variable. Interpret.
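
The repeated-sampling procedure described in question 5 can be sketched roughly as below. This is only an illustration under stated assumptions: the "population" is simulated (a hypothetical log-normal vector named wage, not the real hrlyearn column from lfs7797.rds), sampling is done without replacement, and the helper sample_means() is an invented name. It is not the course's Sampling Distribution exercise, which may be set up differently.

```r
set.seed(579)  # seed value taken from the assignment's worked example

# Hypothetical skewed "population" standing in for lfs7797$hrlyearn
wage <- rlnorm(50000, meanlog = 2.5, sdlog = 0.5)

# Draw `reps` samples of size n (without replacement) and keep each sample mean
sample_means <- function(x, n, reps = 2000) {
  replicate(reps, mean(sample(x, size = n)))
}

sizes <- c(1000, 5000, 10000)
means_by_n <- lapply(sizes, function(n) sample_means(wage, n))
names(means_by_n) <- as.character(sizes)

# Mean and standard deviation of each set of 2,000 sample means
results <- data.frame(
  n    = sizes,
  mean = sapply(means_by_n, mean),
  sd   = sapply(means_by_n, sd)
)
print(results)

# "Population" statistics to compare against
print(c(pop_mean = mean(wage), pop_sd = sd(wage)))

# Kernel density of the sample means for n = 1,000
plot(density(means_by_n[["1000"]]), main = "Sample means, n = 1,000")
```

With the real data, wage would be replaced by the non-missing values of the variable of interest, extracted as a vector.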

Mukesh · Accepted Answer

# Choose cen16.rds in the file dialog, then read it into a data frame
cen_path <- file.choose()
cen <- readRDS(cen_path)
summary(cen$agegrp)
##      0 to 4 years      5 to 6 years      7 to 9 years    10 to 11 years
##             51025             21349             32783             20674
##    12 to 14 years    15 to 17 years    18 to 19 years    20 to 24 years
##             30833             31576             21830             59601
##    25 to 29 years    30 to 34 years    35 to 39 years    40 to 44 years
##             60644             62180             60799             59706
##    45 to 49 years    50 to 54 years    55 to 59 years    60 to 64 years
##             62484             71589             69829             59991
##    65 to 69 years    70 to 74 years    75 to 79 years    80 to 84 years
##             51500             36379             25653             17329
## 85 years and over              NA's
##             13528              9139
table(cen$agegrp)
##
##      0 to 4 years      5 to 6 years      7 to 9 years    10 to 11 years
##             51025             21349             32783             20674
##    12 to 14 years    15 to 17 years    18 to 19 years    20 to 24 years
##             30833             31576             21830             59601
##    25 to 29 years    30 to 34 years    35 to 39 years    40 to 44 years
##             60644             62180             60799             59706
##    45 to 49 years    50 to 54 years    55 to 59 years    60 to 64 years
##             62484             71589             69829             59991
##    65 to 69 years    70 to 74 years    75 to 79 years    80 to 84 years
##             51500             36379             25653             17329
## 85 years and over
##             13528
The following code shows how to convert one categorical variable
in a data frame to a numeric variable:
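
A minimal sketch of one way to do this for agegrp, assuming the level labels match the summary() output above. The data frame cen_demo is a small hypothetical stand-in for cen, and coding each 5-year group by its midpoint (for example 22 for "20 to 24 years") is an assumption rather than something the assignment prescribes; as.numeric() on the factor would instead give integer level codes.

```r
# Hypothetical stand-in for the census data frame `cen`
age_levels <- c("18 to 19 years", "20 to 24 years", "25 to 29 years",
                "80 to 84 years", "85 years and over")
cen_demo <- data.frame(
  agegrp = factor(c("20 to 24 years", "25 to 29 years", "80 to 84 years",
                    "18 to 19 years", "85 years and over", NA),
                  levels = age_levels)
)

# Parse the lower and upper bound out of labels such as "20 to 24 years".
# Labels that do not fit the pattern (e.g. "85 years and over") become NA.
labels <- as.character(cen_demo$agegrp)
lower  <- suppressWarnings(as.numeric(sub(" to .*", "", labels)))
upper  <- suppressWarnings(
  as.numeric(sub("^[0-9]+ to ([0-9]+) years$", "\\1", labels))
)

# Restrict to the 20 to 84 age range required by question 6
keep <- !is.na(lower) & !is.na(upper) & lower >= 20 & upper <= 84
sub_demo <- cen_demo[keep, , drop = FALSE]

# Numeric age: midpoint of each 5-year group (an assumption, see above)
sub_demo$age_num <- (lower[keep] + upper[keep]) / 2

# Drop the now-empty factor levels so later regressions ignore them
sub_demo$agegrp <- droplevels(sub_demo$agegrp)

print(sub_demo)
```

Applied to the real file, cen_demo would simply be replaced by cen; the factor column is kept alongside age_num because part d of question 6 still needs it.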
