ECON4060H Assignment 4, machine learning 01 April 2022
20 points in total. Due by April 08 11:59PM Eastern Time. Computer programs must be submitted
in their original formats. DO NOT submit your code in pdf files.
This assignment asks you to train a linear regression model to predict the hourly earnings of workers.
The dependent variable should be the natural logarithm of hourly earnings. The training data and
the test data are random samples from the Labor Force Survey (LFS) 2021.
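As a minimal sketch of how the dependent variable could be constructed, assuming the data are loaded into a pandas DataFrame: the column name `hourly_earnings` and the helper `add_log_earnings` are hypothetical, and the actual LFS variable name may differ. Rows with missing or non-positive earnings are dropped because the logarithm is undefined for them.

```python
import numpy as np
import pandas as pd


def add_log_earnings(df, earnings_col="hourly_earnings"):
    """Return a copy of df with an added log_earnings column.

    earnings_col is a hypothetical column name; replace it with the
    hourly earnings variable in your own LFS extract.
    """
    # The log is undefined for missing, zero, or negative earnings,
    # so keep only rows with strictly positive values.
    valid = df[earnings_col].notna() & (df[earnings_col] > 0)
    out = df.loc[valid].copy()
    out["log_earnings"] = np.log(out[earnings_col])
    return out


# Toy values for illustration only; these are not LFS data.
toy = pd.DataFrame({"hourly_earnings": [20.0, 35.5, 0.0, None, 15.0]})
clean = add_log_earnings(toy)
print(clean)
```

The same function would be applied to both the training and the test sample so that the two targets are defined identically.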
You can follow these steps:
1. Select the independent variables. If you know how, you can also let the machine learning
algorithm process and select the set of independent variables. All selected variables should have
valid values and be cleaned.
2. Estimate the model without regularization, and use the parameter estimates to predict on
the test sample data. Report the parameter estimates and the mean squared error (MSE) of prediction.
3. Train the model. You can decide to use Ridge, Lasso, Elastic Net, or all of them for regularization.
Among them, Lasso is recommended. Alternatively, you can choose the best model by comparing
them.
4. Report the training results, including the cross‑validation MSE, the penalty parameter, etc.
5. Predict using the test sample data. Report the MSE of prediction.
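Steps 2 to 5 can be sketched as follows. This is one possible workflow using scikit-learn, not a required implementation. The data are synthetic stand-ins for the cleaned LFS samples, and standardizing the regressors before Lasso is an assumption of this sketch. Note that scikit-learn calls the penalty parameter `alpha`.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned training and test samples.
# In the assignment, X_* would hold the selected independent variables
# and y_* the natural log of hourly earnings.
rng = np.random.default_rng(0)
n_train, n_test, n_features = 400, 200, 10
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
true_beta = np.array([0.5, -0.3, 0.2] + [0.0] * (n_features - 3))
y_train = 3.0 + X_train @ true_beta + rng.normal(scale=0.4, size=n_train)
y_test = 3.0 + X_test @ true_beta + rng.normal(scale=0.4, size=n_test)

# Step 2: linear regression without regularization.
ols = LinearRegression().fit(X_train, y_train)
ols_test_mse = mean_squared_error(y_test, ols.predict(X_test))
print("OLS intercept:", ols.intercept_)
print("OLS coefficients:", ols.coef_)
print("OLS test MSE:", ols_test_mse)

# Steps 3-4: Lasso, with the penalty chosen by 5-fold cross-validation.
# Scaling is included because the Lasso penalty depends on the units
# of the regressors (an assumption of this sketch).
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
lasso.fit(X_train, y_train)
cv_model = lasso.named_steps["lassocv"]
best_alpha = cv_model.alpha_
# mse_path_ has shape (n_alphas, n_folds); LassoCV selects the alpha
# with the smallest MSE averaged over folds.
cv_mse = cv_model.mse_path_.mean(axis=1).min()
print("Selected penalty (alpha):", best_alpha)
print("Cross-validation MSE at selected alpha:", cv_mse)
print("Lasso coefficients (standardized regressors):", cv_model.coef_)

# Step 5: predict on the test sample and report the MSE.
lasso_test_mse = mean_squared_error(y_test, lasso.predict(X_test))
print("Lasso test MSE:", lasso_test_mse)
```

To compare regularizers, `RidgeCV` or `ElasticNetCV` from the same scikit-learn module could replace `LassoCV` in the pipeline, and the model with the lowest cross-validation MSE could then be taken to the test sample.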
Shutao Cao