Outline for project documents:
You need to summarize the research question and the analysis of the regression/generalization results in Section 1 (maximum 3 pages), present the results (cut and paste of tables and plots from the Excel results) in Section 2, and present the generalization results (predictions, prediction errors, and MSE from all the test sample points) in Section 3. The whole report can run to more than three pages, depending on your Sections 2 and 3. The headings below are the ones I discussed in the class.
Note: Strictly follow the suggested headings and their order for Section 1.
Cover page
Make sure that you have the names of all the group members.
Section 1 headings (analysis on results reported in sections 2 & 3):
Research question (RQ)
What is the RQ you are trying to find the answers for by running the regression?
Data source
Refer to the project guideline document.
Descriptive statistics
Report and analyze - mean/standard deviation/skewness/kurtosis/outliers. Report whether the data violated the normality assumption.
Correlation analysis
Analyze and report the two categories of correlations I discussed in the class sessions.
Regression results
Adjusted R2, F-stat, p-value, estimates of IDV coefficients, regression equation, t-tests for identification of significant IDVs, residual analysis plots, and normality plot.
Generalization results on test sample
Analyze the results by comparing MSEcalibration and MSEtest.
Conclusion about Implementation
Conclude in one paragraph whether the model is implementable, summarizing the regression and the generalization results.
Section 2:
Present all relevant results that you analyze in Section 1 from your regression run on the calibration file.
Section 3:
Present all the test samples with relevant columns for calculating MSEtest.
Note: You need to present all the sections in one Word file, upload it, and also bring one printed copy of the Word document to the class.
Project:
Regression Project Guidelines
Data source:
SPSS survival manual 4e, 2010 by Julie Pallant. You need to refer to this under heading labeled “data source” in your report.
The author obtained the scaled scores of the following variables from an experiment in psychology through a set of survey questions answered by the respondents.
Dependent variable (DV):
tlifesat:
total life satisfaction
Independent variable (IDV):
toptim:
total optimism
tmast:
total mastery
tposaff:
total positive affect
tnegaff:
total negative affect
tpstress:
total perceived stress
sex:
1 if male; 2 if female
age:
in years
Note: You will apply the following guidelines and your class notes on the data given by your instructor.
· Split your data sample randomly as shown in the class – Calibration (80%) and Test (20%). Run descriptive tests on your regression dependent variable (DV). Identify outliers, if any, by “mean ± 3*standard deviation.” Keep the outliers for regressions.
· Use analysis of descriptive statistics to summarize the data. Comment on the findings.
· Develop an estimated regression equation (regression model) and use that to predict your DV in the test sample. Identify which independent variables are statistically significant. Use variable names from the header row in the data file to write the regression equation.
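The split and the outlier rule in the bullets above can be sketched in a few lines of Python. This is only an illustration, not the Excel procedure shown in the class: the function names, the fixed seed, and the sample values are hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator) is the one intended by "mean ± 3*standard deviation".

```python
import random
import statistics

def split_calibration_test(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into calibration (80%) and test (20%) samples."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = rows[:]                 # copy so the original order is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (calibration, test)

def flag_outliers(values, k=3.0):
    """Return the values that fall outside mean ± k * standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # assumption: sample SD (n - 1 denominator)
    lower, upper = mean - k * sd, mean + k * sd
    return [v for v in values if v < lower or v > upper]

# Hypothetical DV values, for illustration only (not the project data).
dv = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29] * 2 + [90]
calibration, test = split_calibration_test(dv)
print(len(calibration), len(test))     # 17 4
print(flag_outliers(dv))               # [90]
```

Note that the flagged value is only reported; consistent with the guideline, nothing is dropped before the regression.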
Due: February 27, 2022 by 11:59pm
Note: You can submit the Word document online and bring a printout to the class.
Calibration Sample:
· Run correlation on all variables (except sex) on the calibration sample. Analyze.
· Run regression on the calibration sample (include sex and all other IDVs).
· Write your model equation. Report adjusted R2 and other relevant statistics as discussed in the class.
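For anyone who wants to cross-check the Excel regression output, the calculation behind the model equation and adjusted R2 can be sketched as follows. This is a minimal pure-Python sketch, assuming an ordinary least-squares fit with an intercept; the helper names and the tiny two-IDV data set are hypothetical, and it does not produce the F-stat or t-tests that the report also needs.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(x_rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in x_rows]   # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Apply the model equation to one observation's IDV values."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

def adjusted_r2(coefs, x_rows, y):
    """Adjusted R2 = 1 - [SSE / (n - p - 1)] / [SST / (n - 1)]."""
    n, p = len(y), len(coefs) - 1
    mean_y = sum(y) / n
    sse = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(x_rows, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))

# Hypothetical data generated from y = 2 + 3*x1 - 1*x2 (not the project data).
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [3, 7, 7, 11, 12, 18]
coefs = fit_ols(x_rows, y)
print([round(c, 6) for c in coefs])
print(round(adjusted_r2(coefs, x_rows, y), 6))
```

Because the hypothetical data follow the equation exactly, the fit should recover coefficients close to 2, 3, and -1 with an adjusted R2 close to 1; real survey data will not fit this cleanly.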
Test Sample:
· Predict your DV values on the test sample using the regression model from calibration.
· Report generalization mean squared errors (prediction accuracy).
Analytical Report:
The report will consist of three sections in one single word file:
1. Cover page with all group members’ names
2. (Section 1) Summarize results on three single-spaced pages (maximum).
As discussed in the class, summary pages should contain brief descriptions of the following: Problem description or research question, data sources, DV descriptive statistics, correlation, normality assumptions, regression model equation, model estimation and fit statistics, normality plot, residual plots, model generalization, conclusion and recommendation.
3. (Section 2) Cut and paste your results and plots analyzed in section 1 from your Excel file.
4. (Section 3) Data columns for all the regression variables in the test file only, plus ID, random number, forecast of DV, error and squared error. The MSEtest number needs to be there. Do not keep any extra columns that you may have generated to do the project.
Thresholds:
You will use the thresholds for correlation numbers (weak/moderate/strong) as discussed in the class. You will use ± 1 from 0 to report about your skewness and kurtosis statistics. You will use 0.7 for determination of a potential multi-collinearity problem. For outlier detection of the DV you will use the “mean ± 3*standard deviation” method. For correlation (could be positive or negative) you will use the rule: absolute value between 0 and 0.3 – weak correlation, between 0.3 and 0.5 – moderate correlation, and greater than 0.5 – strong correlation.
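The thresholds above can be written down as simple checks, sketched here in Python. The function names and sample vectors are hypothetical. The sketch also makes assumptions the thresholds do not spell out: a value of exactly 0.3 or 0.5 is treated as moderate, a skewness or kurtosis of exactly ±1 is treated as within range, and the 0.7 rule is applied to the absolute correlation between two IDVs.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_strength(r):
    """Label a correlation as weak / moderate / strong by absolute value."""
    a = abs(r)                 # the sign does not matter for strength
    if a > 0.5:
        return "strong"
    if a >= 0.3:               # assumption: boundary values count as moderate
        return "moderate"
    return "weak"

def within_one_of_zero(stat):
    """True if a skewness or kurtosis statistic lies within ± 1 of 0."""
    return abs(stat) <= 1.0    # assumption: exactly ±1 counts as within range

def potential_multicollinearity(r_between_idvs, threshold=0.7):
    """Flag a pair of IDVs whose absolute correlation exceeds the threshold."""
    return abs(r_between_idvs) > threshold

# Hypothetical values, for illustration only (not the project data).
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r, 6), correlation_strength(r))
print(correlation_strength(-0.41), correlation_strength(0.12))
print(within_one_of_zero(-1.4), potential_multicollinearity(-0.76))
```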
Model Generalization:
To test the generalization power of your model, you need to randomly split the sample into 80%/20%. You will use the 80% data set to run the linear regression. After you run the regression, you will get a model equation. You will also get an estimate of the mean squared error (MSE of the calibration data). Use the model equation to predict your DV in the test data. Note that your test data already has the actual DV value for each test observation. Compute the test MSE from the actual DV and the predicted DV. If the two MSEs (one from the regression output and one from the test data) are close (i.e., MSEtest is not more than 1.5*MSEcalibration), your model is generalizing. Your task is to report the two MSE numbers correctly and conclude whether the model is generalizing or not. Based on your overall analysis, report whether the model is implementable or not.
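The generalization check described above can be sketched in Python as follows. The function names and all numbers are hypothetical. The sketch assumes MSEtest is the plain average of the squared errors over the test observations, and that MSEcalibration is read off the regression output rather than recomputed.

```python
def mean_squared_error(actual, predicted):
    """Average of squared errors (actual DV minus predicted DV)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

def is_generalizing(mse_calibration, mse_test, ratio=1.5):
    """True when MSEtest is not more than ratio * MSEcalibration."""
    return mse_test <= ratio * mse_calibration

# Hypothetical test-sample values, for illustration only.
actual_dv = [20, 25, 30, 18]
predicted_dv = [22, 24, 27, 19]
mse_test = mean_squared_error(actual_dv, predicted_dv)
print(mse_test)                        # 3.75

# Hypothetical MSEcalibration values taken from a regression output.
print(is_generalizing(3.0, mse_test))  # True:  3.75 <= 1.5 * 3.0 = 4.5
print(is_generalizing(2.0, mse_test))  # False: 3.75 >  1.5 * 2.0 = 3.0
```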
Grading
Report:
100 points – Everything discussed in the class should be in the report
Note:
The instructor will use the rubric template on the following page to grade the project.
Rubric Template:
Rubric Item | Type of Rubric Scale | MP = maximum points | Lesson Level Learning (LLL) Objectives | Scores for point rubric: 0 | 1 | 2
Research Question (RQ) | 3 point | MP=2 | Recall hypothesis testing concepts | No RQ in the report | OK RQ | Good RQ
Data Source | 3 point | MP=2 | Learn not to plagiarize | No mention of data source | With mistakes | Correct
Data Splitting | Continuous | MP=10 | Learn how to test before implementing a model | | |
Descriptive Statistics | Continuous | MP=6 | Recall characteristics of location/spread/skewness/kurtosis statistics | | |
Correlation Analysis | Continuous | MP=10 | Recall the learning about how two variables co-vary | | |
Regression Analysis and Results | Continuous | MP=60 | Demonstrate understanding of the following topics: contrast between categorical and continuous valued independent variables (IDV), contrast between dependent variables (DV) and IDV; apply understanding of research hypotheses to interpret regression results; examine hypothesis test results and model fit | | |
Residual Analysis and normality plot | Continuous | MP=5 | Understand whether the homoscedasticity and normality assumptions hold true in the data | | |
Generalization & conclusion | Continuous | MP=5 | Draw inferences and find evidence to support generalization | | |
surveydata
sex age tmast tposaff tnegaff tpstress toptim tlifesat
1 45 21 36 15 28 19 23
2 21 22 37 24 30 19 20
2 42 25 32 16 27 23 26
2 47 20 30 25 29 24 27
2 41 16 23 24 42 21 21
2 38 21 40 17 26 21 26
1 39 21 35 35 22 19 30
2 67 17 35 17 25 23 20
2 22 20 34 36 37 10 20
2 31 26 40 22 26 24 32
2 45 27 41 17 28 30 35
1 26 27 39 10 15 23 34
1 51 24 33 12 22 22 24
2 37 22 38 22 26 22 28
1 33 19 35 15 28 23 13
2 61 26 40 16 24 27 31
2 33 25 34 15 24 22 18
2 42 20 19 15 26 18 14
2 45 25 36 22 28 21 19
2 57 19 37 10 29 24 23
2 60 23 32 25 27 25 30
1 23 13 20 39 46 9 5
2 55 28 38 14 22 27 27
2 38 21 31 17 33 28 30
1 27 22 32 33 32 20 20
1 21 23 33 15 29 27 33
1 37 26 29 20 32 24 18
2 27 21 43 31 31 18 29
1 31 15 23 29 31 16 16
1 52 26 41 18 27 29 20
1 64 25 31 14 25 25 19
2 35 20 33 22 29 18 16
2 22 17 32 31 32 23 23
1 23 25 37 17 26 18 32
1 56 20 37 22 31 24 13
2 24 18 32 13 29 22 18
1 36 17 26 22 36 20 22
2 37 23 19 39 40 19 14
1 50 26 39 24 25 27 32
1 37 27 29 17 26 27 22
1 40 25 41 15 20 25 31
2 27 22 36 18 28 21 28
1 51 27 36 12 20 26 30
1 23 20 22 19 39 14 17
2 37 27 31 15 24 29 35
2 19 24 42 18 24 21 25
2 48 14 35 27 33 21 14
1 50 21 20 16 19 17 27
1 49 16 43 25 25 17 23
2 36 21 31 39 34 14 20
1 45 28 44 14 17 27 22
2 18 12 11 34 43 13 8
1 22 25 38 14 27 22 15
2 19 19 22 26 34 16 23
1 27 24 32 21 24 26 29
2 46 23 35 12 21 24 26
2 20 20 26 30 37 15 14
1 55 23 39 15 21 24 19
2 23 25 38 18 26 20 24
2 30 15 32 10 26 15 11
1 22 23 33 14 21 25 28
1 23 21 40 20 28 21 23
2 25 24 40 10 24 24 29
2 46 17 40 29 37 17 30
2 22 18 41 12 20 30 34
2 20 21 28 28 28 18 16
2 49 22 27 28 34 16 12
2 42 21 37 15 19 18 35
1 37 26 39 12 24 20 24
2 20 14 24 28 41 14 6
1 22 24 43 19 24 20 22
1 25 23 21 15 20 21 25
1 26 24 35 22 28 14 27
1 22 28 40 19 21 28 26
1 24 20 35 18 25 21 25
1 51 26 36 14 21 27 25
2 45 19 28 20 28 21 23
1 47 19 30 22 23 16 20
2 21 18 27 16 29 23 15
2 41 25 40 16 19 24 31
2 25 25 39 15 19 25 29
2 26 22 37 12 25 23 29
2 23 21 35 23 36 23 31
2 51 26 32 11 26 24 19
1 48 22 36 10 23 20 24
1 37 23 41 22 25 18 30
1 39 18 30 32 32 16 13
2 40 19 28 26 30 19 12
1 33 20 34 14 16 26 25
2 27 26 25 21 27 20 11
1 32 24 43 14 21 23 29
2 50 18 39 22 30 25 20
2 35 20 36 28 32 24 17
2 50 17 39 21 27 21 14
2 38 28 33 12 19 26 28
2 48 15 31 39 37 20 19
1 54 21 34 11 24 23 19
2 36 24 27 20 26 24 12
2 68 28 40 10 13 30 35
2 74 23 43 10 16 26 34
2 35 17 41 23 31 25 24
2 37 25 45 12 13 27 29
2 39 20 34 26 24 19 14
1 50 17 26 20 33 15 12
1 33 22 32 13 22 14 9
2 23 21 39 19 31 19 27
1 54 16 37 33 29 20 19
2 23 19 37 30 26 23 22
2 41 23 40 26 32 28 18
2 49 20 37 15 28 24 30
1 22 22 32 15 23 21 25
2 22 17 30 20 31 20 23
1 27 25 41 33 23 30 23
2 32 26 21 13 23 23 17
2 38 24 32 20 25 21 20
2 41 23 41 18 25 28 26
1 23 22 39 21 23 24 33
1 21 24 33 22 21 26 26
2 43 18 16 39 42 20 8
1 40 25 35 19 23 24 25
2 31 19 31 20 31 19 22
1 49 27 42 12 24 27 23
2 52 19 44 13 29 28 34
2 36 19 30 31 33 22 23
2 19 14 22 36 44 17 9
2 58 26 32 15 31 27 24
2 65 28 47 13 27 30 30
1 22 23 34 22 26 24 20
1 43 25 36 12 29 21 16
1 22 22 44 22 29 20 16
2 22 25 43 13 18 29 29
1 31 19 27 19 32 17 18
1 46 24 33 13 25 17 17
2 53 16 26 23 36 26 11
2 24 27 29 17 33 17 18
2 42 23 23 29 35 21 13
2 45 20 16 23 28 16 30
2 70 15 32 24 38 30 10
1 46 28 41 10 17 29 31
1 42 22 40 21 25 24 21
1 35 17 11 20 34 16 5
2 34 26 37 25 16 28 31
2 36 23 37 26 33 22 30
2 44 23 34 17 30 18 18
1 31 22 35 18 23 19 26
2 29 28 45 10 21 28 26
1 46 19 31 19 31 16 22
2 23 18 28 20 31 21 22
2 49 25 39 12 25 28 30
2 44 22 31 10 20 29 29
2 23 27 40 15 18 29 32
1 42 21 33 15 25 27 29
2 29 23 38 17 23 23 24
1 66 27 42 10 19 24 35
2 21 21 35 19 25 22 26
2 41 28 35 25 30 26 27
1 35 18 32 19 32 24 25
2 28 17 26 28 34 16 29
1 36 26 25 17 24 24 20
1 36 27 35 15 28 27 21
2 27 14 40 23 24 21 27
1 70 23 29 10 19 25 32
2 35 28 37 19 22 26 27
1 33 21 26 16 26 22 25
2 35 19 34 25 32 25 10
1 22 18 36 20 30 27 17
1 56 25 33 16 20 20 30
2 23 21 30 22 28 17 20
1 31 28 45 15 26 29 27
2 49 27 28 12 20 30 28
2 31 27 34 14 25 30 30
1 63 24 30 10 15 26 20
1 36 27 43 16 22 29 33
2 48 16 31 22 32 20 17
2 30 19 29 30 36 21 25
2 26 27 46 13 23 25 31
2 21 18 29 35 36 21 24
1 20 19 26 23 30 24 22
1 26 16 29 31 29 17 27
2 22 24 31 25 29 17 24
2 21 19 45 29 28 21 23
2 44 28 38 23 26 26 23
1 26 24 42 20 32 21 12
2 26 20 22 12 29 19 20
1 32 23 27 17 26 23 17
2 41 28 40 12 23 30 27
2 40 20 34 16 25 18 27
2 44 23 40 13 22 22 24
2 48 25 39 15 25 27 29
2 18 23 33 24 26 21 21
1 54 20 33 20 29 22 24
1 47 23 34 15 22 21 21
2 74 22 30 12 23 23 26
1 65 19 37 10 20 20 18
1 45 20 27 17 30 26 17
2 21 26 22 20 21 26 27
2 48 20 31 21 29 21 19
1 33 19 26 11 22 27 31
2 44 21 37 12 21 24 16
1 45 20 27 17 30 26 42