Assignment 06_EN Introduction to Regression Analysis
T 06
MSc Business Administration
Research Methods II
Quantitative Research Methods
Applied Data Analysis (with SPSS)
Tasks Assignment 06: Introduction to Regression Analysis
Prof. Dr. Jürg Schwarz Carlota de Miquel
Dr. Heidi Bruderer Enzler Viviane Pfluger
April 2018 XXXXXXXXXX
Task 01: Conducting a Regression Analysis with SPSS
    Task
    Syntax
    Regression Analysis
    Conclusion
    Additional Information
Task 02: A Logarithmic Transformation
Divertimento
Appendix
    How to Draw a Horizontal Line in a Scatter Plot
    Interpretation in Case of Logarithmized Dependent Variables
    Additional Information
Important Note
In practice, your first step would always be a descriptive analysis:
• You describe the data in terms of central tendency, variability and distribution
using tables, statistics and figures.
• You search for errors, implausible values and other peculiarities and take notes for later use.
You only run more complex analyses once you know your data in detail and have corrected any mistakes.
In the framework of the assignments in this module, you may skip the above-mentioned descriptive analyses
unless you are explicitly instructed to carry them out. This omission of important steps is intended
to support your concentration on the current topics of each of the lectures.
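The descriptive first step described above can be sketched outside SPSS as well. The following Python snippet is only an illustration: the values are made up (they are not taken from Data06a.sav), and the helper names describe and implausible as well as the plausibility bounds are assumptions chosen for this sketch.

```python
import statistics

# Made-up illustration values (NOT taken from Data06a.sav).
education = [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]  # years of schooling
income = [22500, 25000, 30500, 34000, 38000, 42000, 46000, 54000, 46000, 62000]  # USD per year


def describe(values):
    """Return basic measures of central tendency and variability."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


def implausible(values, low, high):
    """Return (index, value) pairs that fall outside an analyst-chosen range."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


income_summary = describe(income)
education_summary = describe(education)

# The bounds below are assumptions for this sketch, not rules from the assignment.
flagged_education = implausible(education, low=0, high=30)
flagged_income = implausible(income, low=0, high=1_000_000)

print(income_summary)
print(education_summary)
print(flagged_education, flagged_income)
```

Any flagged case would be noted and checked against the original source before more complex analyses are run.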
Assignment 06 Introduction to Regression Analysis, p. 1
Task 01: Conducting a Regression Analysis with SPSS
Task
Data set: Data06a.sav
Is there a relationship between education and income? To answer this question, a survey of 40 employees
(n = 40) was conducted.
The dataset includes the following variables:
- income = annual income [USD]
- education = education [years of schooling]
Conduct a regression analysis predicting income based on education.
What do you notice? Describe the output using typical statistical statements.
Do not forget to examine the assumptions of regression analysis.
Syntax
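* Scatterplot of income (y-axis) against education (x-axis).
GRAPH
  /SCATTERPLOT(BIVAR)=education WITH income
  /MISSING=LISTWISE.

* Simple linear regression of income on education.
* Also requests a plot of standardized residuals against standardized predicted values.
* Also requests a histogram of the standardized residuals.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT income
  /METHOD=ENTER education
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID).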
Regression Analysis
Step 1: Formulation of the model
A model is postulated, here:
income = β0 + β1*education.
But is a linear relationship between income and education theoretically and empirically plausible?
The scatterplot clearly shows that the two variables are related. However, the relationship does not
look completely linear: income appears to rise more steeply at higher levels of education.
Furthermore, the variance in income appears to increase with education.
→ This may indicate a problem and therefore should be kept in mind.
Step 2: Estimation of the model (Done by SPSS, no comment needed)
Step 3: Verification of the model
The model as a whole is significant, F(1, 38) = XXXXXXXXXX, p < .001. Therefore, the analysis is continued.
The regression coefficient for education differs significantly from zero (βeducation = XXXXXXXXXX, t = 10.885,
p = XXXXXXXXXX). The same is true for the constant (the intercept, β0, t = XXXXXXXXXX, p = .042).
The regression equation is: income = XXXXXXXXXX697 * education
Model fit: 75.13% of the variance in income is explained by education (R²adj. = .751).
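To make the statistics reported in Step 3 more tangible, the following Python sketch computes them by hand for a simple regression. It is an illustration under stated assumptions: the data are made up (not the values from Data06a.sav), the function name simple_ols is hypothetical, and p-values are left out because they require the t and F distributions, which are not part of the Python standard library.

```python
import math

# Made-up illustration data (NOT the values from Data06a.sav).
education = [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]  # years of schooling
income = [22500, 25000, 30500, 34000, 38000, 42000, 46000, 54000, 46000, 62000]  # USD per year


def simple_ols(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return key statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    df_res = n - 2                      # residual degrees of freedom (one predictor)
    ms_res = ss_res / df_res            # residual mean square

    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / df_res
    t_slope = slope / math.sqrt(ms_res / sxx)   # t statistic of the slope
    f_stat = (ss_tot - ss_res) / ms_res         # F statistic with (1, df_res) df

    return {
        "intercept": intercept,
        "slope": slope,
        "r2": r2,
        "r2_adj": r2_adj,
        "t_slope": t_slope,
        "F": f_stat,
        "df_res": df_res,
    }


fit = simple_ols(education, income)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

With a single predictor, the F statistic of the model equals the squared t statistic of the slope, so the test of the model as a whole and the test of the regression coefficient lead to the same decision.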
Step 4: Considering other aspects (only relevant in multivariate regression analysis)
Step 5: Testing of assumptions
Are the prerequisites of the Gauss-Markov theorem (1-5) met? How about the other assumptions?
Assumption 1: The model postulated in Step 1 is linear in parameters
(income = β0 + β1*education).
Assumption 2: The sample may be random but there is no information regarding this.
Assumption 3: Zero conditional mean: This is difficult to determine visually. The mean value of the
residuals may not be 0 for all predicted values of income, but it is not far off 0 either.
Assumption 4: The initial scatterplot reveals that education is not constant but indeed shows variance.
→ ok.
Assumption 5: The scatterplot indicates that the assumption of homoscedasticity is violated. The
residuals do not have a constant variance. In fact, they deviate more strongly from 0 for
high predicted values of income than for low values. → problematic. (If there were
constant variance, the residuals would be randomly scattered around 0 for all predicted
values.)
Assumption of independence: There is no obvious pattern in the scatterplot that indicates that the
residuals would be influencing one another (e.g. a wavelike pattern). → ok.
Assumption of normal distribution: The distribution of the standardized residuals is not fully normal but
also not very skewed (see histogram).
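The visual checks of Assumptions 3 and 5 can be complemented by a rough numerical look at the residuals. The Python sketch below is an informal illustration, not a formal test (a formal test of homoscedasticity would be, for example, the Breusch-Pagan test). The data are made up (not the values from Data06a.sav), and the split into a lower and an upper half of the predicted values is an assumption of this sketch.

```python
import math

# Made-up illustration data (NOT the values from Data06a.sav).
education = [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]  # years of schooling
income = [22500, 25000, 30500, 34000, 38000, 42000, 46000, 54000, 46000, 62000]  # USD per year


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rms(values):
    """Root mean square: typical size of the residuals around 0."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


intercept, slope = fit_line(education, income)
predicted = [intercept + slope * x for x in education]
residuals = [y - p for y, p in zip(income, predicted)]

# With an intercept in the model, the overall residual mean is 0 by construction,
# so the conditional means (per range of predicted values) are the interesting part.
mean_residual = sum(residuals) / len(residuals)

# Sort cases by predicted value and split them into a lower and an upper half.
pairs = sorted(zip(predicted, residuals))
half = len(pairs) // 2
low_residuals = [e for _, e in pairs[:half]]
high_residuals = [e for _, e in pairs[half:]]

mean_low = sum(low_residuals) / len(low_residuals)
mean_high = sum(high_residuals) / len(high_residuals)

# A ratio clearly above 1 means larger residuals for high predicted values.
spread_ratio = rms(high_residuals) / rms(low_residuals)

print(f"overall residual mean: {mean_residual:.4f}")
print(f"residual mean (low / high half): {mean_low:.4f} / {mean_high:.4f}")
print(f"spread ratio (high / low half): {spread_ratio:.4f}")
```

Group means far from 0 would speak against the zero conditional mean, while a spread ratio far from 1 would point towards heteroscedasticity, in line with what the residual scatterplot suggests.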
Step 6: Interpretation of the model
The regression coefficient of education indicates the following: If education increases by one year (one
unit), the annual income increases by XXXXXXXXXX USD (XXXXXXXXXX units).¹
Conclusion
The regression analysis reveals a significant