PART A (38 MARKS)
For this part, you will examine how housing values are affected by a number of variables, such as air pollution, house features and other socio-economic factors, using the data set (“House_price_Data.xls”) provided on Blackboard. The data file contains data on the following variables from 506 communities in Australia:
Hprice_i = median house price in community i (Hprice is measured in $1,000)
NOx_i = Amount of nitrogen oxides (NOx) in the air (measured in parts per million or PPM)
DIST_i = distance of community i from the state capital (DIST is measured in miles)
ROOMS_i = Average number of rooms per house in community i
STRatio_i = Average student-teacher ratio of schools in community i
CRIME_i = Crime committed (measured per 100 residents in community i)
Using the data set provided in the Excel file (“House_price_Data.xls”), transform the required variables to estimate a multiple regression model in which the natural log of Hprice is regressed on the following variables:
· NOx,
· NOx squared,
· DIST,
· DIST squared,
· the natural log of ROOMS,
· STRatio, and
· CRIME.
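The transformations listed above can be sketched as follows. This is only a minimal illustration, assuming each spreadsheet row has already been read into plain numbers in the column order Hprice, NOx, DIST, ROOMS, STRatio, CRIME. The function name `transform_row` and the output keys are hypothetical; the sample row is the first data row of Sheet1.

```python
import math

def transform_row(hprice, nox, dist, rooms, stratio, crime):
    """Build the dependent variable and the seven regressors for one community.

    Hypothetical helper: the key names are illustrative, not taken from the data file.
    """
    return {
        "lnHprice": math.log(hprice),  # dependent variable: natural log of Hprice
        "NOx": nox,
        "NOx_sq": nox ** 2,            # NOx squared
        "DIST": dist,
        "DIST_sq": dist ** 2,          # DIST squared
        "lnROOMS": math.log(rooms),    # natural log of ROOMS
        "STRatio": stratio,
        "CRIME": crime,
    }

# First data row of Sheet1: Hprice, NOx, DIST, ROOMS, STRatio, CRIME
row = transform_row(240, 5.38, 4.09, 6.57, 15.3, 0.006)
print(row)
```

In Excel the same columns would typically be created with `LN()` and squared terms before running the regression tool.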
Answer Questions 1-8 in Part A in reference to the multiple regression output that you have obtained.
Question 1: Provide an output from estimating the multiple regression model specified above and state the estimated sample regression equation. (4 marks)
(i) Provide the output from estimation results (3 marks)
(ii) State the estimated regression equation (1 mark)
Question 2: Interpret the estimated coefficients for INTERCEPT, lnROOMS and NOx in the context of the estimated regression model and comment briefly on whether the estimated coefficients make sense. (2 marks each, 6 marks total)
(i) INTERCEPT
(ii) lnROOMS
(iii) NOx
Question 3: Following the steps provided below, test the hypothesis that the true population coefficient for DIST is negative at a 1% significance level. (4 marks total)
(i) State the null and alternative hypotheses. (1 mark)
(ii) Calculate the test statistic. (2 marks)
(iii) Obtain the relevant critical value and complete the test (that is, determine whether you should reject or not reject the null hypothesis). (1 mark)
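The three steps of a left-tailed test like this one can be sketched as below. The coefficient and standard error are hypothetical placeholders, not estimates from the data file. The critical value uses the standard normal distribution as a large-sample approximation; with 506 observations and 8 estimated parameters the t distribution has 498 degrees of freedom, and its exact critical value is slightly larger in magnitude.

```python
from statistics import NormalDist

def left_tailed_t_test(coef, std_err, alpha):
    """Test H0: beta >= 0 against H1: beta < 0 (hypothetical helper)."""
    t_stat = coef / std_err             # hypothesised value under H0 is 0
    crit = NormalDist().inv_cdf(alpha)  # normal approximation to the t critical value
    return t_stat, crit, t_stat < crit  # reject H0 when t falls below the critical value

# Hypothetical placeholder values, NOT results from House_price_Data.xls
t_stat, crit, reject = left_tailed_t_test(coef=-0.05, std_err=0.02, alpha=0.01)
print(round(t_stat, 3), round(crit, 3), reject)
```

For the assignment itself, the exact critical value should come from the t distribution (for example, Excel's `T.INV` function) rather than from this approximation.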
Question 4: Construct a 99% confidence interval of the coefficient for CRIME and interpret this interval. (5 marks total)
(i) Provide the relevant critical value and construct the confidence interval manually using the estimated coefficient and standard error in your regression output (do not use an option available in Excel’s regression command to construct the confidence interval). (3 marks)
(ii) Interpret the confidence interval. (1 mark)
(iii) State what the confidence interval suggests about the significance of the coefficient for CRIME. (1 mark)
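A manual confidence interval has the form coefficient ± critical value × standard error. The sketch below illustrates the arithmetic with hypothetical placeholder inputs, and again substitutes the standard normal critical value for the exact t critical value with 498 degrees of freedom, so the resulting interval is marginally narrower than the exact one.

```python
from statistics import NormalDist

def confidence_interval(coef, std_err, level):
    """Return (lower, upper) for coef +/- critical value * std_err (hypothetical helper)."""
    alpha = 1 - level
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided; normal approximation to t
    half_width = crit * std_err
    return coef - half_width, coef + half_width

# Hypothetical placeholder values, NOT results from House_price_Data.xls
lower, upper = confidence_interval(coef=-0.10, std_err=0.03, level=0.99)
includes_zero = lower <= 0 <= upper  # if zero lies outside, the coefficient is significant at 1%
print(round(lower, 4), round(upper, 4), includes_zero)
```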
Question 5: Answer the following four questions regarding the relationship between lnHPrice and DIST as implied by the estimated regression model. (5 marks total)
(i) Sketch the implied relationship between lnHPrice and DIST in a two-dimensional diagram. Clearly label the horizontal and vertical axes. (1 mark)
(ii) Describe in words the implied relationship between lnHPrice and DIST. (1 mark)
(iii) Do the estimation results suggest that a quadratic relationship between lnHPrice and DIST is appropriate? Briefly explain your reasoning. (1 mark)
(iv) Find the level of DIST at which the marginal effect of increased DIST on lnHPrice changes its sign (that is, from positive to negative or negative to positive). State all relevant steps. (2 marks)
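The turning point asked for in part (iv) follows from setting the marginal effect to zero. In the derivation below, the symbols $\hat\beta_3$ and $\hat\beta_4$ are assumed labels for the estimated coefficients on DIST and DIST squared (numbered by their position in the regressor list); the assignment itself does not fix this notation.

```latex
\frac{\partial \ln Hprice}{\partial DIST}
  = \hat\beta_3 + 2\hat\beta_4\, DIST = 0
\quad\Longrightarrow\quad
DIST^{*} = -\frac{\hat\beta_3}{2\hat\beta_4}
```

If $\hat\beta_4 > 0$ the marginal effect changes from negative to positive at $DIST^{*}$ (a U shape); if $\hat\beta_4 < 0$ it changes from positive to negative (an inverted U).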
Question 6: Answer the following three questions regarding the F-statistic and Significance F figures obtained in the regression output. (1 mark each, 3 marks total)
(i) State the null and alternative hypotheses that can be tested by the reported F-statistic.
(ii) Provide the critical value to be used for testing the hypothesis in (i) at the 5% significance level. Present the answer to at least 6 decimal places.
(iii) Based on the reported F-statistic, what would you conclude about the test of the hypothesis stated in part (i)?
Question 7: Answer the following two questions regarding the R-Square and standard error of regression obtained in your regression output. (1 mark each, 2 marks total)
(i) Interpret the reported R Square.
(ii) Interpret the reported standard error of regression.
Question 8: Following the steps provided below, test a joint hypothesis that neither DIST nor STRatio affects the house price. (9 marks total)
(i) State the null and alternative hypotheses. (1 mark)
(ii) State a regression model that needs to be estimated to test the hypothesis stated in (i). (2 marks)
(iii) Estimate the regression model proposed in step (ii) and calculate the appropriate test statistic for the hypothesis stated in step (i). State the formula used to calculate the statistic, specify the inputs and present the final result. (4 marks)
(iv) Obtain the relevant critical value at the 1% significance level and complete the test (that is, determine whether to reject or not reject the null hypothesis). (2 marks)
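A joint hypothesis of this kind is commonly tested by comparing the sum of squared residuals (SSR) of the restricted and unrestricted models. The sketch below shows that calculation with hypothetical SSR values. It assumes one reading of the hypothesis, namely that DIST, DIST squared and STRatio are all excluded (q = 3 restrictions), and that the unrestricted model has k = 7 regressors estimated on n = 506 observations.

```python
def restricted_f_stat(ssr_restricted, ssr_unrestricted, q, n, k):
    """F = ((SSR_r - SSR_u) / q) / (SSR_u / (n - k - 1)).

    q: number of restrictions; k: regressors in the unrestricted model (excluding intercept).
    Hypothetical helper for illustration only.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical SSR values, NOT results from House_price_Data.xls
f_stat = restricted_f_stat(ssr_restricted=40.0, ssr_unrestricted=35.0, q=3, n=506, k=7)
print(round(f_stat, 4))
```

The resulting statistic would be compared with the F critical value with q and n - k - 1 degrees of freedom.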
PART B (12 MARKS)
Suppose that a researcher is interested in the determinants of income for Australian workers and has estimated the following multiple linear regression model,
$INC_i = \beta_0 + \beta_1 GEN_i + \beta_2 EDU_i + \beta_3 (EDU_i \times GEN_i) + \beta_4 AGE_i + \beta_5 AGE_i^2 + \varepsilon_i$
where INC_i = monthly income earned by individual i (measured in dollars per month),
GEN_i = gender of individual i (= 1 if female, 0 if male),
AGE_i = individual i’s age (measured in years), and
EDU_i = individual i’s education (measured in years).
The model estimated with a sample of 166 Australian workers has yielded the following regression results:
SUMMARY OUTPUT

Regression Statistics
Multiple R      0.3724
R Square        0.1387
Adj R Square    0.1118
Std Error       XXXXXXXXXX
Observations    166

ANOVA
              df     SS           MS           F        Significance F
Regression    5      XXXXXXXXXX   XXXXXXXXXX   5.1527   0.0002
Residual      160    XXXXXXXXXX   4036971
Total         165    XXXXXXXXXX

              Coefficients   Std Error    t Stat    P-value   Lower 95%    Upper 95%
Intercept     XXXXXXXXXX     XXXXXXXXXX   -2.6651   0.0085    XXXXXXXXXX   XXXXXXXXXX
GEN           XXXXXXXXXX     XXXXXXXXXX   0.5277    0.5984    XXXXXXXXXX   XXXXXXXXXX
EDU           XXXXXXXXXX     89.7156      2.9600    0.0035    88.3756      XXXXXXXXXX
EDU*GEN       XXXXXXXXXX     XXXXXXXXXX   -0.3463   0.7296    XXXXXXXXXX   XXXXXXXXXX
AGE           XXXXXXXXXX     XXXXXXXXXX   3.2078    0.0016    XXXXXXXXXX   XXXXXXXXXX
AGE^2         -4.4013        1.4059       -3.1307   0.0021    -7.1777      -1.6248
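The reported summary figures are linked by standard formulas, which gives a quick way to check that the output has been read correctly. The sketch below recomputes the adjusted R Square and the F statistic from the reported R Square, the number of observations (166) and the number of regressors (5). Because R Square is rounded to four decimals, the recomputed values match the reported 0.1118 and 5.1527 only up to rounding.

```python
n, k = 166, 5   # observations and regressors (excluding the intercept), from the output
r_sq = 0.1387   # reported R Square (rounded)

# Adjusted R Square penalises R Square for the number of regressors
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

# Overall F statistic expressed in terms of R Square
f_stat = (r_sq / k) / ((1 - r_sq) / (n - k - 1))

print(round(adj_r_sq, 4), round(f_stat, 2))
```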
Please answer the following three questions (Questions 9 through 11) in reference to the regression output provided on the previous page.
Question 9: Interpret the estimated coefficients for GEN and EDU*GEN in Model (2). (2 marks each, 4 marks total)
(i) GEN
(ii) EDU*GEN
Question 10: In a two-dimensional diagram, sketch the relationship between INC and EDU as implied by the estimated regression equation and clearly indicate how this relationship differs between male and female workers of the same age. Clearly label the horizontal and vertical axes. (2 marks)
Question 11: Calculate the difference in the expected wage between a male and a female who are the same age and have 12 years of education. (2 marks)
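The gender gap asked for here can be written in terms of the model's coefficients. The derivation below assumes the model contains GEN, EDU, EDU×GEN, AGE and AGE squared, as the rows of the regression output indicate, with $\beta_1$ and $\beta_3$ as assumed labels for the coefficients on GEN and EDU×GEN. With age held equal, the AGE terms and the common EDU term cancel.

```latex
E[INC \mid GEN = 1, EDU, AGE] - E[INC \mid GEN = 0, EDU, AGE]
  = \beta_1 + \beta_3\, EDU
\quad\Longrightarrow\quad
\text{at } EDU = 12:\ \hat\beta_1 + 12\,\hat\beta_3
```

Since GEN = 1 denotes a female worker, a positive value means the expected income is higher for the female worker.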
Question 12: Suppose that a researcher has quarterly data on Gross Domestic Product (GDP) from 1980 to XXXXXXXXXX The researcher has created a time variable t by setting t = 1 for the 1st quarter of 1980. The researcher has also created a quarterly dummy variable, Q2, such that Q2,t = 1 if observation t is from quarter 2 and Q2,t = 0 otherwise. Dummy variables for Q3 and Q4 are created similarly. Using this data set, the researcher has estimated a linear trend model with the time variable t and the three quarterly dummy variables to account for seasonal variation in GDP. The relevant regression output is as follows:
SUMMARY OUTPUT

Regression Statistics
Multiple R          XXXXXXXXXX
R Square            XXXXXXXXXX
Adjusted R Square   XXXXXXXXXX
Standard Error      XXXXXXXXXX
Observations        124

ANOVA
              df     SS            MS           F            Significance F
Regression    4      5.91937E+11   1.4798E+11   XXXXXXXXXX   4.47E-97
Residual      119    XXXXXXXXXX    XXXXXXXXXX
Total         123    6.05521E+11

              Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
Intercept     XXXXXXXXXX     XXXXXXXXXX       XXXXXXXXXX   2.209E-72     XXXXXXXXXX   XXXXXXXXXX
t             XXXXXXXXXX     XXXXXXXXXX       XXXXXXXXXX   1.9481E-99    XXXXXXXXXX   XXXXXXXXXX
Q2            XXXXXXXXXX     XXXXXXXXXX       XXXXXXXXXX   XXXXXXXXXX    XXXXXXXXXX   XXXXXXXXXX
Q3            XXXXXXXXXX     XXXXXXXXXX       XXXXXXXXXX   XXXXXXXXXX    XXXXXXXXXX   XXXXXXXXXX
Q4            XXXXXXXXXX     XXXXXXXXXX       XXXXXXXXXX   4.24008E-13   XXXXXXXXXX   XXXXXXXXXX
Please answer the following question in reference to the information and regression output provided on the previous page.
Question 12: Interpret the estimated coefficients for the time variable (t) and Q3, and forecast GDP in the 3rd quarter in XXXXXXXXXXmarks)
(i) Interpret the estimated coefficient for time variable, t. (1 mark)
(ii) Interpret the estimated coefficient for quarter 2, Q2. (1 mark)
(iii) Forecast GDP in 3rd Quarter in 2012. Present the relevant equation, input and final result. (2 marks)
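A forecast from a linear trend model with quarterly dummies needs the time index of the target quarter and the matching seasonal coefficient. The sketch below shows both steps; the coefficient values are hypothetical placeholders, since the estimates are not shown in the output, and the function names are illustrative. It assumes quarter 1 is the base category, which follows from dummies being defined only for Q2, Q3 and Q4.

```python
def time_index(year, quarter, base_year=1980):
    """Return t, where t = 1 for quarter 1 of base_year and rises by 1 each quarter."""
    return 4 * (year - base_year) + quarter

def forecast_gdp(year, quarter, b0, b_t, b_q2, b_q3, b_q4):
    """Linear trend plus the seasonal dummy coefficient for the given quarter."""
    t = time_index(year, quarter)
    seasonal = {1: 0.0, 2: b_q2, 3: b_q3, 4: b_q4}[quarter]  # quarter 1 is the base
    return b0 + b_t * t + seasonal

# Hypothetical placeholder coefficients, NOT the estimates from the regression output
t_target = time_index(2012, 3)
gdp_hat = forecast_gdp(2012, 3, b0=100.0, b_t=2.0, b_q2=1.0, b_q3=3.0, b_q4=5.0)
print(t_target, gdp_hat)
```

With 124 quarterly observations starting in the 1st quarter of 1980, the sample ends at t = 124, so the 3rd quarter of 2012 (t = 131) lies outside the estimation sample.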
Part B regression model: $INC_i = \beta_0 + \beta_1 GEN_i + \beta_2 EDU_i + \beta_3 (EDU_i \times GEN_i) + \beta_4 AGE_i + \beta_5 AGE_i^2 + \varepsilon_i$
Sheet1
Hprice NOx DIST ROOMS STRatio CRIME
240 5.38 4.09 6.57 15.3 0.006
215.99 4.69 4.97 6.42 17.8 0.027
347 4.69 4.97 7.18 17.8 0.027
334 4.58 6.06 7 18.7 0.032
361.99 4.58 6.06 7.15 18.7 0.069
287.01 4.58 6.06 6.43 18.7 0.03
229 5.24 5.56 6.01 15.2 0.088
271 5.24 5.95 6.17 15.2 0.145
165 5.24 6.08 5.63 15.2 0.211
189 5.24 6.59 6 15.2 0.17
150 5.24 6.35 6.38 15.2 0.225
189 5.24 6.23 6.01 15.2 0.117
217 5.24 5.45 5.89 15.2 0.094
204 5.38 4.71 5.95 21 0.63
182 5.38 4.46 6.1 21 0.638
199 5.38 4.5 5.83 21 0.627
231 5.38 4.5 5.93 21 1.054
175 5.38 4.26 5.99 21 0.784
202 5.38 3.8 5.46 21 0.803
182 5.38 3.8 5.73 21 0.726
136 5.38 3.8 5.57 21 1.252
196 5.38 4.01 5.96 21 0.852
152 5.38 3.98 6.14 21 1.232
145 5.38 4.1 5.81 21 0.988
156 5.38 4.4 5.92 21 0.75
139 5.38 4.45 5.6 21 0.841
166 5.38 4.68 5.81 21 0.672
148 5.38 4.45 6.05 21 0.956
184 5.38 4.45 6.49 21 0.773
210 5.38 4.23 6.67 21 1.002
127 5.38 4.23 5.69 21 1.131
145 5.38 4.17 6.07 21 1.355
132 5.38 3.99 5.95 21 1.388
131 5.38 3.79 5.7 21 1.152
135 5.38 3.76 6.1 21 1.613
189 4.99 3.36 5.93 19.2 0.064
200 4.99 3.38 5.84 19.2 0.097
136.68 4.99 3.93 5.85 19.2 0.08
247.01 4.99 3