Basic Econometrics
Research Report Group Assignment
This is a group assignment: you can work alone or with up to three other students (a maximum group size of four). All group members will receive the same marks for the assignment. You must submit an electronic copy of your assignment in Canvas in pdf, doc or docx format. Hard copies will not be accepted. Show your calculations (if any) and answer the questions in clear, full sentences. You should write no more than 1000 words in total for this assignment. The numbers of words and calculations given in parentheses after each question are a guide.
PART 1
The first part of the assignment uses data from the BUPA health insurance call centre. Each observation contains data from one call to the call centre. The variables describe characteristics of the call (e.g. the length of the call, the amount of silence in the call), characteristics of the customer (e.g. state of residence, family type, number of adults and children), and measures of performance (e.g. net promoter score, sentiment score of the customer). In this assignment we are interested in predicting the net promoter score and the length of the call.
Please use the information file CC_DEFINITIONS_NEW.XLSX to understand the variables.
1. This table shows descriptive statistics for the variables net_promoter_score, total_silence, total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted. Comment on what we learn about these variables from the descriptives. Look at the scatter plot of net_promoter_score against agent_crosstalk_weighted and describe the relationship between these two variables.
(4.5 marks) (100 words, 1 table, 1 graph)
Variable                    Obs      Mean      Std. Dev.   Min      Max
net_promoter_score          1,945    8.567     1.970       0        10
total_silence               1,945    43.898    72.246      0        518.5
total_silence_weighted      1,945    0.099     0.125       0        0.665
agent_to_cust_index         1,945    2.061     1.504       0.142    14.674
agent_crosstalk_weighted    1,944    0.020     0.014       0        0.092
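One way to compare spread across variables measured on different scales is the coefficient of variation (standard deviation divided by mean). The sketch below applies it to the means and standard deviations reported in the descriptive statistics; the function and dictionary names are illustrative only. For a variable that cannot be negative, a value above 1 often points to a right-skewed distribution, although the descriptives alone cannot confirm this.

```python
# Means and standard deviations copied from the descriptive statistics table.
stats = {
    "net_promoter_score": (8.567, 1.970),
    "total_silence": (43.898, 72.246),
    "total_silence_weighted": (0.099, 0.125),
    "agent_to_cust_index": (2.061, 1.504),
    "agent_crosstalk_weighted": (0.020, 0.014),
}


def coefficient_of_variation(mean, std_dev):
    """Standard deviation expressed as a fraction of the mean."""
    return std_dev / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in stats.items()}

# List the variables from most to least dispersed relative to their mean.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26s} CV = {value:.2f}")
```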
2. Consider the multiple linear regression with net_promoter_score as the dependent variable and total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted as the explanatory (independent) variables. Predict the change in net_promoter_score associated with a 0.1 increase in total_silence_weighted and a 0.01 increase in agent_crosstalk_weighted. Assuming this is the correct model specification, are we sure that total_silence_weighted has a negative effect? [Hint: consider the t-statistic and p-value]
(4 marks) (50 words, 2 calculations)
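The mechanics of this kind of prediction can be sketched as follows. In a linear regression, the predicted change in the dependent variable is the sum of each coefficient multiplied by the change in its regressor, and the t-statistic is the coefficient divided by its standard error. All numbers below are hypothetical placeholders, not estimates from the call centre data, and the p-value uses a standard normal approximation, which is an assumption that is reasonable only in large samples.

```python
import math

# Hypothetical estimates for illustration only; replace with your own regression output.
coef = {
    "total_silence_weighted": -1.2,
    "agent_to_cust_index": 0.05,
    "agent_crosstalk_weighted": -8.0,
}
std_err_silence = 0.4  # hypothetical standard error of total_silence_weighted


def predicted_change(coefficients, deltas):
    """Change in the prediction when each regressor in `deltas` moves by the given amount."""
    return sum(coefficients[name] * delta for name, delta in deltas.items())


change = predicted_change(
    coef,
    {"total_silence_weighted": 0.1, "agent_crosstalk_weighted": 0.01},
)

# t-statistic for H0: coefficient on total_silence_weighted equals zero.
t_stat = coef["total_silence_weighted"] / std_err_silence

# Two-sided p-value under the standard normal approximation (an assumption).
p_two_sided = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"predicted change: {change:.3f}")
print(f"t = {t_stat:.2f}, two-sided p = {p_two_sided:.4f}")
```

Regressors left out of `deltas` are held fixed, which is why agent_to_cust_index does not enter the calculation.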
3. Consider the following table where dummy variables have been added to the regression to control for all of the potential effects of State and Package. Why have the variables state1 and package3 been omitted? Carefully interpret the estimated coefficient on the package1 dummy variable you have included. Why is this NOT a very important result?
[Hints: Use the spreadsheet describing the variables CC_DEFINITIONS_New.xlsx to understand the definitions of the dummy variables. The mean of package1 is 0.0165]
(4.5 marks) (50 words)
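The reason one category per group is left out can be illustrated with a small sketch. When a dummy is created for every category, the dummies sum to 1 in every row, which duplicates the intercept column and makes the regressors perfectly collinear. The labels "A", "B" and "C" are hypothetical stand-ins, not the State or Package codes in the data.

```python
# Hypothetical category labels for five calls.
labels = ["A", "B", "C", "B", "A"]


def make_dummies(values, drop=None):
    """Return {category: 0/1 column} for every category except `drop`."""
    categories = sorted(set(values))
    return {
        c: [1 if v == c else 0 for v in values]
        for c in categories
        if c != drop
    }


def row_sums(columns, n_rows):
    """Sum the dummy columns row by row."""
    return [sum(col[i] for col in columns.values()) for i in range(n_rows)]


full = make_dummies(labels)               # one dummy per category
reduced = make_dummies(labels, drop="A")  # "A" becomes the base category

# With every category included, each row sums to 1, exactly like the intercept column.
print(row_sums(full, len(labels)))
# With the base category dropped, the sum varies across rows, so the collinearity is gone.
print(row_sums(reduced, len(labels)))
```

Each remaining coefficient is then read relative to the omitted base category.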
4. Consider the following results including a level and a squared term for the variable sentiment_score_cust in the model along with the existing explanatory variables. The squared term is called sentiment_score_custsq and is the square of sentiment_score_cust. What is the name of this type of model specification? Calculate and interpret the marginal effect of a 1 point change in “sentiment_score_cust” when sentiment_score_cust = 1 and when sentiment_score_cust=4.
(4.5 marks) (50 words, 2 calculations)
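For a model containing both x and its square, the marginal effect of x is not a single number: the derivative of b1*x + b2*x^2 with respect to x is b1 + 2*b2*x, so it depends on where it is evaluated. The sketch below shows the calculation with hypothetical coefficients (b_level and b_squared are illustrative names, not values from the assignment output). It also shows the exact change in the prediction from a one-unit step, which differs slightly from the derivative.

```python
def marginal_effect(b_level, b_squared, x):
    """Derivative of b_level*x + b_squared*x**2 with respect to x."""
    return b_level + 2 * b_squared * x


def unit_change_effect(b_level, b_squared, x):
    """Exact change in the prediction when x rises from x to x + 1."""
    return b_level + b_squared * (2 * x + 1)


# Hypothetical coefficients on the level and squared terms.
b_level, b_squared = 0.8, -0.1

for x in (1, 4):
    slope = marginal_effect(b_level, b_squared, x)
    step = unit_change_effect(b_level, b_squared, x)
    print(f"x = {x}: derivative = {slope:.2f}, one-unit change = {step:.2f}")
```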
5. Explain the conditional mean independence assumption and assess its relevance with respect to the explanatory variable “sentiment_score_cust”.
[Hint: Think about factors that may be included in the error term of the regression: the customer’s experience with the company (positive or negative), the general attitude of the customer towards call centre conversations (positive or negative) and whether these may be correlated with sentiment_score_cust]
(3 marks) (100 words)
6. Write an executive summary of the findings in questions 2 to 5 on what variables are likely and are not likely to be important drivers of net promoter score.
(1.5 marks, 100 words)
PART 2
“The rise in energy consumption of rapidly growing developing countries, especially China and India, has accounted for the vast majority of the global increase in energy use in recent years. Non-OECD countries currently account for approximately 60% of global energy demand, which is predicted to rise to 70% by 2040 (International Energy Agency, XXXXXXXXXXThis increasing energy use exacerbates environmental problems including global climate change due to greenhouse gas emissions and local environmental problems such as the recent episodes of extreme air pollution in Beijing and other Chinese cities. Besides its environmental impacts, increasing energy use also raises questions of national energy supply security. As the share of world energy use consumed in developing countries increases, it is increasingly important to understand how energy use evolves across the full income continuum from less developed to highly developed countries (van Ruijven et al., 2009).” Csereklyei and Stern XXXXXXXXXXpage 633.
In this part of the home assignment we will be exploring the drivers of total and sectoral energy use across several developed and developing countries.
7. Countries have a keen interest in exploring the drivers of their sectoral energy consumption, including TRANSPORTATION energy use. These models will examine the log of final energy use by TRANSPORTATION “ln_tranpc” across 128 countries. All variables with names beginning “ln” are measured in natural logarithms. The variable oecd is a dummy variable equal to 1 for countries in the OECD and equal to zero otherwise. The variables are described below:
Lntran_pc = log of transportation energy consumption per capita (ktoe)
Lnypcpenn = log of GDP per capita (USD)
Lnypcpenn2 = log of GDP per capita SQUARED
Ln_gasprice = log of pump price for gasoline (USD/liter)
Ln_temperature = log of the average annual temperature (in °C)
Ln_annualprecip = log of annual precipitation (mm)
Ln_land = log of the land area of a country
OECD = a dummy (indicator) that takes the value 1 if the country is an OECD member, zero otherwise.
(1) The first model has a log per capita GDP term (lnypcpenn) [Model 1].
(a) Interpret the coefficients including dummies, elasticities or semi-elasticities (2 marks)
(b) Interpret the statistical significance of these coefficients (2 marks)
(Subtotal: 4 marks)
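As a reminder of how coefficients are read when the dependent variable is in logs: a coefficient on a logged regressor is an elasticity, while a coefficient on a dummy translates into a percentage difference of 100*(exp(b) - 1), which is close to 100*b only when b is small. The sketch below uses hypothetical coefficient values, not estimates from Model 1, and the function names are illustrative.

```python
import math


def percent_effect_of_dummy(beta):
    """Exact percentage difference in y when a dummy switches from 0 to 1 in a log-y model."""
    return 100 * (math.exp(beta) - 1)


def approx_percent_change(elasticity, pct_change_x):
    """Approximate percentage change in y for a small percentage change in x (log-log model)."""
    return elasticity * pct_change_x


# Hypothetical coefficients for illustration only.
beta_oecd = 0.25         # coefficient on a 0/1 dummy
beta_ln_income = 0.9     # coefficient on a logged regressor

print(f"dummy: exact {percent_effect_of_dummy(beta_oecd):.1f}% vs approximate {100 * beta_oecd:.1f}%")
print(f"a 1% rise in x changes y by about {approx_percent_change(beta_ln_income, 1):.2f}%")
```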
(2) The second model has a quadratic specification of the log of per capita GDP (lnypcpenn for the level term and lnypcpenn2 for the squared term) [Model 2].
(a) Interpret the coefficient estimates for the quadratic specification of the log of per capita GDP (lnypcpenn) at the value lnypcpenn=9. (2 marks)
(b) What are the major differences in the other coefficient estimates compared to model 1? Please comment on the size and statistical significance of the coefficient estimates. (1 mark)
(c) Which model do you think is more appropriate (number 1 or 2)? Please justify your answer. (1 mark)
(Subtotal: 4 marks)
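With a quadratic in the log of income, the income elasticity varies with income. Assuming lnypcpenn2 is the square of lnypcpenn (an assumption about how the variable was built), the elasticity at a given log income is b1 + 2*b2*lnypcpenn, and it changes sign at lnypcpenn = -b1/(2*b2). The coefficients below are hypothetical and the function names are illustrative.

```python
def income_elasticity(b_level, b_squared, ln_income):
    """Elasticity of y with respect to income when ln(y) is quadratic in ln(income)."""
    return b_level + 2 * b_squared * ln_income


def turning_point(b_level, b_squared):
    """Value of ln(income) at which the elasticity equals zero."""
    return -b_level / (2 * b_squared)


# Hypothetical coefficients on lnypcpenn and lnypcpenn2.
b_level, b_squared = 2.5, -0.1

print(f"elasticity at ln income 9: {income_elasticity(b_level, b_squared, 9):.2f}")
print(f"elasticity is zero at ln income {turning_point(b_level, b_squared):.1f}")
```

A turning point far outside the range of lnypcpenn observed in the sample says little about real countries, so it is worth checking against the data before interpreting it.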
(3) Describe the “Gauss Markov” assumptions and whether these assumptions are likely to be met in these models. (2 marks)
(4) Interpret the results for THREE of your explanatory variables, including GDP per capita, that you consider to be the key drivers of per capita transportation energy consumption. (3 marks)
(Total: 13 marks) (550 words, 3 tables, 4 calculations)
There will be up to 2 additional marks awarded for the presentation of your answers (clear expression of answers in full sentences).
Rubric for marking
Criteria
Pts
PART 1
1. Descriptive statistics
A) Present descriptive statistics table, B) comment on descriptives, C) present and comment on graph.
4.5 pts (1.5 marks each)
2. Multiple linear regression
A) Estimate regression model, B) present table, C) two predictions, D) comment on total_silence_weighted effect
4.0 pts (1 mark each)
3. Dummy variables
A) Include dummy variables correctly, B) Comment on package1 coefficient, C) Why not an important result
4.5 pts (1.5 marks each)
4. Quadratic Specification
A) Include quadratic specification correctly and present results in table, B) Calculate marginal effect when sentiment_score_cust=1, C) Calculate marginal effect when sentiment_score_cust=4
4.5 pts (1.5 marks each)
5. Conditional mean independence
A) Explain conditional mean independence assumption. B) Discuss with reference to the variable "sentiment_score_cust"
3.0 pts (1.5 marks each)
6. Executive Summary
1.5 pts
PART 2
7. a Model design
A) Linear model with explanations (4 pts)
B) Quadratic model with explanation (4 pts)
8 Pts
7. b Model design
A) Discuss Gauss-Markov assumptions 1-3
B) Discuss Gauss-Markov assumptions 4-5
C) Prediction 1
D) Prediction 2
E) Prediction 3
5 Pts
XXXXXXXXXX_cons XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXXoecd XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXXlnland XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
ln_annualprecip XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
ln_temperature XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
ln_gasprice XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
lnypcpenn XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
lnypcpenn XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX