Customer Analytics
—Individual Research Project—
1. Background Information
You are working for a telecommunication provider. The company wants to improve its customer
lifetime value (CLV) calculations for newly acquired customers. The key question that the firm's
marketing managers have is how they can account for the fact that it is very difficult to know a
customer’s relationship duration in advance. Yet customer relationship duration is one of the key
inputs to CLV calculations.
When they describe this problem to you, you suggest that you might be able to help. In particular,
you believe that you can use a survival model in order to predict customer survival probabilities and
then use those probabilities to improve the CLV calculations. You agree with the marketing
managers that you will check the data available and perform the necessary analyses for them.
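One common way to write down this idea is shown below. It is an assumed formulation for illustration, not a definition taken from the course material: each period's discounted cash flow is weighted by the probability that the customer is still active. Here CF_t is the cash flow in week t, d is the weekly discount rate, S(t) is the predicted survival probability at week t, and T is the planning horizon.

```latex
\mathrm{CLV} = \sum_{t=1}^{T} \frac{CF_t}{(1+d)^{t}},
\qquad
\mathrm{CLV}_{\text{prob. corrected}} = \sum_{t=1}^{T} \frac{S(t)\, CF_t}{(1+d)^{t}}
```

Because S(t) lies between 0 and 1, the survival-weighted value can never exceed the unweighted one for non-negative cash flows.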
2. Data, Sample and Variables
You are provided with a dataset that contains information about 3,333 randomly sampled customer
relationships. The dataset is called “telecom_churn” and is a csv file. The dataset contains the
following variables:
Churn: Whether or not a customer has churned.
AccountWeeks: The duration of the customer relationship as it is reflected in the time the customer
had an active account at the firm. The variable is captured in weeks.
DataPlan: Whether or not a customer has a data plan
DataUsage: Gigabytes of average monthly data usage
CustServCalls: How often a customer has called the service hotline
DayMins: Average daytime minutes (calling time) per month
DayCalls: Average number of daytime calls
MonthlyCharge: Average monthly bill
OverageFee: Largest overage fee in the last year
RoamMins: Average number of roaming minutes
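Before any model can be estimated, the data have to be loaded and the duration and churn information combined into a survival object. The following is a minimal sketch in R, not a prescribed solution. It assumes that Churn and DataPlan are coded 0/1 (the variable list above does not state the coding), that the file is named telecom_churn.csv, and it uses a small made-up data frame called Telco1 in place of the real file.

```r
# Toy stand-in for the real data. With the actual file one would instead run
# something like: Telco1 <- read.csv("telecom_churn.csv")  # file name assumed
library(survival)  # provides Surv()

Telco1 <- data.frame(
  Churn        = c(0, 1, 0, 1, 0, 0),              # made-up values
  AccountWeeks = c(128, 65, 137, 84, 75, 118),     # made-up values
  DataPlan     = c(1, 0, 0, 0, 1, 0)               # made-up values
)

# Treat DataPlan as a categorical grouping variable with readable labels
# (assumes 0 = no data plan, 1 = data plan)
Telco1$DataPlan <- factor(Telco1$DataPlan, levels = c(0, 1),
                          labels = c("No", "Yes"))

# Duration plus event indicator; customers with Churn == 0 are treated as
# right-censored, i.e. still active when the data were collected
surv_obj <- Surv(Telco1$AccountWeeks, Telco1$Churn)
print(surv_obj)
```

Storing DataPlan as a factor also makes it straightforward to request predictions per level later on.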
3. Your Tasks
Please use R to perform the following tasks. You can earn a total of 100 points.
1) (10 points) Estimate a base survival model (i.e., without explanatory variables) for an average
customer. Call this mod0. Please provide the output and visualize the survival curve.
2) (20 points) Please estimate a model that includes DataPlan as an explanatory variable. Call it
mod1. Please provide the output and visualize the survival curve. Would you prefer mod0 or
mod1 for predicting customer survival probabilities? Why?
3) (10 points) You want to use the model mod1 to make predictions of survival probabilities to
inform your customer acquisition efforts (e.g., which customers should preferably be acquired).
Do you see a chance to improve model performance given the data at hand? Please explain
your answer.
4) (10 points) You decide to move on with mod1. Critically evaluate the predicted curve. Do you
see any reason for concern?
5) (30 points) You decide to use mod1 to calculate the expected CLV for customers without a data
plan and customers with a data plan. The annual interest rate is 5% (note that you have to
translate this into weekly discount rates). For an assumed customer lifetime of 500 weeks,
please calculate the CLV and the probability corrected CLV for customers without a data plan
and customers with a data plan. Please present the correct results in a table (i.e., CLV and
probability corrected CLV for both customer prototypes). Should the firm focus on either of the
two customer types in its future customer acquisition efforts?
Here is a little helper on how to achieve that: First, you have to derive monthly average cash
flows for the two customer prototypes separately from the variable MonthlyCharge (use
DataPlan as a grouping variable).
Second, you have to calculate the average weekly cash flows from this data (average monthly
cash flow * 12/52); you can use the weekly average cash flow as a cash flow for each of the 500
weeks.
Third, you have to derive the predicted survival probabilities for each customer. We have not
done the coding for this in the tutorial but it can be achieved in a few steps. You just have to
make sure that you use your own variable names in the code below.
Suggested code for this step
# First install the rms package, which is required to derive predictions for different points in
# time. The reshape2 package provides the melt function used further below.
install.packages(c("rms", "reshape2"))
library("rms")
library("reshape2")
# Then you have to rerun mod1 using the psm function that is equivalent to the survreg
# function used in the tutorial
mod1_psm <- psm(Surv(AccountWeeks, Churn) ~ DataPlan, data = Telco1, dist = "weibull")
mod1_psm # This model is the same as the previous model mod1
# We produce a sequence which will define the points in time at which we want predicted
# survival probabilities from our model.
weeks <- seq(1, 500, by = 1)
# We define the levels of the DataPlan variable for which we want probabilities
# (this requires DataPlan to be stored as a factor in the data frame).
n.dat <- expand.grid(DataPlan = levels(Telco1$DataPlan))
# We ask the model for predictions for 500 weeks ahead.
b1 <- survest(mod1_psm, newdata = data.frame(n.dat), times = weeks)
# We rearrange the data such that we can easily use it for the cash flow predictions.
b2 <- cbind(n.dat, b1)
b3 <- melt(b2, id.vars = c("DataPlan"), variable.name = "time", value.name = "surv_prob")
b3
Fourth, you now have all necessary information to calculate the CLV and probability corrected CLV
for both customer prototypes. You can do this either in R or in Excel.
6) (20 points) Please present a simple visualization that demonstrates your key insight from the
probability corrected CLV to managers. (It is easiest to use PowerPoint to provide an
appropriate chart.)
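The arithmetic behind the four helper steps in task 5 can be sketched in base R as follows. This is an illustration under stated assumptions, not the expected solution: the monthly charges and the Weibull shape and scale values are hypothetical placeholders (in the assignment they come from the data and from mod1), and the annual rate is converted to a weekly rate by compounding over 52 weeks, which is one common choice.

```r
# All inputs below are hypothetical placeholders, not values from the data
annual_rate <- 0.05
weekly_rate <- (1 + annual_rate)^(1 / 52) - 1  # compound weekly equivalent
weeks       <- 1:500

# Hypothetical average monthly charges per prototype; in practice e.g.
# aggregate(MonthlyCharge ~ DataPlan, data = Telco1, FUN = mean)
monthly_cf <- c(No = 50, Yes = 70)
weekly_cf  <- monthly_cf * 12 / 52             # 52 weeks per year

# Hypothetical Weibull survival curves S(t) = exp(-(t / scale)^shape) with
# shape = 2; in the assignment these come from survest() on the fitted model
surv_prob <- sapply(c(No = 180, Yes = 200),
                    function(scale) exp(-(weeks / scale)^2))

# Discount factor for each week
discount <- 1 / (1 + weekly_rate)^weeks

groups <- names(weekly_cf)
clv_table <- data.frame(
  DataPlan         = groups,
  CLV              = sapply(groups, function(g) sum(weekly_cf[g] * discount)),
  ProbCorrectedCLV = sapply(groups, function(g)
    sum(weekly_cf[g] * surv_prob[, g] * discount))
)
print(clv_table, row.names = FALSE)
```

The same table can feed a simple bar chart for task 6, for example with barplot() on the two CLV columns.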