
Customer Analytics
—Individual Research Project—
1. Background Information
You are working for a telecommunication provider. The company wants to improve their customer
lifetime value (CLV) calculations for newly acquired customers. The key question that the firm’s
marketing managers have is how they can account for the fact that it is very difficult to know a
customer’s relationship duration in advance. Yet customer relationship duration is one of the key
pieces of information to be considered in CLV calculations.
When they describe this problem to you, you suggest that you might be able to help. In particular,
you believe that you can use a survival model in order to predict customer survival probabilities and
then use those probabilities to improve the CLV calculations. You agree with the marketing
managers that you will check the data available and perform the necessary analyses for them.
2. Data, Sample and Variables
You are provided a dataset that contains information about 3,333 randomly sampled customer
relationships. The dataset is called “telecom_churn” and is a csv file. The dataset contains the
following variables:
Churn: Information whether or not a customer has churned.
AccountWeeks: The duration of the customer relationship as it is reflected in the time the customer
had an active account at the firm. The variable is captured in weeks.
DataPlan: Whether or not a customer has a data plan
DataUsage: Gigabytes of average monthly data usage
CustServCalls: How often a customer has called the service hotline
DayMins: Average daytime minutes (calling time) per month
DayCalls: Average number of daytime calls
MonthlyCharge: Average monthly bill
OverageFee: Largest overage fee in the last year
RoamMins: Average number of roaming minutes
3. Your Tasks
Please use R to perform the following tasks. You can earn a total of 100 points.
1) (10 points) Estimate a base survival model (i.e., without explanatory variables) for an average
customer. Call this mod0. Please provide the output and visualize the survival curve.

2) (20 points) Please estimate a model that includes DataPlan as an explanatory variable. Call it
mod1. Please provide the output and visualize the survival curve. Would you prefer mod0 or
mod1 for predicting customer survival probabilities? Why?

3) (10 points) You want to use the model mod1 to make predictions of survival probabilities to
inform your customer acquisition efforts (e.g., which customers should be preferably acquired).
Do you see a chance to improve model performance given the data at hand? Please explain
your answer.

4) (10 points) You decide to move on with mod1. Critically evaluate the predicted curve. Do you
see any reason for concern?

5) (30 points) You decide to use mod1 to calculate the expected CLV for customers without a data
plan and customers with a data plan. The annual interest rate is 5% (note that you have to
translate this into weekly discount rates). For an assumed customer lifetime of 500 weeks,
please calculate the CLV and the probability corrected CLV for customers without a data plan
and customers with a data plan. Please present the correct results in a table (i.e., CLV and
probability corrected CLV for both customer prototypes). Should the firm focus on either of
the two customer types in their future customer acquisition efforts?
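The translation of the annual interest rate into a weekly discount rate can be sketched as follows. This is a minimal illustration, assuming compound conversion and 52 weeks per year; the names annual_rate, weeks_per_year, weekly_rate and discount_factor are hypothetical and not part of the assignment.

```r
# Convert an annual interest rate into an equivalent weekly discount rate.
# Assumption: compound conversion with 52 weeks per year.
annual_rate <- 0.05
weeks_per_year <- 52
weekly_rate <- (1 + annual_rate)^(1 / weeks_per_year) - 1

# Discount factor applied to a cash flow received in week t
discount_factor <- function(t, r = weekly_rate) 1 / (1 + r)^t

weekly_rate
discount_factor(c(1, 52, 500))
```

Compounding the weekly rate over 52 weeks reproduces the 5% annual rate, which is why this form may be preferable to simply dividing 5% by 52.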

Here is a little helper on how to achieve that: First, you have to derive monthly average cash
flows for the two customer prototypes separately from the variable MonthlyCharge (use
DataPlan as a grouping variable).

Second, you have to calculate the average weekly cash flows from this data (average monthly
cash flow * 12/51); you can use the weekly average cash flow as a cash flow for each of the 500
weeks.
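The first two steps might look like the sketch below. It uses a small made-up data frame (toy) in place of the real telecom_churn data, and the column name WeeklyCashFlow is hypothetical; the 12/51 factor is taken from the helper text above.

```r
# Toy data standing in for telecom_churn (all values are made up)
toy <- data.frame(
  DataPlan = factor(c(0, 0, 0, 1, 1, 1)),
  MonthlyCharge = c(40, 50, 60, 70, 80, 90)
)

# Step 1: mean monthly charge per DataPlan group
monthly_cf <- aggregate(MonthlyCharge ~ DataPlan, data = toy, FUN = mean)

# Step 2: weekly cash flow, using the factor 12/51 stated in the helper text
monthly_cf$WeeklyCashFlow <- monthly_cf$MonthlyCharge * 12 / 51
monthly_cf
```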

Third, you have to derive the predicted survival probabilities for each customer. We have not
done the coding for this in the tutorial but it can be achieved in a few steps. You just have to
make sure that you use your own variable names in the code below.

Suggested code for this step
# First install the rms package, which is required to derive predictions for different points in
# time.
install.packages("rms")
library("rms")
# melt() further below is assumed to come from the reshape2 package.
library("reshape2")

# Then you have to rerun mod1 using the psm function that is equivalent to the survreg
# function used in the tutorial
mod1_psm <- psm(Surv(AccountWeeks, Churn) ~ DataPlan, data = Telco1, dist = "weibull")
mod1_psm # This model is the same as the previous model mod1

# We produce a sequence which will define the points in time at which we want predicted
# survival probabilities from our model.
weeks <- seq(1, 500, by = 1)

# We define the levels of the DataPlan variable for which we want probabilities
# (DataPlan must be a factor in your data frame, here called Telco1).
n.dat <- expand.grid(DataPlan = levels(Telco1$DataPlan))

# We ask the model for predictions for 500 weeks ahead.
b1 <- survest(mod1_psm, newdata = data.frame(n.dat), time = weeks)

# We rearrange the data such that we can easily use it for the cash flow predictions.
b2 <- cbind(n.dat, b1)
b3 <- melt(b2, id.vars = c("DataPlan"), variable.name = "time", value.name = "surv prob")
b3
Fourth, you now have all necessary information to calculate the CLV and probability corrected CLV
for both customer prototypes. You can do this either in R or in Excel.
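The fourth step could be sketched in R as shown below. All inputs are hypothetical: the weekly cash flows and the Weibull shape and scale values are made up for illustration and stand in for the group means and the model predictions from the earlier steps. The weekly discount rate assumes compound conversion with 52 weeks per year.

```r
weeks <- 1:500
weekly_rate <- (1 + 0.05)^(1 / 52) - 1   # assumption: 52 weeks per year
discount <- 1 / (1 + weekly_rate)^weeks

# Hypothetical inputs per customer prototype (not estimated from the data)
weekly_cf <- c(no_plan = 12, plan = 19)          # made-up weekly cash flows
weibull_scale <- c(no_plan = 250, plan = 300)    # made-up Weibull scales
weibull_shape <- 2                               # made-up Weibull shape

clv_table <- data.frame(
  DataPlan = names(weekly_cf),
  CLV = NA_real_,
  ProbCorrectedCLV = NA_real_
)
for (i in seq_along(weekly_cf)) {
  # Survival probability in each week under the assumed Weibull curve
  surv_prob <- exp(-(weeks / weibull_scale[i])^weibull_shape)
  # Plain CLV: discounted cash flows, customer assumed to stay all 500 weeks
  clv_table$CLV[i] <- sum(weekly_cf[i] * discount)
  # Probability corrected CLV: each cash flow weighted by survival probability
  clv_table$ProbCorrectedCLV[i] <- sum(weekly_cf[i] * surv_prob * discount)
}
clv_table
```

In practice the surv_prob vector would be replaced by the predicted survival probabilities for each DataPlan level, one value per week.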
6) (20 points) Please present a simple visualization that demonstrates your key insight from the
probability corrected CLV to managers. (It is easiest to use PowerPoint to provide an
appropriate chart.)


Answered Same Day Nov 12, 2021

Solution

Abr Writing answered on Nov 15 2021


telecom_churn.docx
Customer Analytics
Individual Research Project
Loading the data into R workspace
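# Packages used below: survival (Surv, survfit), survminer (ggsurvplot), dplyr (%>%, group_by)
library(survival)
library(survminer)
library(dplyr)
telecom_churn <- read.csv("telecom_churn.csv")
# Treat DataPlan as a categorical grouping variable
telecom_churn$DataPlan <- as.factor(telecom_churn$DataPlan)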
Task 1
surv <- Surv(time = telecom_churn$AccountWeeks,
event = telecom_churn$Churn)
# Intercept-only (null) Kaplan-Meier fit: one curve for the average customer
mod0 <- survfit(surv ~ 1, data = telecom_churn)
ggsurvplot(mod0, data = telecom_churn, pval = TRUE)
Warning in .pvalue(fit, data = data, method = method, pval = pval, pval.coord = pval.coord, : There are no survival curves to be compared.
This is a null model with a single curve, so there are no groups to compare and no p-value is reported.
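The assignment text refers to the survreg function from the tutorial, so a parametric base model may also be of interest here. The following is only a sketch of an intercept-only Weibull fit on simulated data; the data frame sim and the names fit0, shape_hat, scale_hat, t_grid and surv_hat are hypothetical, and the simulated values do not come from telecom_churn.

```r
library(survival)

set.seed(1)
# Simulated durations standing in for AccountWeeks (made-up data)
sim <- data.frame(
  weeks = rweibull(300, shape = 1.5, scale = 120),
  churn = rbinom(300, 1, 0.7)   # 1 = churned, 0 = censored
)

# Intercept-only Weibull model (no explanatory variables)
fit0 <- survreg(Surv(weeks, churn) ~ 1, data = sim, dist = "weibull")

# survreg uses a location-scale parameterisation:
# Weibull shape = 1 / fit0$scale, Weibull scale = exp(intercept)
shape_hat <- 1 / fit0$scale
scale_hat <- exp(coef(fit0)[[1]])

# Predicted survival curve S(t) = exp(-(t / scale)^shape) over 500 weeks
t_grid <- seq(1, 500, by = 1)
surv_hat <- exp(-(t_grid / scale_hat)^shape_hat)
plot(t_grid, surv_hat, type = "l",
     xlab = "Weeks", ylab = "Survival probability")
```

With the real data, the formula would use AccountWeeks and Churn in place of the simulated columns.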
Task 2
mod1 <- survfit(surv ~ DataPlan, data = telecom_churn)
ggsurvplot(mod1, data = telecom_churn, pval = TRUE)
The log-rank p-value of less than 0.0001 indicates a statistically significant difference between the two groups at the conventional p < 0.05 level. Customers with a data plan show significantly better survival, and their curve stays above that of customers without a data plan throughout the follow-up period.
The base survival model is similar to the survival curve for customers with no data plan. Because mod0 has no groups to compare, mod1 is selected for predicting customer survival probabilities.
Task 3
as.data.frame(
t(as.data.frame(lapply(telecom_churn,
function(x) {return(
length(unique(x))
)})))
)
               V1
Churn           2
AccountWeeks  212
DataPlan        2
DataUsage     174
CustServCalls  10
DayMins      1667
DayCalls      119
MonthlyCharge 627
OverageFee   1024
RoamMins      162
The table above shows the number of unique values in each variable of the telecom_churn dataset. The model for predicting the survival curve to inform customer acquisition efforts could be improved by including more variables from the dataset. However, given the large number of distinct values in each variable, it is better to keep mod1 for predicting customer survival probabilities and so avoid the risk of over-fitting.
Task 4
ggsurvplot(mod1, data = telecom_churn, pval = TRUE)
The curves diverge early and the log-rank test is significant. One could argue that a larger sample of acquired customers is needed to validate these results, that is, that customers with a data plan have a significantly higher probability of survival than customers without a data plan.
Task 5
clv <- telecom_churn %>%
group_by(DataPlan) %>%
summarise(
MeanMonthlyCharge = mean(MonthlyCharge)
) %>%
mutate(
CLV = 500*12/51*MeanMonthlyCharge
)
clv
# A tibble: 2 x 3
DataPlan MeanMonthlyCharge CLV
...
