Customer Lifetime Value Analytics: Case Study
Introduction 😉
Customer lifetime value (CLTV) is the estimated total amount a customer will spend on a business throughout their relationship with that business. It takes into account the revenue generated by the customer as well as the costs associated with acquiring and serving that customer. By analyzing the relationship between customer acquisition costs and revenue generated, we can determine which channels are the most cost-effective for acquiring and retaining high-value customers.
The given data (available from the link below) includes information about the customer’s channel, cost of acquisition, conversion rate, and revenue generated.
Goal 🥅
The goal is to analyze the CLTV of customers across different channels and identify the most profitable channels for the business.
Load Packages ♎
Load Dataset 📅
Code
Rows: 800
Columns: 5
$ customer_id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,~
$ channel <fct> referral, paid advertising, email marketing, social me~
$ cost <dbl> 8.320327, 30.450327, 5.246263, 9.546326, 8.320327, 30.~
$ conversion_rate <dbl> 0.12314498, 0.01634149, 0.04382223, 0.16759225, 0.1231~
$ revenue <dbl> 4199, 3410, 3164, 1520, 2419, 3856, 1172, 700, 2137, 9~
Exploratory Data Analysis 🏁
Code
summary(customer_tbl[,-1])
channel cost conversion_rate revenue
email marketing :214 Min. : 5.246 Min. :0.01634 Min. : 500
paid advertising:194 1st Qu.: 5.246 1st Qu.:0.04382 1st Qu.:1694
referral :207 Median : 8.320 Median :0.04382 Median :2764
social media :185 Mean :13.148 Mean :0.08630 Mean :2769
3rd Qu.: 9.546 3rd Qu.:0.12314 3rd Qu.:3824
Max. :30.450 Max. :0.16759 Max. :4998
Let’s start by visualizing the distribution of
acquisition cost
andrevenue generated
by the customer using histograms:
Code
cost <- customer_tbl %>%
ggplot(aes(cost)) +
geom_histogram(fill = "steelblue", color = "white", bins = 30) +
labs(title = "Distribution of acquisition cost",
x = "Acquisition Cost",
y = "Frequency")
rev <- customer_tbl %>%
ggplot(aes(revenue)) +
geom_histogram(fill = "tomato", color = "white", bins = 30) +
labs(title = "Distribution of revenue generated",
x = "Revenue",
y = "Frequency") +
scale_x_continuous(labels = scales::comma)
cost/rev
Research Questions ❓
Q1
- What’s the cost of acquisition across different channels?
Now, let’s examine the cost of acquisition across different channels and identify the most and least profitable channels:
Code
customer_tbl %>%
group_by(channel) %>%
summarise(Ave_cost = mean(cost, na.rm = TRUE)) %>%
ggplot(aes(x = reorder(channel, Ave_cost), y = Ave_cost)) +
geom_bar(stat ="identity", width = 0.6) +
labs(x = "Channel",
y = "Average cost of acquisition",
title = "Average cost of acquisition across channels") +
geom_label_repel(aes(label = round(Ave_cost)))
Insight ℹ️
- Paid advertisement is the most expensive channel whereas email marketing is the least expensive channel.
Q2
- Which channels are most and least effective at converting customers?
Now, let’s see which channels are most and least effective at converting customers:
Code
customer_tbl %>%
group_by(channel) %>%
summarise(Ave_rate = mean(conversion_rate, na.rm = TRUE)) %>%
ggplot(aes(x = reorder(channel, Ave_rate), y = Ave_rate)) +
geom_bar(stat ="identity", width = 0.6, fill = "steelblue", color = "white") +
labs(x = "Channel",
y = "Average conversion rate",
title = "Average conversion rate across channels") +
geom_label_repel(aes(label = round(Ave_rate,2)))
Insight ℹ️
- Social media is the most effective channel for converting customers, whereas paid advertising is the least effective.
Q3
- What are the most and least profitable channels in terms of generating revenue?
Now, let’s calculate the total revenue by channel and have a look at the most and least profitable channels in terms of generating revenue:
Code
customer_tbl %>%
group_by(channel) %>%
summarise(sum_rev = sum(revenue, na.rm = TRUE)) %>%
mutate(Prop = sum_rev/sum(sum_rev),
channel = factor(channel, c("email marketing", "referral",
"paid advertising", "social media") )) %>%
arrange(Prop, channel) %>%
ggplot(aes(x = 2, y = Prop, fill = channel)) +
geom_bar(stat ="identity", width = 1, color = "white") +
xlim(0.3, 2.5) +
coord_polar(theta = "y", start = 0) +
theme_void() +
scale_y_continuous(labels = scales::percent) +
labs(x = "",
y = "",
title = "Revenue contribution by channels",
fill = "") +
geom_text_repel(aes(label = paste0(round(Prop*100,1),"%")), size = 3, position = position_stack(vjust = 0.5)) +
scale_fill_brewer(palette = "Blues", direction = -1) +
theme(legend.position = "right")
Insight ℹ️
So, email marketing is the most profitable channel in terms of generating revenue.
Q4
- What’s the return on investment for each channel?
Let’s examine the return on investment (ROI) for each channel
Code
customer_tbl %>%
mutate(roi = revenue/cost) %>%
group_by(channel) %>%
summarise(Ave_ROI = mean(roi, na.rm = TRUE)) %>%
ggplot(aes(x= reorder(channel, Ave_ROI), y = Ave_ROI)) +
geom_bar(stat = "identity", fill = "steelblue", color = "white",
width = 0.6) +
labs(x = "Channel",
y = "Average Return on Investment",
title = "Average Return on Investment (ROI) by Channel")
Insight ℹ️
The ROI from email marketing is way higher than all other channels, whereas the ROI from paid advertising is the lowest.
Q5
- What is the distribution of customer lifetime value from each channel?
Let’s calculate the customer lifetime value from each channel.
Based on the data we have, we can use the formula below to calculate CLTV
\[ CLTV=(revenue − Cost)∗ \frac{conversionRate}{cost} \]
Code
customer_tbl %>%
mutate(CLTV = (revenue - cost)* (conversion_rate/cost)) %>%
group_by(channel) %>%
summarise(cltv = mean(CLTV, na.rm = TRUE)) %>%
ggplot(aes(x = reorder(channel, cltv), y = cltv, fill = channel)) +
geom_bar(stat = "identity", width = 0.6, show.legend = FALSE) +
labs(x = "Channel",
y = "CLTV",
title = "Customer Lifetime Value by Channel")
Insight ℹ️
So, the customer lifetime value from social media and the referral channels are the most significant.
Let’s compare the CLTV distributions of the CLTV across channels
Insight ℹ️
The boxplot above suggested that there’s a difference in the distribution of the customer lifetime value from different channels.
Hypotheses Testing 🧪
The following research hypotheses would be considered in the subsection below;
- is there is a significant difference in the revenue generated by channels?
- does the CLTV differ significantly across the channels?
Hypothesis 1
Null hypothesis: There’s no significant difference in the revenue generated by channels
Alternative Hypothesis: There’s a significant different in the revenue generated by channels
Df Sum Sq Mean Sq F value Pr(>F)
channel 3 3476233 1158744 0.73 0.534
Residuals 796 1264097594 1588062
Insight ℹ️
- The test result above suggested that there is no significant difference in the revenue across channel at 5% level of significant. As a result, we cannot reject the null hypothesis.
Hypothesis 2
Null Hypothesis: There’s no significant difference in the CLTV across channels
Alternative Hypothesis: There’s a significant difference in the CLTV across channels
Code
Df Sum Sq Mean Sq F value Pr(>F)
channel 3 237204 79068 339.6 <0.0000000000000002 ***
Residuals 796 185312 233
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Insight ℹ️
- The test result above suggest that there’s a significant difference in the customer lifetime value across channels. The test is significant at 5% level of significance. As a result, the null hypothesis cannot be accepted.
As a follow-up question, it would be great to know where the difference lies. Below is a post-hoc test conducted to know where the difference lies.
Code
out <- duncan.test(model2, "channel", main = "CLTV across channels")
plot(out, variation = "IQR", main = "CLTV across channels")
Code
print(out$groups)
CLTV groups
social media 46.584325 a
referral 40.599817 b
email marketing 23.559595 c
paid advertising 1.500682 d
- The post hoc test shows that there’s a significant different across all the channels at 5% level of significance.
Summary 👍
Customer lifetime value analysis is used to estimate the total value of customers to the business over the lifetime of their relationship. It helps companies determine how much to invest in customer acquisition and retention, as well as identify the most valuable customers to prioritize for retention efforts.
Take Away 🕶️
It was observed that email marketing
has the least cost of acquisition with a better conversion rate than paid advertisement
which cost the most among all the channels. Social media
has the best conversion rate with email marketing
contributing the most to the revenue generated and also has a better return on investment (ROI) than others. Social media
has a better CLTV among all the channels and this is found to be statistically significant.