A Statistical Approach to Customer Lifetime Value - Part 2: Exponential Churn Rates
When you market, you begin with the 4 Ps; when you model, you begin with the exponential
Disclaimer: The names and numbers in this case study have been changed for privacy reasons; the churn data, however, is real. If you want to follow along at home, click here for an accompanying spreadsheet. Enjoy!
After reading my previous blog post on CLV, you send off a data request to your data analytics guy asking him to pull the January 2014 subscribers and plot their churn over time. Dutifully, he responds with this data and graph:
Okay, so what?
Computing a control group
Let’s assume that each period (month), your customers pay you a $50 subscription fee. Utilizing a discount rate of 10% per year, we can find out the average discounted value of a customer thus far. We simply multiply $50 by the percent alive each period, discounted each period. Keep in mind, this is just the value of the average customer for the 28 months of data, NOT the customer lifetime value. Right now we’re just calibrating.
Completing the above yields $212 of value over the first 28 months. This $212 will serve as our control group and help us compare models.* Now what?
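The control-group calculation can be sketched in a few lines. The survival percentages below are illustrative placeholders (the real cohort numbers live in the accompanying spreadsheet), and I'm assuming the 10% annual discount rate is converted to an equivalent monthly rate:

```python
# Sketch of the control-group calculation with hypothetical survival data.
fee = 50.0                          # monthly subscription fee
annual_discount = 0.10
monthly_discount = (1 + annual_discount) ** (1 / 12) - 1

# pct_alive[t] = fraction of the January 2014 cohort still active in month t.
# Illustrative values only; the real series runs through month 28.
pct_alive = [1.00, 0.95, 0.91, 0.88]

# Discounted value of the average customer: fee x percent alive, discounted each period.
value = sum(
    fee * alive / (1 + monthly_discount) ** t
    for t, alive in enumerate(pct_alive)
)
print(round(value, 2))
```

Run over all 28 months of the real data, this sum is what produces the $212 figure.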
Exponential churn - a first attempt
Our first crack at this problem is the exponential churn method. I’m calling this the exponential churn method because I’ve found no better name. The name exponential comes from the equation for the percent of surviving customers from a cohort at any period t:

% surviving at period t = (1 − churn rate)^t
There are three assumptions behind exponential churn:
- Every period, a customer flips a weighted coin representing the churn rate. If it lands on heads, they churn.
- Coin flips are independent of each other.
- Everyone has the same coin.
The key variable here is the churn rate: what percent of customers leave each period. If your churn rate is 10% and you start with 100 customers, that means in the first period 10 will leave (10% of 100), then of the 90 remaining, 9 will leave next (10% of 90), and so on. Every period, they keep flipping a coin with a 90% chance of tails, a 10% chance of heads.
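The coin-flip arithmetic above is just the survival equation in disguise. A minimal sketch:

```python
def pct_surviving(churn_rate: float, period: int) -> float:
    """Fraction of the original cohort still alive after `period` periods,
    under the exponential model: (1 - churn rate) ** period."""
    return (1 - churn_rate) ** period

# With a 10% churn rate and 100 starting customers:
customers = [round(100 * pct_surviving(0.10, t), 1) for t in range(5)]
print(customers)  # [100.0, 90.0, 81.0, 72.9, 65.6]
```

Each period loses 10% of whoever remains, which is why the curve decays geometrically rather than dropping a flat 10 customers per month.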
To solve for the optimal churn rate, I used Solver to minimize the squared difference between the actual and modeled survival curves, using only the first 12 months of data.
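If you'd rather not use Excel, the same least-squares fit can be sketched with a simple grid search. The 12 monthly survival figures below are illustrative stand-ins for the real cohort data:

```python
# Hypothetical percent-alive figures for months 0-11 (illustrative only;
# note the sharp drop from month 6 to 7, as in the real data).
actual = [1.00, 0.96, 0.93, 0.90, 0.87, 0.85, 0.82,
          0.72, 0.70, 0.68, 0.66, 0.64]

def sse(churn_rate: float) -> float:
    """Sum of squared errors between the exponential model and the data."""
    return sum(((1 - churn_rate) ** t - a) ** 2 for t, a in enumerate(actual))

# Grid-search churn rates from 0.01% to 20%; Solver does the same
# minimization, just with a smarter search.
best = min((c / 10000 for c in range(1, 2001)), key=sse)
print(f"fitted churn rate: {best:.2%}")
```

On the real data, this minimization is what produced the 4.88% churn rate reported below.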
A quick note on in-sample fit vs. out-of-sample fit
It’s important here to distinguish between in-sample fit and out-of-sample fit. Essentially, in-sample fit refers to how well your model fits the data it was trained on, while out-of-sample fit refers to how well it fits new data. If I fit the exponential model on all 28 data points, it would do much better. However, the real challenge for you data scientists out there is not to describe the data but to forecast it. A good model doesn’t just fit data, it predicts. If you only had the first year of data because you had just launched a new product, you’d want to be confident that your model holds up with only a few data points. Alright, let’s return to the show.
With Solver’s help, we calculated a churn rate of 4.88% and the following graph:
Yeesh. That does not look good. We really have two problems working against us here:
- Initial bump - It looks like there is a bump from period 6 to 7 where a significant portion of our subscribers drop off. This insight will be investigated further by our marketing team to determine what’s going on after 6 months. In the meantime, we can incorporate that into our model.
- Non-linear data - After the bump, the data is clearly non-linear. By the tail end of the graph, the gap between predicted and actual churn is striking and still widening. We can’t fix this now, but stay tuned for part 3, where we will.
That being said, the value of our predicted customer over the first 28 months is $203, only a 4% error from the actual $212. With more data, the gap between predicted and actual would widen further, reminding us of the importance of out-of-sample fit. But really, would you walk into your boss’s office with a graph that looks like that?
Initial bump at 6 - a refinement
After talking with our product team, it turns out that after 6 months of using our product, our company actually boots off customers who are losing us money. Let’s imagine that these customers call into our call centers too often and therefore net us a loss, so we remove them. (This is a real phenomenon, by the way, check it out.)

To capture this, we add an artificial “bump” variable between the sixth and seventh periods that allows a one-time booting of a certain percentage of the cohort. Utilizing Solver once again, we now get a churn rate of 3.96% and a bump rate of 8.36%. This means that around 4% of our customers churn each period, and an additional 8% are booted one time at period six, for a roughly 12% drop after 6 months. How about this graph?
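The refined survival function is the exponential model with one extra factor. A sketch, plugging in the fitted rates from above:

```python
# Fitted parameters from the post: constant monthly churn plus a
# one-time "bump" between periods 6 and 7.
CHURN, BUMP = 0.0396, 0.0836

def pct_surviving(t: int, churn: float = CHURN, bump: float = BUMP) -> float:
    """Fraction of the cohort alive at period t: exponential decay,
    with a one-time extra boot applied between periods 6 and 7."""
    alive = (1 - churn) ** t
    if t >= 7:  # the one-time boot has already happened by period 7
        alive *= (1 - bump)
    return alive

# Combined drop from period 6 to 7: regular churn plus the bump.
drop_6_to_7 = 1 - pct_surviving(7) / pct_surviving(6)
print(f"{drop_6_to_7:.1%}")  # 12.0%
```

Note that the two rates compound rather than simply add: 1 − (1 − 0.0396)(1 − 0.0836) ≈ 12.0%, which is where the "full 12% drop" comes from.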
We fixed the problem with the initial bump, but we still have a bit of separation toward the end that looks off. This model yields $209 of value over the 28 months, only 1% off from the actual value of $212! The model does well, but still doesn’t hold up for the long term. Join me next time, where we’ll explore the discrete Weibull model!
Notes
- I used the actual value of the customer here as a control group and measured percent error as a proxy for fit, but in academia, there are better statistical criteria: p-values, R-squared, log-likelihood, and BIC, to name a few. Additionally, you must consider in-sample vs. out-of-sample fit, parsimony, and story, which I hope to discuss in greater detail later.
- Title Inspiration