A Statistical Approach to Customer Lifetime Value - Part 4: The Beta Geometric

Heterogeneous populations, what’s that all about?


Follow along here

In my last post, we discussed a more advanced approach to modeling customer churn by integrating the idea of duration dependence: the longer someone stays as a subscriber, the less likely they are to churn. You may recognize this phenomenon as “loyalty.” Whereas initially we assumed each customer had the same propensity to churn every period, we relaxed that assumption through the Weibull distribution, which allows churn propensity to increase or decrease over time, i.e. the longer a customer stays. But what if loyalty didn’t increase at all, and “bad” subscribers with a high propensity to churn simply self-selected out of the group early on? Let’s review our initial assumptions one more time:

  1. Every period, a customer flips a weighted coin. If it lands on heads, they churn.
  2. The coins stay the same over time (i.e. coin flips are independent of each other).
  3. Everyone has the same coin.
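As a quick sanity check on what these three assumptions imply, here is a minimal sketch of the resulting retention curve. The churn probability `theta = 0.12` is a made-up value for illustration, not a number fitted to any data in this series.

```python
def geometric_survival(theta, t):
    """Share of a cohort still subscribed after t periods when every
    customer churns with the same probability theta each period."""
    return (1 - theta) ** t

theta = 0.12  # hypothetical per-period churn probability
curve = [geometric_survival(theta, t) for t in range(6)]

# Churn rate among customers who were still around at the start of each period.
churn_rates = [1 - curve[t] / curve[t - 1] for t in range(1, 6)]

for t, rate in enumerate(churn_rates, start=1):
    print(f"period {t}: survivors={curve[t]:.3f}, churn rate={rate:.3f}")
```

Under these assumptions the period-by-period churn rate never moves, which is exactly the rigidity the rest of this post tries to relax.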

The Complication

Assumption 3 is concerning. Everyone has the same coin? If you sign up 100 people for your magazine, it’s more likely that some people are really excited about your content and have low propensity to churn, some people mistakenly signed up and have high propensity, and some people don’t understand your content and have medium propensity as they figure this out. With duration dependence, we assumed that loyalty increases over time because churn rate decreases, but could it be that people with low loyalty just self-selected out of this group, making it seem like the overall loyalty increased?
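To see how self-selection alone can look like growing loyalty, here is a small sketch with three hypothetical segments. The segment sizes and churn probabilities are invented for illustration. Each customer keeps the same coin forever, yet the cohort-level churn rate still falls.

```python
# (number of customers, per-period churn probability) -- hypothetical values
segments = [
    (40, 0.05),  # excited about the content
    (30, 0.20),  # still figuring it out
    (30, 0.60),  # signed up by mistake
]

def aggregate_churn_rate(segments, t):
    """Expected cohort-level churn rate in period t (1-indexed):
    expected churners in period t divided by expected customers at risk."""
    at_risk = [n * (1 - theta) ** (t - 1) for n, theta in segments]
    churned = [r * theta for r, (_, theta) in zip(at_risk, segments)]
    return sum(churned) / sum(at_risk)

rates = [aggregate_churn_rate(segments, t) for t in range(1, 9)]
for t, rate in enumerate(rates, start=1):
    print(f"period {t}: cohort churn rate = {rate:.3f}")
```

Nobody in this toy cohort becomes more loyal. The rate drops only because the high-churn segment empties out first, leaving a mix that is increasingly dominated by the low-churn segment.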

Heterogeneity & Distributions

Instead of saying everyone has the same coin, let’s assume there’s a distribution of underlying or latent churn probabilities. Let’s turn this homogeneous group of coin-flipping customers into a beautiful diverse heterogeneous group of coin-flipping customers. Basically, we’re trying to graph out what the underlying distributions look like. When you think of distributions, you probably think of the normal distribution, pictured below:

While the normal is great, it models \(x\) from negative infinity to positive infinity. Our \(x\) is really our churn rate (\(\theta\)), which goes from 0 to 1, so normal ain’t gonna cut it. Let’s instead use the beta distribution. The beta distribution is great for modeling proportions or failure rates because it is defined on the interval [0, 1] and because the Wikipedia page says so.

Disclaimer: I am not a statistician. Most of this series on CLV comes from a class I took at WashU, which is based off of this man’s hard research. Therefore, defer to Bruce Hardie’s slides for technical details. I would say my posts focus more on business application instead of academic significance. Choose your side…

The beta distribution has two parameters, \(\alpha\) and \(\beta\), and its mean is \(\frac{\alpha}{\alpha + \beta}\), which in our case is the average churn probability \(\theta\) across the population. The beta distribution is “mathematically convenient,” but it only shows us a snapshot in time: our initial hypothesis about how churn rates are distributed across our population. As we know, however, we’re flipping coins each period, and the high-churn customers leave first, which means we need to update our distribution each new period.
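Here is a rough sketch of both ideas using only the standard library. It borrows \(\alpha = 0.35\) and \(\beta = 2.61\) from the stats further down. The updating rule is my assumption about what “update” means here: it is the standard conjugate result that customers who survive \(t\) coin flips have churn probabilities distributed as Beta(\(\alpha\), \(\beta + t\)). The function names are my own.

```python
import random

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def survivor_mean_churn(alpha, beta, t):
    """Mean churn probability among customers who survived t periods,
    assuming the surviving population follows Beta(alpha, beta + t)."""
    return beta_mean(alpha, beta + t)

alpha, beta = 0.35, 2.61

# Draw latent churn probabilities for a simulated population.
rng = random.Random(42)
draws = [rng.betavariate(alpha, beta) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)

print(f"theoretical mean churn: {beta_mean(alpha, beta):.4f}")
print(f"simulated mean churn:   {sample_mean:.4f}")
for t in range(0, 5):
    print(f"after {t} periods, mean churn of survivors: "
          f"{survivor_mean_churn(alpha, beta, t):.4f}")
```

Each period that passes bumps the second parameter up by one, so the average coin among the people who are still around gets kinder over time.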

The Beta - Geometric Model

We started with our basic exponential model (or the geometric model as it will now be known):

$$ survivors_t = (1-\theta)^t $$

And now all we do is replace the static \(\theta\) with our new beta distribution of \(\theta\)s, to mix in some heterogeneity. Out comes this model:

$$ survivors_t = \frac{B(\alpha, \beta + t)}{B(\alpha,\beta)} $$
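In brief, the ratio of beta functions comes from averaging the geometric survival curve over every possible coin. This is only a sketch, assuming \(\theta\) follows a Beta(\(\alpha\), \(\beta\)) distribution across customers and that \(B(\cdot,\cdot)\) denotes the beta function:

$$ survivors_t = \int_0^1 (1-\theta)^t \, \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)} \, d\theta = \frac{1}{B(\alpha,\beta)} \int_0^1 \theta^{\alpha-1}(1-\theta)^{\beta+t-1} \, d\theta = \frac{B(\alpha, \beta + t)}{B(\alpha,\beta)} $$

The last step works because the remaining integral is itself the definition of \(B(\alpha, \beta + t)\).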

(For the derivation, check out this link) And out comes this graph of churn:

And here are our stats:

Variables

| Calculation | Value |
|---|---|
| \(\alpha\) | 0.35 |
| \(\beta\) | 2.61 |
| Mean Churn | 11.83% |
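To make these parameters a bit more tangible, here is a minimal sketch that turns \(\alpha\) and \(\beta\) into a survival curve and a period-by-period churn rate. The function names are my own, and I work with log-gamma to keep the beta functions numerically stable. The churn-rate formula follows from dividing consecutive survival values.

```python
import math

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bg_survival(alpha, beta, t):
    """Share of the cohort still subscribed after t periods:
    B(alpha, beta + t) / B(alpha, beta)."""
    return math.exp(log_beta(alpha, beta + t) - log_beta(alpha, beta))

def bg_churn_rate(alpha, beta, t):
    """Churn rate in period t among customers who survived t - 1 periods."""
    return alpha / (alpha + beta + t - 1)

alpha, beta = 0.35, 2.61  # the fitted values reported in this post

for t in range(1, 7):
    print(f"period {t}: survivors={bg_survival(alpha, beta, t):.4f}, "
          f"churn rate={bg_churn_rate(alpha, beta, t):.4f}")
```

The first-period churn rate equals the mean of the beta distribution, and every later period is lower, even though no individual customer’s coin ever changes.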

Results

| Model | % Error | Log-likelihood |
|---|---|---|
| Exponential | 11.73% | -89,167 |
| Discrete Weibull | 0.57% | -84,763 |
| Beta Geometric | 0.34% | -84,982 |

A few things to note:

  • Mean churn (\(\frac{\alpha}{\alpha + \beta}\)) is around 12%, slightly lower than the dW model churn of 14%
  • Beta Geometric beats the discrete Weibull on % error, but loses on log-likelihood
  • Both BG and dW fit the data fairly well.
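For anyone curious what a log-likelihood comparison like this involves, here is a rough sketch. The cohort counts below are entirely hypothetical (they are not the data behind the table above), and the beta geometric parameters are hand-picked rather than fitted. The log-likelihood is the usual one for this kind of data: each churner contributes the log probability of churning in their period, and each customer still active contributes the log probability of surviving that long.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bg_log_pmf(t, alpha, beta):
    """log P(churn exactly in period t) = log[B(alpha+1, beta+t-1) / B(alpha, beta)]."""
    return log_beta(alpha + 1, beta + t - 1) - log_beta(alpha, beta)

def bg_log_survival(t, alpha, beta):
    """log P(still subscribed after t periods)."""
    return log_beta(alpha, beta + t) - log_beta(alpha, beta)

def bg_log_likelihood(churned, still_active, alpha, beta):
    ll = sum(n * bg_log_pmf(t, alpha, beta)
             for t, n in enumerate(churned, start=1))
    return ll + still_active * bg_log_survival(len(churned), alpha, beta)

def geometric_log_likelihood(churned, still_active, theta):
    ll = sum(n * (math.log(theta) + (t - 1) * math.log(1 - theta))
             for t, n in enumerate(churned, start=1))
    return ll + still_active * len(churned) * math.log(1 - theta)

# Hypothetical cohort of 1,000: churners in periods 1..5, plus those still active.
churned = [130, 90, 70, 55, 45]
still_active = 610

# Geometric maximum-likelihood estimate: churn events divided by total coin flips.
flips = sum(t * n for t, n in enumerate(churned, start=1)) + len(churned) * still_active
theta_hat = sum(churned) / flips

print("geometric LL:     ", round(geometric_log_likelihood(churned, still_active, theta_hat), 1))
print("beta geometric LL:", round(bg_log_likelihood(churned, still_active, 0.6, 4.0), 1))
```

In practice you would search over \(\alpha\) and \(\beta\) (with Solver, or an optimizer) for the values that maximize the log-likelihood instead of hand-picking them; values closer to zero indicate a better fit.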

So what?

The higher-level question these models pose is this: does staying longer increase loyalty, or are “bad” subscribers self-selecting out of the initial cohort, making it seem like everyone’s staying longer? I honestly don’t know the answer. Even though they rely on completely different assumptions, both models fit the data very well, so who’s to say?

Next time, we’ll combine the models and see if we can get some better answers!
