A Crash Course in Survival Analysis: Customer Churn (Part I)

Joshua Cortez, a member of our Data Science Team, has put together a series of blog posts on using survival analysis to predict customer churn. This is part one of the series.

Introduction

Customer churn is familiar to many companies offering subscription services. Simply put, customer churn is the event of a customer opting out of their subscription. They may do so for many reasons, such as dissatisfaction with their plan (for example, consistently poor reception on a mobile phone network) or the allure of better subscription packages from competitors.

Source (1)

Businesses want to understand how and why their customers churn so that they can improve their profits and deliver better services. In this blog post series, we’ll explore a branch of statistics called survival analysis to uncover insights that help us understand and curb churn. We’ll use a churn dataset from a blog in the IBM Watson Analytics community that describes a fictitious telco’s customers and how long they stayed before they churned.

Survival Analysis

Survival analysis has traditionally been used in medicine and the life sciences to analyse how long it takes before a person dies, hence the “survival” in survival analysis. The field, however, can be used to model other events that organisations care about, such as the failure of a machine or customer churn. Okay, cool. But what kinds of insights can we get from survival analysis?

We’ll talk about two main ideas in more detail in future blog posts: survival curves (in part II), and survival regression (in part III). We’ll discuss what they are, and what kinds of insights they bring to the table.

Today’s post introduces these concepts and gives an overview of our test dataset.

1. Survival Curves

Source (2)

An example survival curve – by charting the results we can visualise how the likelihood of churn changes over time (2).

What we can do with it:

i. Show how the likelihood of customer churn changes over time.
ii. Determine the optimal intervention point.

Questions it can answer:

i. How many years/months on average do our customers stay?
ii. How long do male customers stay compared to female customers?
iii. Is our understanding of our customer lifecycle consistent with reality?
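
To make the idea of a survival curve a little more concrete before part II, here is a minimal sketch of one common way to estimate it, the Kaplan-Meier estimator. This is an illustration under our own assumptions, not necessarily the method used later in the series: the function name `kaplan_meier` and the toy tenure values are made up, and tenure is assumed to be measured in months.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Return [(t, S(t))] for every time t at which a churn was observed.

    durations: how long each customer has been observed (months, assumed).
    churned:   True if the customer churned at that time, False if they
               were still subscribed when we last looked.
    """
    events = Counter(t for t, c in zip(durations, churned) if c)
    leaving = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk customers who did NOT churn at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data (made up for illustration, not taken from the telco dataset).
tenure = [1, 3, 3, 5, 8, 12, 12, 20]
churned = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(tenure, churned)
for months, prob in curve:
    print(f"{months:>2} months: {prob:.3f}")
```

On this toy data the estimated probability of a customer lasting beyond 12 months comes out at about 0.4. Splitting the customers into groups (say, male and female) and running the same function on each group is one way to approach question ii.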

2. Survival Regression

Survival regression lets us fit a model to survival data to predict when an event is likely to occur.

What we can do with it:

i. Model the relationship between customer churn, time, and other customer characteristics.

Questions it can answer:

i. What’s the probability that this customer who is a female non-senior citizen with dependents will stay for 2 years?
ii. What are the significant factors that drive churn?
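
As a rough illustration of question i, the sketch below uses the simplest parametric survival model we can think of: it assumes each customer segment churns at a constant monthly rate (an exponential model). This is our own simplifying assumption for the sake of a short example, not the regression model covered in part III. The segment labels, the numbers, and the function name `exponential_survival` are hypothetical.

```python
import math


def exponential_survival(durations, churned, horizon):
    """Probability of staying beyond `horizon` months under a constant churn rate.

    The rate is estimated as (number of churn events) / (total months observed),
    which is the maximum-likelihood estimate for an exponential model when some
    customers have not churned yet.
    """
    rate = sum(churned) / sum(durations)  # churn events per customer-month
    return math.exp(-rate * horizon)


# Hypothetical segments with made-up tenures (months) and churn flags.
segments = {
    "month-to-month": ([2, 5, 8, 10, 15], [True, True, True, False, True]),
    "two-year": ([24, 30, 36, 48, 62], [False, False, True, False, False]),
}

stay_24 = {
    name: exponential_survival(tenure, churned, horizon=24)
    for name, (tenure, churned) in segments.items()
}
for name, prob in stay_24.items():
    print(f"P(stay 2 years | {name}) = {prob:.3f}")
```

A proper survival regression replaces the hand-picked segments with a model in which customer characteristics shift the churn rate, which is what lets us ask which factors matter most.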

Survival analysis can also be applied in industries beyond telecommunications. Some examples (2):
– Insurance – time to lapsing on policy
– Mortgages – time to mortgage redemption
– Mail Order Catalogue – time to next purchase
– Retail – time till food customer starts purchasing non-food
– Manufacturing – lifetime of a machine component
– Public Sector – time intervals to critical events

A worked example

Let’s get started by examining our sample churn dataset. Our dataset has 7043 customers and 20 variables. Most of the variables are categorical and describe attributes of a customer.

Categorical Variables:

– Gender, SeniorCitizen, Partner, Dependents, PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract, PaperlessBilling, PaymentMethod, Churn

Numeric Variables:

– Tenure, MonthlyCharges, TotalCharges
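
For survival analysis, the two columns that matter most are Tenure (how long we have observed the customer) and Churn (whether the event has happened yet). The sketch below shows one way to turn rows into (duration, event) pairs. The sample rows are made up, and we are assuming that Churn is recorded as "Yes"/"No" and that Tenure is in whole months; check the actual file before relying on either.

```python
import csv
import io

# A few made-up rows in the assumed layout of the churn CSV.
SAMPLE = """Tenure,MonthlyCharges,Churn
1,29.85,No
34,56.95,No
2,53.85,Yes
8,99.65,Yes
"""


def to_survival_pairs(csv_text):
    """Return a list of (duration_in_months, churned) tuples, one per customer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(row["Tenure"]), row["Churn"] == "Yes") for row in reader]


pairs = to_survival_pairs(SAMPLE)
print(pairs)
```

Customers with `churned == False` are still informative: we know they have stayed at least as long as their tenure, even though we do not yet know when (or whether) they will leave.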

Here’s a simple exploratory plot to get to know our data: a histogram of monthly charges. It shows how monthly charges are distributed across customers. A large proportion of customers pay around $20 per month.

If you want to see more of the data, you can download the csv file from here.

In the next post we’re going to talk about survival curves and apply these to our dataset.

Sources:

(1) http://www.superoffice.com/blog/wp-content/uploads/2015/05/reduce-customer-churn.png
(2) http://www.barryanalytics.com/Downloads/Presentations/Survival Analysis.pdf

 
