Predict customer lifetime value with machine learning

AI & Customer Intelligence


It’s every business leader’s dream to predict the future of the market and their company. With probabilistic models and machine learning algorithms, you can make data-driven predictions about future sales and customer behavior. 

When you can predict customer lifetime value and buying behavior, you can more accurately update your business plans, future budgets, and staff responsibilities so you’re ready to meet your customers’ future needs and grow your business. 

In this article, we will share the basics of how you can use machine learning to predict future customer lifetime value (often abbreviated to CLV, LTV, or CLTV). We will cover:

  • How do you determine the lifetime value of a customer?
  • Does LTV include CAC?
  • Predicting a customer’s lifetime value
  • What is machine learning for customer lifetime value?
  • Sample machine learning models for CLV
  • How to build your own machine learning model
  • Choosing a customer intelligence solution

How do you determine the lifetime value of a customer?

Customer lifetime value is the total revenue you can expect from an average customer over the entire relationship. It is calculated with the following formula:

Customer Lifetime Value (CLV) =

[Average Transaction ($)]   X   [# of Transactions per Period]   X   [Retention Time (in periods)]

For example, if you are a SaaS company and you sell $20 monthly subscriptions to customers who stay subscribed for an average of 18 months, your CLV is:

$20 (subscription cost) x 12 (transactions per year) x 1.5 years = $360 CLV
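The formula is simple enough to express as a small function. The sketch below is a minimal illustration (the function and parameter names are our own, not from any library) that reproduces the SaaS example. Note that the transaction count and the retention time must use the same period unit: per year and years, or per month and months.

```python
def customer_lifetime_value(avg_transaction: float,
                            transactions_per_period: float,
                            retention_periods: float) -> float:
    """Basic historical CLV: average transaction value
    x transactions per period x number of periods retained.

    transactions_per_period and retention_periods must use the same unit
    (e.g., transactions per year and years retained).
    """
    return avg_transaction * transactions_per_period * retention_periods


# SaaS example: $20 per month, 12 transactions per year, retained 1.5 years
print(customer_lifetime_value(20.0, 12, 1.5))  # 360.0

# Same customer expressed in months: 1 transaction per month for 18 months
print(customer_lifetime_value(20.0, 1, 18))    # 360.0
```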

Why is customer lifetime value important?

Knowing your CLV is important for business decision-making and getting a better understanding of your company’s financial health and future. 

Customer lifetime value is essential for many areas of your company’s future:

  • Advertising: Knowing your CLV helps you evaluate your advertising spend to ensure you have a positive return on ad spend (ROAS) and that you’re not spending more than you expect to make from your average customer. 
  • Marketing team: When your marketing teams have data on when a customer typically churns, they can plan timely, personalized campaigns to regain loyalty. 
  • Sales: Your sales team needs to know how much time and effort to spend acquiring customers to ensure they’re not costing the company more than the customer is worth (from a monetary perspective). 
  • Customer support: When your customer experience teams have deeper insight into your customers’ behavior, they can work to boost customer engagement and satisfaction. Satisfied customers often spend more and stay loyal longer, which increases your CLV. 
  • Business operations: Predicting future customer revenues and behavior can help you plan for the future more realistically. For example, if the average customer spends $1,000 with your business every year and you want to increase revenues by $100,000 this year, you need to acquire approximately 100 new customers to meet that goal (assuming each new customer spends the full $1,000 this year). 
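The business-operations example is a single division. As a rough sketch (the helper name is hypothetical), the calculation rounds up to whole customers and assumes every new customer spends the full annual average in their first year, which customers acquired mid-year will not:

```python
import math


def new_customers_needed(revenue_goal: float, annual_spend_per_customer: float) -> int:
    """Rough number of new customers needed to reach a revenue goal.

    Simplifying assumption: every new customer spends the full annual
    average within the year. Rounds up to a whole customer.
    """
    if annual_spend_per_customer <= 0:
        raise ValueError("annual_spend_per_customer must be positive")
    return math.ceil(revenue_goal / annual_spend_per_customer)


print(new_customers_needed(100_000, 1_000))  # 100
print(new_customers_needed(100_000, 1_200))  # 84
```

If customers arrive evenly through the year, each contributes only part of a year’s spend, so the real number needed would be higher.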

Does LTV include CAC?

No, the basic LTV (or CLV) formula does not typically include your customer acquisition costs (CAC). It purely measures how much you receive from the customer. However, you can modify the formula to calculate customer lifetime value net of your CAC. 

Knowing your CLV net of CAC only tells part of the story. To get the full picture as it relates to your business’s growth projections, you need to understand the ratio of customer revenue to acquisition costs. The ultimate goal is to ensure you’re not spending more to acquire new customers than you can expect to earn over the lifetime of their loyalty to your brand. 

CLV and CAC Formulas

To compare what you spend to acquire a customer with the revenue you expect from that customer, calculate your CLV:CAC ratio. First, calculate your CLV and CAC using the following formulas:

Customer Lifetime Value (CLV) =

Average Transaction ($)   X   # of Transactions per Period   X   Retention Time (in periods)


Customer Acquisition Cost (CAC) = 

Total sales and marketing spend for period   /   Customers acquired in that period


Then, divide CLV by your CAC, like this:

CLV:CAC Ratio  =  Customer Lifetime Value (CLV)  /  Customer Acquisition Cost (CAC)


For example, if your CLV is $500:

  • …and you spend $500 to acquire each customer, you are spending as much as you can expect to make from that customer. This 1:1 ratio is your break-even point (assuming you don’t have any other business expenses to consider).
  • …or you spend $200 to acquire each customer, you are making a “profit” of $300 per customer acquired. This is a CLV:CAC ratio of 2.5:1, well above break-even.
  • …or you spend $700 to acquire each new customer, you are paying $200 more than you can expect to earn from that customer. This CLV:CAC ratio of roughly 0.7:1 means you lose money on every customer you acquire.
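The CAC and CLV:CAC formulas can be sketched in a few lines. The function names are our own, and the spend figures ($50,000, $20,000, and $70,000 to acquire 100 customers) are hypothetical values chosen to reproduce the three $500 CLV scenarios above:

```python
def customer_acquisition_cost(sales_marketing_spend: float, customers_acquired: int) -> float:
    """CAC = total sales and marketing spend for a period / customers acquired in that period."""
    if customers_acquired <= 0:
        raise ValueError("customers_acquired must be positive")
    return sales_marketing_spend / customers_acquired


def clv_cac_ratio(clv: float, cac: float) -> float:
    """Expected lifetime revenue per customer divided by the cost of acquiring them."""
    return clv / cac


def describe_ratio(ratio: float) -> str:
    # 1.0 is break-even; above it a customer returns more than they cost to acquire.
    if ratio > 1:
        return "above break-even"
    if ratio < 1:
        return "below break-even (loss)"
    return "break-even"


clv = 500.0
for spend, customers in [(50_000, 100), (20_000, 100), (70_000, 100)]:
    cac = customer_acquisition_cost(spend, customers)
    ratio = clv_cac_ratio(clv, cac)
    print(f"CAC ${cac:,.0f} -> CLV:CAC {ratio:.2f} ({describe_ratio(ratio)})")
```

This prints ratios of 1.00, 2.50, and 0.71. The break-even line here ignores other business expenses, just as the examples above do.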

Predicting customer lifetime value

The formulas above summarize quantitative data and values, but they don’t predict the future unless your business maintains the status quo. Just knowing that you make an average of $500 over the lifetime of one customer isn’t enough without understanding what drives that customer to spend or lose loyalty. 

Use this data to better understand your customers so you know how to adjust your business processes, outputs, and interactions to either attract more customers or increase their spending with your brand. 

You can take an educated guess at predicting your customer’s lifetime value in the future. But for a deeper understanding, machine learning can be applied to the data to discover trends and predictions not easily or quickly spotted by labor-intensive, manual data analysis. 

What is machine learning for customer lifetime value?

When you use machine learning to analyze and help predict customer monetary value, you can develop a potentially more accurate picture of your business’s future. By applying machine learning algorithms to customers’ purchasing behavior and to the actions or feedback they provided during their time as customers, you can learn what these numbers actually mean for the future. 

Information used for data analysis  

The more transaction data and historical data you have available, the more detailed your probabilistic models can be. Here are a few examples of valuable data to use as training and test sets when calculating CLV with your machine learning algorithm:

  • Invoice number 
  • Invoice date
  • Order date
  • Product code
  • Product price (or unit price)
  • Total order value 
  • Product description
  • Customer ID (or Customer number)
  • Customer location (city, region, country name)
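Transaction rows like these are usually collapsed into one row of summary features per customer before modeling. The sketch below uses a hypothetical `Transaction` record holding a subset of the fields listed (invoice number, invoice date, customer ID, total order value) and computes frequency, recency, age, and monetary features in the form probabilistic CLV models typically consume. The definitions follow a common convention (used, for example, by the `lifetimes` Python library); using days as the time unit is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    invoice_no: str
    invoice_date: date
    customer_id: str
    total_order_value: float


def summarize_customers(transactions, observation_end: date) -> dict:
    """Collapse transaction rows into per-customer features.

    frequency: repeat purchase days (distinct purchase days minus 1)
    recency:   days between first and last purchase
    age:       days between first purchase and observation_end
    monetary:  mean spend on repeat purchase days (0.0 if none)
    """
    # Sum order values per customer per day, so same-day orders count once.
    spend_by_day = defaultdict(lambda: defaultdict(float))
    for t in transactions:
        spend_by_day[t.customer_id][t.invoice_date] += t.total_order_value

    summary = {}
    for customer_id, days in spend_by_day.items():
        ordered = sorted(days)
        first, last = ordered[0], ordered[-1]
        repeat_values = [days[d] for d in ordered[1:]]
        summary[customer_id] = {
            "frequency": len(ordered) - 1,
            "recency": (last - first).days,
            "age": (observation_end - first).days,
            "monetary": sum(repeat_values) / len(repeat_values) if repeat_values else 0.0,
        }
    return summary


rows = [
    Transaction("1001", date(2024, 1, 5), "C1", 40.0),
    Transaction("1002", date(2024, 2, 5), "C1", 60.0),
    Transaction("1003", date(2024, 3, 5), "C1", 20.0),
    Transaction("1004", date(2024, 2, 1), "C2", 300.0),
]
summary = summarize_customers(rows, observation_end=date(2024, 4, 1))
print(summary)
```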

If you have basic transaction and customer information such as the above, a machine learning algorithm can use it to identify and create customer groups. Then, based on the customer data you have, your customer retention or CLV challenges, and your overall business goals, you can select the models best suited to analyze, interpret, and summarize the trends you need. Different models will give you different results:

Historical approach: Aggregate

This method aggregates data from previous or current transactions across all customers to predict a single future value. It assumes a consistent average customer spend and lifespan. 

If your business has a wide range of customer spending (e.g., $300 per customer to $5,000 per customer), this method may not work well because a single average won’t represent what most of your customers actually spend. In this case, a cohort approach may be more appropriate.
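To see why a wide spending range is a problem, compare the single aggregate value with what a typical customer spends. The lifetime spend figures below are made up for illustration:

```python
from statistics import mean, median

# Hypothetical lifetime spend per customer, in dollars: mostly small
# accounts plus two large ones.
lifetime_spend = [300, 320, 350, 400, 380, 310, 5000, 4800]

aggregate_clv = mean(lifetime_spend)       # the single value the aggregate approach uses
typical_customer = median(lifetime_spend)  # what the middle customer actually spends

print(f"Aggregate (mean) CLV: ${aggregate_clv:,.2f}")
print(f"Median customer:      ${typical_customer:,.2f}")
```

The mean lands near $1,480, far above what six of the eight customers spend and far below the two large accounts, so budgeting around it would misjudge both groups.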

Historical approach: Cohort

In a cohort approach, machine learning algorithms group your customers into cohorts (customers who exhibit similar characteristics or behaviors). Depending on your goals, your customer segmentation data may include geographical location, first purchase date, or product purchase history. Based on these customer segments, you can calculate separate CLVs for different groups to get a more accurate picture. 

With this method, you can separate and analyze your high-value customers differently from your low-value customers and distinguish behavior based on region or other demographic data. 
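In its simplest form, the cohort approach assigns each customer a cohort key and computes one CLV per group. The sketch below uses region as the key (a hypothetical choice; first purchase month works the same way) and made-up spend figures. This is rule-based grouping; a clustering algorithm such as k-means could generate the cohort key instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customers: (customer_id, cohort key, lifetime spend in dollars)
customers = [
    ("C1", "North", 320.0),
    ("C2", "North", 350.0),
    ("C3", "North", 380.0),
    ("C4", "South", 4800.0),
    ("C5", "South", 5000.0),
]


def cohort_clv(rows) -> dict:
    """Average lifetime spend per cohort."""
    by_cohort = defaultdict(list)
    for _customer_id, cohort, spend in rows:
        by_cohort[cohort].append(spend)
    return {cohort: mean(values) for cohort, values in by_cohort.items()}


result = cohort_clv(customers)
print(result)  # {'North': 350.0, 'South': 4900.0}
```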

Both aggregate and cohort-based models are good at analyzing past performance. However, they are usually not a good indicator of future customer lifetime value unless you maintain the status quo in your business, which is unlikely. 

Probabilistic Models

Probabilistic models can give better predictions by using your past data as a base and probability to fill in the unknown variables in your CLV formula. These models typically separate the prediction into two parts: how many transactions each customer will make in the future, and how much those future transactions will be worth. 

Here are five common probabilistic models used in a customer lifetime value calculation using machine learning:

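  • Pareto/NBD Model: Estimates the probability that a customer is still active and, while active, how often they purchase.
  • BG/NBD Model: Predicts the expected number of transactions in a period.
  • MBG/NBD Model: Modifies BG/NBD so that customers who have made no repeat purchases can also be classified as inactive, removing an inconsistency in the original model.
  • BG/BB Model: Predicts the number of repeat purchases when customers can only buy at discrete, fixed opportunities (for example, once per year or per event).
  • Gamma-Gamma Model: Predicts each customer’s average transaction value (often called average profit).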
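Two of these models have short closed-form expressions worth seeing: the BG/NBD probability that a customer is still active, and the Gamma-Gamma expected average transaction value. The sketch below implements those two standard formulas. It does not fit the parameters; the values used are hypothetical and for illustration only (in practice you would estimate them from your own data, for example with the `lifetimes` library’s `BetaGeoFitter` and `GammaGammaFitter`). All times must share one unit, such as weeks.

```python
def bgnbd_prob_alive(x: int, t_x: float, T: float,
                     r: float, alpha: float, a: float, b: float) -> float:
    """BG/NBD probability that a customer is still active.

    x:   number of repeat purchases (frequency)
    t_x: time of the last purchase, measured from the first purchase (recency)
    T:   time from the first purchase to the end of observation (age)
    r, alpha, a, b: fitted model parameters
    """
    if x == 0:
        # BG/NBD lets customers drop out only right after a purchase, so a
        # customer with no repeat purchases is always treated as active.
        # This is the inconsistency MBG/NBD addresses.
        return 1.0
    odds_inactive = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_inactive)


def gamma_gamma_expected_spend(x: int, m_x: float,
                               p: float, q: float, gamma: float) -> float:
    """Gamma-Gamma expected average transaction value for one customer.

    Blends the population mean, p * gamma / (q - 1), with the customer's own
    observed average m_x, giving the customer more weight as x grows.
    Requires q > 1.
    """
    return (p * gamma + p * x * m_x) / (p * x + q - 1)


# Hypothetical parameter values, not fitted to any real data.
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.4)
recent = bgnbd_prob_alive(x=2, t_x=35.0, T=40.0, **params)  # bought recently
lapsed = bgnbd_prob_alive(x=2, t_x=5.0, T=40.0, **params)   # long silent
spend = gamma_gamma_expected_spend(x=2, m_x=50.0, p=6.0, q=4.0, gamma=15.0)
print(f"P(alive) recent={recent:.3f} lapsed={lapsed:.3f}, expected spend=${spend:.2f}")
```

Multiplying the expected number of future transactions (from BG/NBD) by the expected spend per transaction (from Gamma-Gamma) gives a per-customer CLV forecast.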

Sample machine learning models for CLV

Here are a few examples of companies that have used machine learning to increase the CLV of their future and existing customers:


FabFitFun sells monthly subscription boxes of full-sized products. Each month the customers get a new box full of curated goodies. They had a lower average customer lifetime value than they wanted, as people would unsubscribe after several months. They used machine learning to look deeper into what triggered this churn. 

These AI models analyzed customer survey and support contact data to determine the factors most likely to cause customers to cancel their subscriptions. Using this data, they amended their processes to address those factors. As a result, they were able to increase their overall CLV by:

  • Decreasing complaints by 49%
  • Increasing product satisfaction by 250%
  • Decreasing contact volume by 28%
  • Increasing 5-star ratings by 6%


Instacart is a popular, peer-sourced grocery delivery platform where customers order their groceries online, and a personal shopper picks up their order and delivers it to their house. They struggled with proper routing and assigning the right specialized agents for each request. This led to high support call volumes, unsatisfied customers, and fewer repeat visits. 

They used data from their existing systems, including their Zendesk customer support system. Then, they applied Idiomatic machine learning algorithms to these datasets to uncover what the data was really saying. The Instacart team was able to see exactly where they needed to make changes to address customer service inquiries faster and make their customers happier. 

This increased the average lifetime value of their customers, in addition to:

  • More accurately routing support experiences
  • Increasing positive CSAT scores by 35%
  • Saving $445k in annual support costs
  • Receiving 20+ real-time spike alerts every month that they can now address immediately.

How to build your own machine learning model

Building your own machine learning model can be quite complicated and take months of planning, building, and testing. In addition to developing the AI code, you can also expect high ongoing costs to maintain and tweak the algorithms to meet different analytics and business goals. 

The steps to build your own predictive machine learning model are:

  1. Define the timeframe over which to calculate your CLV.
  2. Identify the features and characteristics that best predict future CLV. 
  3. Calculate CLV for your training data to “teach” the machine learning algorithm.
  4. Build the AI model based on what you learned in the training phase.
  5. Check the model’s predictions for accuracy against data it hasn’t seen, then revise the code or feed it additional training data to make your CLV predictions more precise. 

You may also need to aggregate 6 to 12 months’ worth of data to test your algorithm. To improve accuracy, your predictive approach may require three months of real data for every six months of predictions you want, though the more data you have, the better. For sample data for testing purposes, you can start with a public dataset from the UCI Machine Learning Repository. However, whenever possible, use your own data to get more accurate results when testing new algorithms and AI systems.

What should be included in machine learning models?

When building your own machine learning algorithm, you’d normally include any data that’s quantified or has a numerical value. For example:

  • How many support tickets has a specific customer had?
  • How many dollars has a customer spent?
  • How many times has a customer engaged with your software?
  • How many complaint tickets has the company received in a specific timeframe?
  • How many new customers have been acquired in a specific timeframe?

Customer comments can’t be included in an algorithm directly because they’re qualitative (i.e., they don’t have a numerical value). However, you can code qualitative feedback to turn it into quantitative data by assigning a label or numerical value to a specific keyword or sentiment in the comment. 
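Here is a minimal sketch of coding comments into numbers by keyword matching. The keyword lists and feature names are hypothetical; a real coding scheme would be built from your own feedback, and production systems usually rely on trained text classifiers rather than fixed word lists.

```python
import re

# Hypothetical keyword lists for illustration.
NEGATIVE = {"cancel", "refund", "broken", "late", "slow"}
POSITIVE = {"love", "great", "fast", "easy"}


def code_comment(comment: str) -> dict:
    """Turn a free-text comment into numeric features via keyword matching."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    negative_hits = len(words & NEGATIVE)
    positive_hits = len(words & POSITIVE)
    return {
        "mentions_cancel": int("cancel" in words),
        "negative_hits": negative_hits,
        "positive_hits": positive_hits,
        "sentiment_score": positive_hits - negative_hits,
    }


print(code_comment("Delivery was late again and support was slow. I may cancel."))
```

Each comment becomes a row of numbers that can sit alongside spend and ticket counts in the model’s feature set.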

Cost to develop your own CLV predicting algorithm

Developing a machine learning model like this could cost the equivalent of several engineers working for months, or $100,000+. Putting it into production, maintaining model performance, and providing technical support for a self-built predictive AI system probably requires those same engineers full-time, which can cost 10 times the build cost per year, or about $1,000,000 annually.

As you are developing your own machine learning algorithms to calculate and predict your future CLV, you must also account for any human error in the development or interpretation of the data. That’s why we recommend starting with an AI-driven customer intelligence platform that is ready to use immediately and tweaking it to fit your unique needs. 

Choosing a customer intelligence solution

With the Idiomatic customer intelligence solution, implementation typically takes one to two weeks for data preparation and setup. This includes integrating with your pre-existing software and data sources. It’s a ready-to-use product that’ll help you eliminate customer pain points confidently, taking the guesswork out of choosing algorithms and eliminating costly algorithm development in-house.

Not only that, Idiomatic eliminates the need to code qualitative data yourself before including it in your machine learning model. Idiomatic makes a new set of data, such as user comments, reported issues, and open-ended feedback, easy to use in the model. Idiomatic categorizes customer feedback based on your business’s unique issues, turning feedback into meaningful, actionable insights.

Check out this ROI calculator to see how valuable a ready-to-use system like this can be for your business. Then, we’d be happy to chat with you about your customer retention and CLV struggles to see how the Idiomatic solution can help you.