Multi-Touch Attribution
Abstract
We live in a digital economy in which a customer is often exposed to promotional ads on several digital channels, such as Facebook and Google Search. If a customer purchases a product after multiple exposures on different channels, how can we estimate each channel's individual contribution to that conversion, so that we can allocate the marketing budget accordingly?
Multi-touch attribution is a set of methods and modeling techniques that estimate each digital channel's individual contribution, using historical customer touchpoint data and machine learning. I started by searching for relevant research papers in this domain, performed a literature review, and created a framework. I selected a publicly available dataset with around 2 million records for my work. I experimented with many models and finally settled on three: Markov chains, survival analysis, and an RNN with attention. Conversion prediction accuracy was used as a proxy for evaluation and model selection, but an A/B test should also be run before deploying the model in production.
Overview
MTA pertains to the question of how much each marketing touchpoint a user was exposed to contributes to an observed action by that consumer. Understanding the contribution of the various marketing touchpoints is an input to good campaign design, to optimal budget allocation, and to understanding why one campaign worked and another did not. Wrong attribution results in misallocation of resources, inefficient prioritization of touchpoints, and consequently lower return on marketing investment. Having a good attribution model is therefore now recognized as critical for marketing planning, design, and growth.
The problem of attribution is not new. It arises in traditional advertising channels such as television and print. However, online channels offer a unique opportunity to address the attribution problem, because advertisers now have disaggregated, individual-level data that were not previously available. Lacking such data, the marketing literature has focused primarily on marketing mix models, which perform inter-temporal analysis of marketing channels but fail to provide insights at the individual customer level. Granular online advertising data can be used to build rich models of consumer response to online ads.
Any business that’s actively running marketing campaigns should be interested in identifying what marketing channels drive the actual conversions. As the array of platforms on which businesses can market to their customers is increasing, and most customers are engaging with your content on multiple channels, it’s now more important than ever to decide how you’re going to attribute conversions to channels. A 2017 study showed that 92% of consumers visiting a retailer’s website for the first time aren’t there to buy.
As marketing becomes increasingly consumer-driven, identifying the right channels through which to reach customers has become critical for companies. It helps them optimize their marketing spend and target the right customers in the right places.
Because the various touchpoints can interact in complex ways to affect the final outcome, the problem of parsing the individual contributions and allocating credit is a complex one. Given the complexity, many firms and platforms use rule-based methods such as last-touch, first-touch, equally-weighted, or time-decayed attribution. Because these rules may not always reflect actuality, modern approaches propose data-driven attribution schemes that use rules derived from actual marketplace data to allocate credit.
More often than not, companies invest in the last channel that customers encounter before making the final purchase. However, this may not be the right approach: several channels usually precede that last one, and they also help drive the conversion. The concept used to study this behavior is known as ‘multi-channel attribution modeling.’
Problem Statement: To quantify the impact of different media channels on sales volume
Proposed Solution: Data-driven Multi-touch attribution modeling
In Scope:
- Digital channels, including social media ads, paid search ads, display ads, and email.
- Fixed time period/window: week, month, campaign duration.
- Models: Markov chain models, survival analysis models, attention-based recurrent neural network (RNN) models
Out of Scope: Offline channels (e.g. TV, Radio, Print, OOH, Direct marketing)
List of variables and Data dictionary
Preprocessing
Dataset 1
Dataset 2
Criteo is a pioneering company in online advertising research. It has published this dataset for attribution modeling in real-time, auction-based advertising. The dataset consists of Criteo live traffic data collected over a period of 30 days. It has more than 16 million impressions and 45 thousand conversions over 700 campaigns.
Impressions in this dataset may lead to clicks, so each touchpoint in a user's action sequence carries a label indicating whether a click occurred, along with the corresponding conversion ID if the sequence of touchpoints led to a conversion event.
Exploratory Data Analysis
Modeling Approaches
Heuristic approaches to Multi-touch attribution
Last Touch Attribution (LTA)
- As the name suggests, Last Touch is the attribution approach where any revenue generated is attributed to the marketing channel that a user last engaged with.
- While this approach has its advantage in its simplicity, you run the risk of oversimplifying your attribution, as the last touch isn’t necessarily the marketing activity that generates the purchase.
First Touch Attribution (FTA)
- The revenue generated by the purchase is attributed to the first marketing channel the user engaged with, on the journey towards the purchase.
- Just as with the Last Touch approach, First Touch Attribution has its advantages in simplicity, but again you risk oversimplifying your attribution approach.
Linear Attribution
- In this approach, the attribution is divided evenly among all the marketing channels touched by the user on the journey leading to a purchase.
- This approach is better suited to the multi-channel behavior we now see among consumers. However, it does not distinguish between channels, and since not all engagements with marketing efforts are equally influential, this is a clear drawback of the model.
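The three heuristic rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this project: the function name `attribute`, the channel names, and the revenue figure are hypothetical, and the linear rule is assumed to split credit per touchpoint (so a channel touched twice receives two shares).

```python
from collections import defaultdict


def attribute(path, revenue, rule="last"):
    """Split the revenue of one converting journey across its channels.

    path    -- ordered list of channel names the user touched (hypothetical data)
    revenue -- revenue generated by the purchase
    rule    -- "last", "first", or "linear"
    """
    if not path:
        raise ValueError("path must contain at least one touchpoint")

    credit = defaultdict(float)
    if rule == "last":
        # Last Touch: all credit goes to the final channel before purchase.
        credit[path[-1]] += revenue
    elif rule == "first":
        # First Touch: all credit goes to the channel that opened the journey.
        credit[path[0]] += revenue
    elif rule == "linear":
        # Linear: every touchpoint gets an equal share (assumption: per touch,
        # not per unique channel).
        share = revenue / len(path)
        for channel in path:
            credit[channel] += share
    else:
        raise ValueError(f"unknown rule: {rule}")
    return dict(credit)


journey = ["facebook", "paid_search", "email", "paid_search"]
last = attribute(journey, 100.0, "last")
first = attribute(journey, 100.0, "first")
linear = attribute(journey, 100.0, "linear")
print(last, first, linear)
```

Summing these per-journey credits over all converting journeys gives the channel-level totals that each rule would report.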
Data-driven approaches to Multi-touch attribution
Markov Chain model
Basics
What is a Markov Chain?
- A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. The possible outcomes are called states, and together they form the state space.
- The probability of the next state depends only on the present state and not on the states before it. This is called the memoryless property: the chain ignores how it arrived at the present state and uses only the present information to predict what happens next.
- A Markov chain is specified by two pieces of information: the current state and the transition probabilities of moving from one state to another. Using these two, we can predict the next state. For a single step, Final State = Initial State * Transition Matrix.
- So, if we multiply the initial state vector by the transition matrix, we obtain the first state vector. If the first state vector is multiplied by the transition matrix, we obtain the second state vector, and so on.
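The repeated multiplication described above can be written compactly. The notation is an assumption introduced here for illustration: $\pi_t$ is the row vector of state probabilities (or customer counts) after $t$ steps and $P$ is the transition matrix.

$$
\pi_1 = \pi_0 P, \qquad \pi_2 = \pi_1 P = \pi_0 P^2, \qquad \ldots, \qquad \pi_n = \pi_0 P^n
$$

In other words, the state after $n$ steps is obtained by multiplying the initial state by the $n$-th power of the transition matrix.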
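What is a Transition Matrix?
- The transition probability is the probability that a customer moves from one state A to another state B in a single step, that is, P(next state = B | current state = A). Arranging these probabilities with one row per current state and one column per next state gives the transition matrix, in which every row sums to 1.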
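One common way to estimate transition probabilities from data is to count how often each state is followed by each other state and divide by the row total. The sketch below assumes this counting estimator; the function name `transition_probabilities`, the artificial `start`, `conversion`, and `null` states, and the four example journeys are all hypothetical and are not taken from the project's dataset.

```python
from collections import Counter, defaultdict


def transition_probabilities(paths):
    """Estimate P(next state | current state) from observed journeys.

    paths -- list of journeys, each an ordered list of state names
    Returns a nested dict: probs[current][next] = estimated probability.
    """
    counts = defaultdict(Counter)
    for path in paths:
        # Count every consecutive (current, next) pair in the journey.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalise each row so that its probabilities sum to 1.
        probs[current] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


# Hypothetical journeys; "null" marks a journey that ended without a purchase.
paths = [
    ["start", "facebook", "paid_search", "conversion"],
    ["start", "facebook", "null"],
    ["start", "paid_search", "conversion"],
    ["start", "paid_search", "null"],
]
P = transition_probabilities(paths)
for state, row in P.items():
    print(state, row)
```

With real data the same counts are usually stored as a matrix, but the nested dictionary keeps the example readable.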
An example showing how a Markov chain and transition matrix work
Assume that we need to know which state each customer is in 6 months after the product launch. There are 200,000 customers, each of whom could be in any of four states: Awareness, Consideration, Purchase, or No Purchase. Final State of Customers = Initial State Vector * Transition Matrix
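A minimal numerical sketch of this example follows. Every number in it is an assumption made for illustration: the transition matrix values are hypothetical, one step is assumed to equal one month, Purchase and No Purchase are assumed to be absorbing states, and all 200,000 customers are assumed to start in Awareness.

```python
import numpy as np

states = ["Awareness", "Consideration", "Purchase", "No Purchase"]

# Hypothetical monthly transition matrix.
# Rows are the current state, columns the next state; each row sums to 1.
P = np.array([
    [0.50, 0.30, 0.05, 0.15],  # from Awareness
    [0.00, 0.50, 0.25, 0.25],  # from Consideration
    [0.00, 0.00, 1.00, 0.00],  # Purchase is absorbing (assumption)
    [0.00, 0.00, 0.00, 1.00],  # No Purchase is absorbing (assumption)
])

# Hypothetical initial state vector: everyone starts in Awareness.
initial = np.array([200_000, 0, 0, 0], dtype=float)

# Six monthly steps: initial state vector times the 6th power of P.
months = 6
final = initial @ np.linalg.matrix_power(P, months)

for state, count in zip(states, final):
    print(f"{state:>13}: {count:,.0f} customers")
```

Because each row of the matrix sums to 1, the total number of customers is preserved at every step; only their distribution across states changes.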
Model results
User-conversion simulation using Transition matrix
Problem Statement: Assume that we attract 1,000 visits from Paid Search. How many conversions should we expect to obtain?
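The question can be approached by simulating each visit as a random walk over the transition matrix until it reaches a conversion or a drop-out. The sketch below is illustrative only: the channel list, the matrix values, the `null` drop-out state, and the helper `simulate_conversions` are hypothetical, not the fitted model from this project. An exact absorption probability is computed alongside as a sanity check on the simulation.

```python
import numpy as np

states = ["paid_search", "facebook", "email", "conversion", "null"]
idx = {name: i for i, name in enumerate(states)}

# Hypothetical transition matrix (rows: current state, columns: next state).
P = np.array([
    [0.00, 0.20, 0.10, 0.10, 0.60],  # from paid_search
    [0.15, 0.00, 0.10, 0.05, 0.70],  # from facebook
    [0.20, 0.10, 0.00, 0.15, 0.55],  # from email
    [0.00, 0.00, 0.00, 1.00, 0.00],  # conversion is absorbing
    [0.00, 0.00, 0.00, 0.00, 1.00],  # null (no conversion) is absorbing
])


def simulate_conversions(P, start, n_visits, rng, max_steps=100):
    """Simulate n_visits journeys from `start` and count those that convert."""
    absorbing = {idx["conversion"], idx["null"]}
    conversions = 0
    for _ in range(n_visits):
        state = idx[start]
        for _ in range(max_steps):
            if state in absorbing:
                break
            # Draw the next state from the current state's row of P.
            state = int(rng.choice(len(states), p=P[state]))
        if state == idx["conversion"]:
            conversions += 1
    return conversions


rng = np.random.default_rng(42)
simulated = simulate_conversions(P, "paid_search", 1000, rng)

# Exact check: absorption probabilities B solve (I - Q) B = R, where Q holds
# channel-to-channel transitions and R holds channel-to-absorbing transitions.
Q, R = P[:3, :3], P[:3, 3:]
B = np.linalg.solve(np.eye(3) - Q, R)
expected = 1000 * B[idx["paid_search"], 0]  # column 0 of R is "conversion"

print(f"simulated conversions: {simulated}")
print(f"expected conversions:  {expected:.1f}")
```

The simulated count varies with the random seed, while the exact value depends only on the matrix; a large gap between the two would indicate a bug in the simulation.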
Time gap analysis & Rule-based customer value inference
Inference from the above 2 Views
- **If Person A saw our Facebook ad for the first time 25 days ago and has had no contact since, the customer is no longer a promising prospect. Customer Value = Low**
- **If Person B also saw our Facebook ad for the first time 25 days ago and visited the website 14 days ago, the customer is no longer a promising prospect. Customer Value = Low**
- **If Person C also saw our Facebook ad for the first time 25 days ago and visited the website 7 days ago, the customer is still a promising prospect. Customer Value = Mid-High**
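The three cases above amount to a rule on the number of days since the customer's last contact. The sketch below encodes that rule under a loud assumption: the 7-day cutoff (`active_window_days`) is chosen only because it reproduces the three cases, and the real cutoff should be read off the time gap analysis. The function name and the reference date are hypothetical.

```python
from datetime import date, timedelta


def customer_value(last_contact: date, today: date, active_window_days: int = 7) -> str:
    """Label a customer by how recently they last interacted with any channel.

    active_window_days is an assumed cutoff, not a value from the analysis.
    """
    days_since_contact = (today - last_contact).days
    if days_since_contact <= active_window_days:
        return "Mid-High"
    return "Low"


today = date(2024, 1, 26)  # arbitrary reference date for the example

# Person A: only contact was the Facebook ad 25 days ago.
person_a = customer_value(today - timedelta(days=25), today)
# Person B: last contact was a website visit 14 days ago.
person_b = customer_value(today - timedelta(days=14), today)
# Person C: last contact was a website visit 7 days ago.
person_c = customer_value(today - timedelta(days=7), today)

print(person_a, person_b, person_c)
```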
Additive Hazard Model
Model Comparison on conversion Probability
References
- https://www.stat.auckland.ac.nz/~fewster/325/notes/ch8.pdf
- https://towardsdatascience.com/marketing-analytics-through-markov-chain-a9c7357da2e8
- http://www3.govst.edu/kriordan/files/ssc/math161/pdf/Chapter10ppt.pdf
- http://math.furman.edu/~tlewis/math13/markov1.pdf
- Markov chains concept - https://analyzecore.com/2016/08/03/attribution-model-r-part-1/
- Slides: https://www.slideshare.net/adavide1982/markov-model-for-the-multichannel-attribution-problem
- PDF Report: https://aaltodoc.aalto.fi/bitstream/handle/123456789/13898/master_Rentola_Olli_2014.pdf?sequence=1
- Advertising Attribution Modeling in the Movie Industry - https://mc-stan.org/events/stancon2017-notebooks/stancon2017-lei-sanders-dawson-ad-attribution.html
- http://datafeedtoolbox.com/attribution-theory-the-two-best-models-for-algorithmic-marketing-attribution-implemented-in-apache-spark-and-r/
- https://www.bounteous.com/insights/2016/06/30/marketing-channel-attribution-markov-models-r/?ns=l
- Google Analytics Multi-channel funnels: https://support.google.com/analytics/answer/1191180?hl=en
- R Package - https://github.com/cran/ChannelAttribution
- Original dataset from ‘Criteo - online advertising research company’
- https://sequentpartners.com/wp-content/uploads/2017/01/ADMAP-DECEMBER-2016-.pdf
- https://www.visualiq.com/resource-center/newsletter/marketing-modeling-attribution-together
- https://www.iquanti.com/wp-content/uploads/2018/10/iQuantiINSIGHTS_Hybrid-Approach-to-Attribution-Modeling_Whitepaper.pdf
- https://pdfs.semanticscholar.org/9ecf/f16d2044a2a296d78262fb083ef29296a445.pdf
- http://eprints.bbk.ac.uk/16166/1/CIKM2016_582.pdf
- http://www0.cs.ucl.ac.uk/staff/w.zhang/rtb-papers/data-conv-att.pdf
- https://arxiv.org/pdf/1808.03737.pdf
- https://arxiv.org/pdf/1809.02230.pdf
- https://arxiv.org/pdf/1902.00215.pdf
- https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2111-2018.pdf
- https://storage.googleapis.com/pub-tools-public-publication-data/pdf/de1c3ab14fd52301fb193237fdffd45352159d5c.pdf
- https://pure.tue.nl/ws/files/96724049/Master_Thesis_Robbert_Alblas.pdf
- https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45766.pdf
- https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45331.pdf
- https://www.nielsen.com/us/en/insights/news/2019/methods-models-a-guide-to-multi-touch-attribution.html
- https://www.r-bloggers.com/marketing-multi-channel-attribution-model-with-r-part-2-practical-issues/
- https://arxiv.org/ftp/arxiv/papers/1804/1804.05327.pdf
- https://ahsanijaz.github.io/2016-10-21-ChannelAttribution/
- http://184pc128.csie.ntnu.edu.tw/presentation/17-02-21/A%20Probabilistic%20Multi-Touch%20Attribution%20Model%20for%20Online%20Advertising_slide.pdf
- http://www.saying.ren/slides/deep-conv-attr.pdf
- https://dl.acm.org/citation.cfm?id=3313470
- http://gen.lib.rus.ec/scimag/?q=Mapping+The+Customer+Journey%3A+Lessons+Learned+From+Graph-Based+Online+Attribution+Modeling
- https://github.com/rk2900/deep-conv-attr
- https://github.com/eeghor/mta
- https://github.com/LouisBIGDATA/Channel-Attribution-Modeling-in-Marketing
- https://github.com/cran/ahaz