TV advertising offers unique advantages over more transactional forms of online marketing. It gives advertisers the chance to garner consumer attention over an extended period of time, and allows advertisers to tell stories and project values that will stick in the public’s mind. As a household brand and the destination to shop everything for your home, Wayfair invests heavily in TV as an indispensable part of our advertising portfolio.
For all its benefits, TV advertising is very expensive, which makes it one of the most important channels to optimize. At the same time, TV optimization is particularly challenging. For online advertising channels, say on Google or Facebook, we are able to track user behavior individually. TV, on the other hand, as with other offline channels, does not generally allow such a detailed understanding of individual behavior, making it difficult to gauge response. How do we measure TV response at Wayfair, and how are we able to optimize our marketing spend towards return on investment?
On a technical level, the challenge lies in determining whether a given visitor coming to our site arrived due to the influence of TV, or whether they would have visited anyway. As it is impossible to do this in a deterministic way, we rely on a subset of visits that we can measure in the short term in order to make inferences about TV marketing performance as a whole.
Having identified the visits that we can measure, we then need to understand cost and revenue on a per-visitor basis. With indicators of cost and revenue in hand, we can make systematic decisions about the most efficient allocation of spend, e.g., across different networks, different parts of the day, or different messaging, to maximize revenue and minimize costs.
To measure uplift in visits, we make use of the synchronized nature of traditional TV (sometimes called “Linear TV”). Contrary to a digital Display banner or a text ad on Google, which are shown to just one individual at a time, a large TV audience will be exposed to a TV ad at the same time. Any reaction from this big group of people will therefore also take place more or less at the same time – and that provides a measurable signal.
Indeed, the minute-by-minute count of visitors who arrive at the Wayfair homepage, either via a branded search or by directly visiting wayfair.com, shows clear spikes that are locked in time to when our TV ads are played on air. Even though most people who watch an ad will not respond rapidly by visiting our site, enough of them do visit that we can use the size of this spike in visits to assess ad performance in terms of rapid response.
The first step to counting the extra visitors driven by an individual ad is to model a baseline of site traffic: this is the number of visitors that are expected in the absence of any TV advertising. To get a minute-by-minute count of TV-driven spike visitors, we subtract the baseline from the full minute-by-minute visit signal for 3 to 5 minutes after a TV ad is aired. This spike visitor signal grows with both ad cost and TV audience size, which is strong evidence that this method is indeed measuring a signal linked to TV ad response.
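As a rough illustration of the subtraction step, here is a minimal Python sketch. The post does not describe how the baseline is modeled, so the median of the minutes just before the airing stands in for it here; that choice, the `lookback` parameter, and the function name are assumptions made purely for illustration.

```python
from statistics import median


def spike_visitors(visits, airing_minute, window=5, lookback=10):
    """Estimate TV-driven spike visitors for a single airing.

    visits: per-minute visit counts (list of numbers).
    airing_minute: index into `visits` of the minute the ad aired.
    window: minutes after the airing attributed to the spike (3 to 5 in the text).
    lookback: pre-airing minutes used for the baseline (assumption of this sketch).
    """
    pre = visits[max(0, airing_minute - lookback):airing_minute]
    if not pre:
        raise ValueError("need at least one pre-airing minute for a baseline")
    # ASSUMPTION: a flat baseline equal to the median of recent pre-airing minutes.
    baseline = median(pre)
    post = visits[airing_minute:airing_minute + window]
    # Sum the excess over baseline across the spike window.
    return sum(v - baseline for v in post)
```

For a flat pre-airing series of 100 visits per minute followed by 150, 140, 120, 100, 100, this returns 110 spike visitors. Negative excesses are deliberately not clipped to zero, since clipping would bias the count upward on noisy traffic.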
Once we have derived this count of spike visitors, developing a business metric for cost is straightforward: we know what we pay for any given TV spot, and dividing this cost by the number of spike visitors driven by the spot gives a cost per spike visitor (CPSV). This metric allows comparison of networks, time of day, creative ad content, or other dimensions in terms of the magnitude of response we get out of TV, and allows us to maximize spike visitor traffic within an allocated budget.
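The calculation can be sketched in a few lines of Python. The record fields (`cost`, `spike_visitors`, `network`) are hypothetical names, and rolling a group up as total cost over total spike visitors is an assumption of this sketch; the text only defines CPSV at the level of an individual spot.

```python
from collections import defaultdict


def cpsv_by_group(spots, key):
    """Cost per spike visitor for each value of `key` (e.g. network or daypart).

    spots: iterable of dicts with hypothetical fields "cost", "spike_visitors"
           and whatever dimension `key` names.
    """
    cost = defaultdict(float)
    visitors = defaultdict(float)
    for spot in spots:
        cost[spot[key]] += spot["cost"]
        visitors[spot[key]] += spot["spike_visitors"]
    # ASSUMPTION: group CPSV = total cost / total spike visitors.
    # Groups with no measurable spike are skipped instead of dividing by zero.
    return {g: cost[g] / visitors[g] for g in cost if visitors[g] > 0}
```

Pooling costs and visitors before dividing, as opposed to averaging per-spot CPSV values, keeps a few spots with tiny spikes from dominating the group figure.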
However, cost alone does not take into consideration the return we ultimately receive from TV advertising. Even worse, cost alone might set the wrong incentive – guiding us to purchase the lowest cost traffic, even if those visitors are less likely to purchase. Thus, if we are looking to make money, cost alone is an incomplete metric, and we need to turn to revenue to get a more complete picture.
To generate a metric based on revenue from TV-driven visitors, we can use the same spike visitors as in the CPSV calculation. The question now becomes: how much are these spike visitors worth? To gauge this, we could follow potential customers who arrive as these TV-driven “spike visitors” forward in time to any later purchase, adding up subsequent revenue.
The difficulty in this approach lies in isolating the TV-influenced visitors from the baseline visitors who would have visited anyway: recall that we have no way of telling deterministically whether a given visitor coming to our site arrived due to the influence of TV, or whether they would have visited even without TV as part of the “natural” baseline traffic.
To disentangle the two, we use a synthetic control method, comparing revenue generated by visitors who arrive on site during a TV spike to synthetic estimates of what the baseline revenue would have been in the absence of TV. (For another example of the synthetic control method, see our blog post on geographic splitting optimization techniques.)
For the synthetic control, we use two levels of resampling to generate spike revenue estimates. As a first step, we follow all of the visitors who arrive immediately after an individual TV ad forward in time for several weeks, adding up all the revenue they generate over a fixed time period. Next, we look for times that were not TV influenced: we turn to nearby minutes in which we did not have any TV ads on air, and take random samples of visitors from these minutes. Each of these “baseline” samples contains as many visitors as the baseline predicted for that particular TV spot; these samples represent the counterfactual control, i.e., visitors who would have arrived at our site regardless of our TV advertising. Just as with the TV spike visitors, we now follow each of these synthetic control samples forward in time to calculate many matched control revenue estimates for each individual ad.
For each synthetic control sample, we calculate the difference between the revenue that arrived during the TV spike, and the control revenue, generating a collection of estimates for revenue driven by that individual ad. Taking many samples instead of a single deterministic measure from all nearby visits yields a measurement of variation in performance.
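A minimal Python sketch of this first, spot-level resampling step might look as follows. It assumes the follow-up revenue per visitor has already been computed, and it draws control visitors with replacement; the text does not say which sampling scheme is used, so that choice, like all names below, is an assumption.

```python
import random


def spot_revenue_estimates(spike_revenues, nearby_revenues, n_baseline,
                           n_samples=1000, seed=0):
    """Resampled estimates of the revenue driven by one TV spot.

    spike_revenues: follow-up revenue of every visitor who arrived in the
                    spike window (TV-driven and baseline visitors mixed).
    nearby_revenues: follow-up revenue of visitors from nearby minutes
                     with no TV ad on air.
    n_baseline: number of baseline visitors predicted during the spike window.
    """
    rng = random.Random(seed)
    observed = sum(spike_revenues)
    estimates = []
    for _ in range(n_samples):
        # ASSUMPTION: control visitors are drawn with replacement.
        control = rng.choices(nearby_revenues, k=n_baseline)
        # Observed spike-window revenue minus one synthetic baseline draw.
        estimates.append(observed - sum(control))
    return estimates
```

Each element of the returned list is one estimate of the revenue attributable to the ad; the spread of the list reflects how strongly the answer depends on which baseline visitors happened to be drawn.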
The resulting revenue distribution at an individual spot level is typically quite broad, reflecting a high level of uncertainty about how much revenue any individual TV advertisement drives. This is typical for any marketing activity in ecommerce, and it is due to the sparsity of conversion events relative to the total count of visits to our site. Even a giant TV-generated spike in visits from an expensive, high-impression ad may only have a handful of additional TV-generated purchases over the next few months. To derive insights from an estimate as noisy as this, we need to aggregate.
… and more Sampling
A second layer of resampling on an aggregate level gives us a more reliable read on revenue driven by particular types of TV ads. The approach here is to group together many ads along a dimension that is relevant for how we purchase TV advertising. For example, consider a group of 100 ads on a particular network. We gather up all the spot-level resampled revenue estimates for all of these 100 spots. From this group, we now bootstrap many samples of size 100, adding up the revenue estimates generated in our first level of sampling. If you imagine each sample in the first step as representing revenue for one synthetic “TV spot”, we can imagine this step as estimating the revenue performance of a synthetic TV campaign for ads of a particular type.
Dividing by total spike visits driven by these 100 ads gives revenue per spike visitor (RPSV), a metric that reflects the revenue performance of this group of ads overall. In addition to the average RPSV, we rely on the overall distribution of resampled RPSV, which provides an understanding of our level of certainty about the revenue value. Intuitively, we can be more certain of the RPSV value for ad groups with very narrow confidence bounds than those with broader distributions.
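The second resampling level could be sketched in Python as below. It follows the description literally by pooling all spot-level estimates and drawing one estimate per spot with replacement; whether draws should instead be stratified by spot is not stated, so the pooling, the percentile-based bounds, and all names are assumptions of this sketch.

```python
import random
from statistics import mean


def rpsv_distribution(spot_estimates, total_spike_visitors, n_boot=2000, seed=0):
    """Bootstrap distribution of revenue per spike visitor for a group of ads.

    spot_estimates: one list of resampled revenue estimates per spot.
    total_spike_visitors: spike visitors summed over all spots in the group.
    """
    rng = random.Random(seed)
    # ASSUMPTION: all spot-level estimates are pooled before resampling.
    pool = [e for estimates in spot_estimates for e in estimates]
    n_spots = len(spot_estimates)
    # Each draw is one synthetic "campaign": n_spots estimates summed,
    # then normalized by the group's spike visitors.
    return [sum(rng.choices(pool, k=n_spots)) / total_spike_visitors
            for _ in range(n_boot)]


def summarize(dist, level=0.95):
    """Mean plus simple percentile bounds of a resampled distribution."""
    ordered = sorted(dist)
    lo = ordered[int((1 - level) / 2 * (len(ordered) - 1))]
    hi = ordered[int((1 + level) / 2 * (len(ordered) - 1))]
    return mean(dist), lo, hi
```

Comparing the width of the interval from `summarize` across ad groups gives a direct read on which RPSV values can be trusted more.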
We now have measures of cost and revenue, normalized to the same underlying value of spike visits from TV advertisement. The resulting revenue-to-cost ratio, RPSV/CPSV, can be used to guide which portions of our TV ad campaign pay back over time, with higher ratios indicating higher potential return on investment.
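Because both metrics are normalized by the same spike visitor count, that count cancels in the ratio, leaving spike revenue divided by cost. A small Python sketch of ranking ad groups by this ratio, with hypothetical group names and input layout:

```python
def rank_by_ratio(groups):
    """Rank ad groups by RPSV/CPSV, highest first.

    groups: dict mapping a group name to an (rpsv, cpsv) tuple.
    Groups with a non-positive CPSV are skipped.
    """
    ratios = {name: rpsv / cpsv
              for name, (rpsv, cpsv) in groups.items() if cpsv > 0}
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
```

A group with a high CPSV can still rank first here if its RPSV is proportionally higher, which is exactly the case a cost-only view would miss.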
As with any model, this approach has limits and presents challenges in implementation. We are never going to do a perfect job counting the extra spike visitors driven by TV, as the input signal is noisy due to fluctuations in the underlying visit stream. Practically, this means the method works best for TV ads that reliably drive “big enough” visitor spikes. We therefore cannot measure all of our TV advertising equally well. Fortunately, the portion of our ad buy most likely to have measurable revenue from spike visits is also the advertising that is most expensive, and thus is most in need of measuring return on investment to understand whether higher costs are justified.
We should also be careful about interpreting the cost and revenue metrics our model returns. TV spike visitors are of course only a small portion of all the people who are truly influenced by TV, since only visitors who arrive within a few minutes of a TV ad airing are counted. In the metrics described here, however, TV spike visitors are assigned all ad costs, while they are being credited with only a portion of TV-generated revenue. For this reason, a metric like this is not directly comparable to methods that track subsequent revenue more fully.
The methods described here are only able to measure short-term effects using a subset of TV influenced visits. For instance, our spike model does not consider the more long-term effect of increased brand awareness driving more organic traffic to our site, nor does it measure any halo effect of TV-generated awareness on the responses we see through other marketing channels. Thus, this method can never provide a measure of total ROI for TV. However, it allows us to manage our campaign optimization day-to-day, while we carefully monitor complementary longer-term metrics captured through additional models to assess the success of our TV campaign overall.
Previously, we were managing short-term TV optimization using purely the cost per spike visitor metric. This made some of our more expensive TV advertising look like poor contributors to advertising efficiency. However, once we started considering revenue per spike visitor, we were able to see that some high-cost advertising actually delivers positive returns.
By considering the RPSV/CPSV metric, we have been able to systematically expand our campaign on networks we previously considered inefficient. Crucially, we have been able to do this while preserving our overall efficiency for longer-term incremental TV revenue, as measured with our independent longer-term model: our additional spend translates one-to-one into substantial additional revenue.
Special thanks go to Tulia Plumettaz, Thomas Krausse, Courtney Lawrie, Emma Atwood, Jessica McDermott, Danielle Lozier, and Daniela Venturini for their help with this project.