Come find us! Next week, Wayfair Data Science will be hitting the road. Data scientists and engineers from across our teams will be attending the SIGIR Conference on Research and Development in Information Retrieval in Ann Arbor, Michigan, as well as the International Conference on Machine Learning (ICML) in Stockholm, Sweden.
Here at Wayfair, we are constantly striving to improve the online retail experience. The information retrieval tools we develop and machine learning techniques we employ as a team are essential components in moving that work forward. Here is a quick glimpse into a few of the projects we currently have in motion in those fields.
1. Counterfactual, Position-weighted Evaluation of Recommender Systems via Pareto-smoothed Importance Sampling
The only way to truly know whether a new technology will have a positive impact is to get it in front of users in some form of A/B test. Unfortunately, even the simplest of A/B tests requires a potentially-sunk expenditure of engineering effort and weeks to collect and analyze data. Further, there is only a finite amount of traffic and every test you run diverts precious bandwidth away from all other tests. For this reason, while it is imperative to A/B test every feature, testing is also not something to be treated lightly. This necessitates some form of offline evaluation that is predictive of A/B test performance and can be run quickly enough to meaningfully narrow the search space of possible systems to test.
In the case of recommender systems, this is especially difficult. While we can easily measure whether our models recommend what our customers have purchased, that doesn’t really capture what we want to know: would our customers purchase what we would have recommended? We at Wayfair are exploring a state-of-the-art probabilistic approach to evaluation that is designed to answer precisely this counterfactual question. The key insight is that by adding stochasticity to our recommender systems, so that every item has some chance of being shown to someone, there is a non-zero chance that our customers did in fact see what we would have recommended. We can then use a technique known as Pareto-smoothed importance sampling to create a low-bias, low-variance estimator of how our recommender systems would have performed had they been deployed at some time in the past. Further, we can control for the natural human tendency to buy the things in a store that are more visible, be it at eye level in a grocery store or at the top of the page of an eCommerce website.
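To make the idea concrete, here is a minimal Python sketch of a position-weighted, self-normalized importance sampling estimate. It is an illustration under stated assumptions rather than a description of any production system: the log format, the `target_prob` and `examination_prob` callables, and the `weight_cap` parameter are all hypothetical. Simple weight capping stands in for the Pareto smoothing step, which would instead fit a generalized Pareto distribution to the largest weights.

```python
def position_weighted_snips(logs, target_prob, examination_prob, weight_cap=None):
    """Estimate the reward a candidate recommender would have earned.

    logs: iterable of (context, item, position, logging_prob, reward), where
          logging_prob is the probability the stochastic logging system
          showed `item` (must be > 0).
    target_prob(context, item): probability the candidate system shows `item`.
    examination_prob(position): assumed chance a slot is looked at at all
          (hypothetical position-bias model).
    weight_cap: optional cap on importance weights. This is plain truncation,
          NOT Pareto smoothing; it only stands in for that step.
    """
    weighted_reward = 0.0
    weight_total = 0.0
    for context, item, position, logging_prob, reward in logs:
        weight = target_prob(context, item) / logging_prob
        if weight_cap is not None:
            weight = min(weight, weight_cap)
        # Dividing by the examination probability discounts rewards that
        # were earned mostly because the slot was highly visible.
        weighted_reward += weight * reward / examination_prob(position)
        weight_total += weight
    # Self-normalization trades a little bias for much lower variance.
    return weighted_reward / weight_total if weight_total > 0 else 0.0


# Toy usage: two logged impressions shown uniformly at random.
toy_logs = [("u1", "sofa", 1, 0.5, 1.0), ("u1", "lamp", 2, 0.5, 0.0)]
estimate = position_weighted_snips(
    toy_logs,
    target_prob=lambda context, item: 1.0 if item == "sofa" else 0.0,
    examination_prob=lambda position: 1.0,
)
```

The estimate is only defined because the logging system gave every item a non-zero chance of being shown, which is exactly why the added stochasticity matters.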
2. RecNet
RecNet is a hybrid recommender system that combines collaborative filtering information, in the form of order co-occurrences, with visual information to learn a metric space based on style. We believe style is the best attribute on which to recommend products to a user, but it can be difficult to quantify. RecNet uses a Deep Siamese Convolutional Neural Network to learn a multi-dimensional metric space that maps stylistically similar items close together and other items further apart. This is an important distinction from our visual search model, which maps items that look visually similar close together. In the RecNet space, two items (say, a coffee table and a couch) can be mapped close together if they match well, whereas in the visual search space these items would be mapped far apart. RecNet is an exciting hybrid approach that allows us to personalize style preferences for each user and to recommend products across multiple classes.
Fig. 1: A comparison of the Visual Search and RecNet models. The Visual Search model returns the items most visually similar to the query image, whereas RecNet returns items from multiple classes that match the style of the query image.
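The idea of pulling matching items together and pushing others apart can be sketched with a contrastive loss, a common training objective for Siamese networks. This post does not say which loss RecNet is trained with, so treat the function below, its `margin` parameter, and the toy embeddings as assumptions for illustration only.

```python
import math


def euclidean_distance(a, b):
    """Distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, is_match, margin=1.0):
    """Contrastive loss for one pair of embeddings (illustrative choice).

    Matching pairs (e.g. items ordered together) are penalized for being
    far apart; non-matching pairs are penalized only while they are closer
    than `margin`.
    """
    distance = euclidean_distance(emb_a, emb_b)
    if is_match:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


# Toy usage with made-up 2-D embeddings for a coffee table and a couch.
coffee_table = [0.1, 0.9]
couch = [0.2, 0.8]
loss_if_they_match = contrastive_loss(coffee_table, couch, is_match=True)
```

In a style space, the coffee table and couch would be labeled as a matching pair when they go well together, even though a visual similarity model would treat them as unrelated.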
3. Product ReRanker Framework
The Product ReRanker is our first model that algorithmically determines the sort order of a personalized SuperBrowse page and is tied directly to a key performance metric. We use a two-stage framework of candidate generation followed by reranking to select the best products for a user and sort them according to a given metric. We use a custom-built deep neural network to estimate the probability of Add to Cart, P(ATC), for each product and each customer. Because both customer and product information are fed into the model, we’re able to show the most relevant products to each user. This type of approach can easily be extended to incorporate other features and learn other key performance indicators, and it can power personalization across the site well beyond SuperBrowse pages. This model will eventually extend beyond the website itself and will be used to personalize the advertisements we serve to users across the web.
Fig. 2: The two-stage Wayfair Personalization architecture. First, we use multiple recommendation models to generate relevant candidates for this particular customer. These are passed into a ReRanker component that optimizes for a given KPI, creating the optimal sort of products output by the candidate generation phase to display to the user.
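The two-stage flow can be sketched in a few lines of Python. Everything here is hypothetical: `candidate_generators` stands in for the recommendation models, and `score_fn` stands in for the neural network that estimates P(ATC); the real components are far more involved.

```python
def rerank(customer, candidate_generators, score_fn, top_k=10):
    """Two-stage personalization sketch: generate candidates, then rerank.

    candidate_generators: callables mapping a customer to a list of product
        ids (placeholders for the recommendation models).
    score_fn(customer, product): estimated KPI for the pair, e.g. P(ATC)
        (placeholder for the reranking model).
    """
    # Stage 1: pool candidates from every generator, dropping duplicates
    # while keeping first-seen order.
    candidates = {}
    for generate in candidate_generators:
        for product in generate(customer):
            candidates.setdefault(product, None)

    # Stage 2: sort by descending score; ties fall back to the product id
    # so that the output is deterministic.
    ranked = sorted(candidates, key=lambda p: (-score_fn(customer, p), p))
    return ranked[:top_k]


# Toy usage with made-up scores.
toy_scores = {"sofa": 0.2, "lamp": 0.9, "rug": 0.5}
page = rerank(
    "customer_1",
    [lambda c: ["sofa", "lamp"], lambda c: ["lamp", "rug"]],
    lambda c, p: toy_scores[p],
    top_k=2,
)
```

Keeping the stages separate means a new KPI only requires a new `score_fn`; the candidate generators can stay unchanged.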
4. Optimization of Content in Marketing Emails
Marketing emails are an important channel for re-engaging customers and driving conversions from our acquired customer base. To send the most engaging content to different segments of our list, we combine several approaches. For our most engaged customers, we rely heavily on a combination of content-based and collaborative filtering to select the content most relevant to the products they have been browsing on our website. For our less engaged customers, we use a contextual multi-armed bandit (MAB) to select the optimal content for different customer segments. Using a MAB allows us to directly optimize for engagement while balancing exploration and exploitation. This balance is especially important since the pool of possible content changes continuously. This approach has been highly effective, as also reported by a recent study:
“Email marketing optimization firm Coherent Path analyzed 100 businesses in the Internet Retailer Top 100 list’s emails to both customers and non-purchasers. Out of those 100 brands, Wayfair bested the competition overall.”
Fig. 3: Multi-armed bandits are a natural framework for selecting the optimal content, in this case Daily Sales Events. For each slot, we have to select one event (i.e., one arm) from a fixed set of possible events. Each event has an unknown payoff, in our case the fraction of customers who will click on the selected event.
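As an illustration of the bandit setup in the caption, here is a minimal Thompson sampling sketch that keeps a separate click-rate belief for each (segment, event) pair. The post does not say which bandit algorithm is used, so the choice of Thompson sampling, the class name, and the use of the customer segment as the only context are all assumptions.

```python
import random
from collections import defaultdict


class SegmentedThompsonSampler:
    """Beta-Bernoulli Thompson sampling with one belief per (segment, arm)."""

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # Beta(1, 1) prior stored as [clicks + 1, non_clicks + 1].
        self.counts = defaultdict(lambda: [1, 1])

    def select(self, segment):
        """Sample a plausible click rate per arm and pick the highest."""
        draws = {
            arm: self.rng.betavariate(*self.counts[(segment, arm)])
            for arm in self.arms
        }
        return max(draws, key=draws.get)

    def update(self, segment, arm, clicked):
        """Record whether the customer clicked on the selected event."""
        self.counts[(segment, arm)][0 if clicked else 1] += 1

    def add_arm(self, arm):
        """New content starts from the prior, so it still gets explored."""
        if arm not in self.arms:
            self.arms.append(arm)


# Toy usage: pick an event for one slot, then log the outcome.
sampler = SegmentedThompsonSampler(["rug_sale", "lighting_sale"], seed=0)
chosen = sampler.select("less_engaged")
sampler.update("less_engaged", chosen, clicked=True)
```

Arms with little data have wide beliefs and are occasionally sampled high, which provides exploration; arms with a strong click history win most draws, which provides exploitation.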
5. Room Detector
Computer Vision at Wayfair builds in-house intelligence on Wayfair’s millions of images. Recently, we launched a Room Detection model that categorizes images into room types (bedroom, living room, etc.). While a traditional classification setup performed well, the training labels were quite noisy: ~35% of the data was mislabeled in a dataset of ~1 million images. Moreover, these images came from multiple sources (suppliers, 3D generators, and customers), each carrying a different amount of noise. To correct for this, we used an innovative two-pass deep learning strategy for label noise estimation and loss correction, which you can read more about here. This, along with transfer learning from a scene detection model, improved our model accuracy from ~81% to ~92%.
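One well-known form of loss correction is "forward" correction, sketched below purely as an illustration; the post does not specify which correction the Room Detection model uses. The sketch assumes a noise transition matrix has already been estimated (for example, in a first pass), and the function name and toy matrix are hypothetical.

```python
import math


def forward_corrected_nll(clean_probs, noisy_label, transition):
    """Negative log-likelihood of a noisy label under forward correction.

    clean_probs: the model's predicted distribution over the TRUE classes.
    noisy_label: index of the (possibly wrong) label seen in the dataset.
    transition[i][j]: estimated P(observed label = j | true class = i).
    """
    # Push the clean prediction through the noise model to get the
    # probability of observing `noisy_label`.
    noisy_prob = sum(
        p * transition[i][noisy_label] for i, p in enumerate(clean_probs)
    )
    return -math.log(noisy_prob)


# Toy usage: two room types, with made-up noise rates.
toy_transition = [[0.8, 0.2], [0.3, 0.7]]
loss = forward_corrected_nll([0.9, 0.1], noisy_label=1, transition=toy_transition)
```

With this correction, a confident and correct prediction is penalized less when the dataset label disagrees with it, because the noise model explains part of the disagreement.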
Pearson Correlation of Embedding Vectors
Fig. 4: Pearson correlation heatmap of the embedding-layer (tanh-activated) vector representations of a test set. You can see that kids_beds and bedrooms are highly positively correlated, while office and patio_furnitures are highly negatively correlated. As these correlations make intuitive sense, the model passed our sanity check that it is learning useful things.
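A sanity check of this kind can be sketched as follows. The snippet assumes one representative embedding vector per class (how the per-class vectors are obtained is not covered here), and the toy vectors are invented for illustration, not taken from the model.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return covariance / (spread_x * spread_y)


def correlation_matrix(embeddings):
    """Pairwise correlations for a {class_name: vector} mapping."""
    names = list(embeddings)
    return {
        a: {b: pearson(embeddings[a], embeddings[b]) for b in names}
        for a in names
    }


# Toy usage with made-up tanh-range embeddings.
toy_embeddings = {
    "kids_beds": [0.9, -0.2, 0.4],
    "bedrooms": [0.8, -0.1, 0.5],
    "office": [-0.7, 0.6, -0.3],
}
heatmap_values = correlation_matrix(toy_embeddings)
```

The resulting nested dictionary holds the values that a heatmap like the one above would display.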
Want to know more?
Swing by our booth at ICML (B05:18) or find us at SIGIR! Data scientists, researchers, and recruiters will be on hand to answer questions and chat. See you in Ann Arbor and Stockholm!