Landing Page Personalization at Expedia Group
Pavlos Mitsoulis-Ntompos, Dionysios Varelas, Travis Brady, J. Eric Landry, Robert F. Dickerson,
Timothy Renner, Chris Harris, Shuqin Ye, Abbas Amirabadi, Lisa Jones, Javier Luis Cardo
{pntompos,dvarelas,tbrady,johlandry,rdickerson,trenner,chrharris,sye,aamirabadi,ljones1,jlcardo}@expediagroup.com
Expedia Group
(4) Traveler has selected the destination and date, and has a shortlist of properties (Shop),
(5) Traveler has selected the destination and property (or has a shortlist of either / both) and dates (Shop).

1.1.2 Shop. In this phase, travelers have already shortlisted a number of properties and destinations. Many of them have even selected their destination, and they just need to make a decision between a handful of properties.

In order to facilitate travelers in the Shop phase, we initially implemented and deployed a couple of heuristic algorithms. Many of our home and landing pages contain carousels showing recent searching and browsing behavior, as well as separate recommendation carousels based on said recent behavior. We proved the hypothesis that recent activity is correlated with shortlisted properties and/or destinations using online experiments. Typically we choose the simplest possible solution to start, as we believe that simple solutions can be very powerful in terms of providing a better user experience, and they provide baselines for future ML-based approaches.

2.1.2 Dataset and Application. For the experiments, anonymized clickstream data is collected for millions of travelers and properties. Specifically, for each traveler $t_i$ we create training instances by collecting all the previously seen properties into sequences $p_{j_1}, \ldots, p_{j_{m_i}}$. From those training instances we slide a window in order to create training examples [10].
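As an illustration of the windowing step, the following is a simplified sketch (not the production pipeline) of how training pairs could be generated from one traveler's sequence of viewed properties; the window size and the skip-gram-style pairing are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def sliding_window_pairs(seen_properties: List[str],
                         window_size: int = 2) -> Iterable[Tuple[str, str]]:
    """Yield (context_property, target_property) pairs by sliding a fixed-size
    window over one traveler's sequence of viewed properties.

    The window size of 2 and the pairing scheme are illustrative assumptions,
    not the exact configuration used in production.
    """
    for i, target in enumerate(seen_properties):
        lo = max(0, i - window_size)
        hi = min(len(seen_properties), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                yield seen_properties[j], target


# Example: one anonymized traveler's property-view sequence (hypothetical IDs).
sequence = ["p101", "p205", "p333", "p205", "p876"]
training_examples = list(sliding_window_pairs(sequence))
```

Each resulting pair then serves as one training example for the embedding model.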
4.1 History
Initially at both Vrbo and Hotels.com, machine learning models were directly integrated with the product's codebase. At the time there were few machine learning models in production and therefore specialized infrastructure wasn't economical. Absent machine learning specific infrastructure, embedding the model code directly in the backend software stack is simpler.

There are a number of drawbacks to this approach that compound complexity as model updates become more frequent, however. Firstly, any model retraining or update would require redeploying the entire stack. For home and landing pages, this is not trivial. Those services are critical and must have near-perfect uptime, or users will be unable to search for properties and therefore unable to book. This slows down the speed of iteration and experimentation, which is critical for machine learning success.

Machine learning models are very different from traditional backend systems. The code for training ML models is usually written in Python, Scala or R, while most of the backend systems at Hotels.com and Vrbo are built in Java. This necessitates a model "hand-off" from one runtime to the next, and makes apples-to-apples testing extremely difficult, particularly if there's data processing code required. Compute and memory requirements are also significantly higher than for traditional backend systems. This means that adding machine learning directly into a backend system will likely change the requirements for the whole system, even if it's only needed for the ML model. Monitoring machine learning systems is more challenging as well. All the usual monitoring for uptime and latency applies, but it's also important to monitor predictions (as noted in [2]) as a means of detecting bias, train-test skew, and model drift. Finally, the skill sets of data scientists and backend developers are also very different. Managing software in production requires a completely different skill set than building a powerful model. With the model integrated directly into the backend, the data scientists need to know a lot about that system, and the backend developers must understand the computational and memory requirements of the model.

All of these differences slow down iteration and experimentation. Redeploying the entire backend stack for a critical system is slow and risky, even with appropriate application infrastructure. Translating the model from one runtime to another takes time, and testing is very tedious. The single most important thing a data scientist can do with a model to increase its economic impact is to test it online as much as possible, yet at the start of these efforts that was extremely difficult. This necessitated an investment on the part of Vrbo and Hotels.com to develop specialized infrastructure specifically for deploying and monitoring ML models.

4.2 Machine Learning Infrastructure
Most companies with significant machine learning requirements and products have a platform to accelerate ML development and deployment. Some concrete examples include TensorFlow Extended (TFX) [1] from Google, Bighead from Airbnb², Michelangelo from Uber³, and FBLearner⁴ from Facebook. We also developed a machine learning platform that serves models in real-time, streaming and batch contexts. We decided to favor a microservice approach since the systems ML is integrated into are mostly Java or Scala based. The microservice approach also solves the scaling / system requirements mismatch, as the ML model can be scaled independently from the integrating system. For the purposes of recommendation systems there are three relevant patterns to discuss: real-time synchronous, real-time precomputed (offline), and real-time precomputed (online). These are diagrammed in Figure 6. Here real-time refers to how the inferences are accessed, meaning large numbers of small one-at-a-time requests as opposed to bulk requests of the entire dataset (via SQL or Apache Spark⁵, for example). The platform was built in stages, usually driven by data scientist / software engineering integration requirements, rather than dictated top-down all at once. The result is a series of loosely coupled components that were developed based on real-world use cases.

4.2.1 Real-time Synchronous Inference (RTSI). This is a straightforward request / response pattern. The caller sends the payload containing the model features, then the service invokes the model in-memory and performs the inference, sending the result back to the caller. This pattern is very simple and follows a RESTful way of calling the prediction logic of models. Deployment is entirely self-service, and requires only a small amount of code on the data scientist's part for instrumentation and packaging. It can support any Python, Java or Scala machine learning framework.

The obvious disadvantage to this pattern becomes clear when the model requires a significant amount of computational power, or when the required latency is low. Since the model is being invoked per request, latency can be quite high depending on the model.
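A minimal sketch of the RTSI pattern is shown below, assuming a Python model packaged with joblib and exposed through a small Flask endpoint; the route name, payload shape and model artifact are illustrative and not the platform's actual API.

```python
# Sketch of real-time synchronous inference (RTSI): the caller posts the
# model features, the service runs the model in-memory and returns the score.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
# Hypothetical artifact produced by the data scientist; assumes a binary
# classifier exposing predict_proba.
model = joblib.load("model.joblib")


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()          # caller sends the model features
    features = [payload["features"]]      # single one-at-a-time request
    score = float(model.predict_proba(features)[0][1])
    return jsonify({"score": score})      # inference result back to the caller


if __name__ == "__main__":
    app.run(port=8080)
```

Because the model runs as its own service, it can be scaled independently of the calling system, at the cost of paying the inference latency on every request.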
4.2.2 Real-time Precomputed (Offline) (RTPOf). If the inference scores can be precomputed in an offline environment, they can be loaded into a key-value store as long as there's a suitable key. Examples of common keys at Vrbo and Hotels.com are property ID, traveler ID and destination ID. Then callers would only need to know the key to retrieve the scores. This transforms what was a complicated inference per prediction into a lookup, reducing the latency by at least an order of magnitude.

This approach has drawbacks too. If the key space has a very high cardinality then precomputation and storage become impractical, thus necessitating a move to the aforementioned synchronous inference. Another drawback is the frequency of updates to the key-value store. In many cases a simple daily or hourly update will suffice; however, some use cases require the precomputed values to be updated continuously as new data arrives in the system. Updating traveler "profiles" based on their searches as they search on the site is an example of such a use case.
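The following is a simplified sketch of the RTPOf pattern, assuming Redis as the key-value store and a traveler-ID key; the key naming, scoring function and store choice are illustrative, not the production implementation.

```python
# Sketch of real-time precomputed (offline) inference (RTPOf): a batch job
# scores every key up front and loads the results into a key-value store,
# so the serving path becomes a single lookup.
import json

import redis

store = redis.Redis(host="localhost", port=6379)


def batch_precompute(traveler_ids, score_fn):
    """Offline job: compute recommendations per traveler ID and store them."""
    for traveler_id in traveler_ids:
        scores = score_fn(traveler_id)   # e.g. a ranked list of property IDs
        store.set(f"recs:traveler:{traveler_id}", json.dumps(scores))


def lookup(traveler_id):
    """Online path: one key-value lookup instead of a model inference."""
    cached = store.get(f"recs:traveler:{traveler_id}")
    return json.loads(cached) if cached else []
```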
² https://databricks.com/session/bighead-airbnbs-end-to-end-machine-learning-platform
³ https://eng.uber.com/michelangelo/
⁴ https://engineering.fb.com/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/
⁵ https://spark.apache.org/
⁶ https://kafka.apache.org/
Figure 6: (a) Real-time synchronous, (b) real-time precomputed (offline), (c) real-time precomputed (online). [Diagram omitted; the components shown include a batch job and the model service.]
4.2.3 Real-time Precomputed (Online) (RTPOn). By combining the real-time synchronous services with the key-value store, we are able to listen to a stream of values (via Apache Kafka⁶) and perform the predictions as the inputs arrive, rather than when the predictions are needed. The assumption of course is that the predictions won't be needed immediately, but some time after the features become available on the stream.

When the model input (including the key for lookup) arrives on the stream, a stream processor service issues a synchronous request to the model service to retrieve the score. Then the stream processor inserts the score into the key-value store, where going forward it will be available at low latency without having to recompute the predictions. This results in continual updates to the key-value store at high frequency. Should the number of input events become large for a short period of time, it is possible to scale the model service to handle the traffic, or if a delay is acceptable, wait until the volume of stream events returns to normal levels.
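A condensed sketch of the RTPOn flow is given below, assuming a kafka-python consumer, the hypothetical RTSI endpoint from the earlier sketch, and Redis as the key-value store; the topic name, URL and message schema are assumptions for illustration.

```python
# Sketch of real-time precomputed (online) inference (RTPOn): a stream
# processor listens for model inputs, calls the synchronous model service,
# and writes the resulting score to the key-value store.
import json

import redis
import requests
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "traveler-activity",                               # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
store = redis.Redis(host="localhost", port=6379)
MODEL_SERVICE_URL = "http://model-service:8080/predict"  # RTSI endpoint

for message in consumer:
    event = message.value                # model input plus the lookup key
    response = requests.post(MODEL_SERVICE_URL,
                             json={"features": event["features"]})
    score = response.json()["score"]
    # The score is now readable at low latency without recomputation.
    store.set(f"score:traveler:{event['traveler_id']}", score)
```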
Table 3: Mapping of Models to Inference Type

Model - Use Case                            Inference Type
Hotels.com "Similar Properties"             RTSI
Hotels.com "Recommended destinations"       RTPOf
Hotels.com "Similar destinations"           RTPOf
Hotels.com "Search box recommendation"      RTPOf
Vrbo "Recently viewed properties"           RTPOn
Vrbo "Recent activity"                      RTPOn
Vrbo "Recommended for you"                  RTSI
Vrbo "Based on your recent searches"        RTSI
4.3 Productivity
The combination of straightforward deployment, standardized infrastructure toolkits, flexible ML framework support, and separation from the integrating system has significantly improved the iteration speed (and therefore effectiveness) of data scientists at Vrbo and Hotels.com. Since the development and adoption of the ML platform, the number of ML models deployed has grown approximately linearly.

5 DISCUSSION AND FUTURE WORK
At Vrbo and Hotels.com, we are always looking for new and better ways to lead travelers to their dream vacation. Correspondingly, our methods for personalizing landing pages and the home page are always evolving. We present ongoing and future work in this section.

5.1 Content Augmented Recommendation System - CaRS
Building upon the work done on Property Embeddings (Section 2.1) and inspired by [4], we started developing a new recommendation framework in order to incorporate additional meaningful signals such as price or destination attributes (historic, romantic, etc.). The aim of CaRS is to provide a recommender system that can be used for both property and destination recommendations. We're developing CaRS in a way that allows us to generate both personalized (traveler to item) and non-personalized (item to item) recommendations while maintaining only one model.

For the rest of this subsection, an item will refer to both properties and destinations. We consider a destination "booked" if a traveler booked a property within the destination.

The model uses sequences of clicked items as context and booked items as targets from traveler sessions. Travelers are represented as an aggregation (e.g. average) of the features of the items they have clicked on. We train a neural network to generate a score which represents how likely travelers are to book an item given the previous items they have clicked on. The function learned by the neural network can be used both with multiple items as input, so as to personalize results, and with a single item if no personalization is desired.

We use a pairwise loss; for each positive example we draw negative examples and learn the weights such that a higher score is assigned to positive examples than to the negative ones. Inspired by [3], [13], we used our pretrained embeddings in order to identify similar items that weren't viewed by the travelers. This allowed us to improve the performance of the model with negative sampling based on the already estimated conditional probabilities provided by the embeddings. By sampling negatives near the decision boundary we can improve ranking metrics like NDCG.

Combining content features with decision-boundary negative sampling, we saw an improvement in offline metrics like NDCG and recall when we compared CaRS to CBoW.
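A minimal sketch of this training objective is shown below, assuming a PyTorch scoring network defined elsewhere; the margin value, the averaging of clicked-item features, and the nearest-unviewed-item negative sampling are illustrative choices, not the exact CaRS implementation.

```python
# Sketch of the pairwise objective with "hard" negatives: negatives are
# unviewed items that lie close to the positive item in the pretrained
# embedding space, i.e. near the decision boundary.
import numpy as np
import torch
import torch.nn.functional as F


def traveler_representation(clicked_item_features: torch.Tensor) -> torch.Tensor:
    """Aggregate (average) the content features of the items a traveler clicked."""
    return clicked_item_features.mean(dim=0)


def sample_hard_negative(pos_item: int, embeddings: np.ndarray, viewed: set) -> int:
    """Pick an unviewed item whose pretrained embedding is closest to the positive's."""
    sims = embeddings @ embeddings[pos_item]
    for candidate in np.argsort(-sims):
        if candidate != pos_item and candidate not in viewed:
            return int(candidate)
    raise ValueError("no negative candidate found")


def pairwise_hinge_loss(score_pos: torch.Tensor,
                        score_neg: torch.Tensor,
                        margin: float = 1.0) -> torch.Tensor:
    """Encourage the booked item to score higher than the sampled negative."""
    return F.relu(margin - score_pos + score_neg).mean()
```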