
Artwork Personalization at Netflix

Netflix Technology Blog


Dec 8, 2017 · 13 min read

By Ashok Chandrashekar, Fernando Amat, Justin Basilico and Tony Jebara

For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of each of our members at the right time. With a catalog spanning thousands of titles and a diverse member base spanning over a hundred million accounts, recommending the titles that are just right for each member is crucial. But the job of recommendation does not end there. Why should you care about any particular title we recommend? What can we say about a new and unfamiliar title that will pique your interest? How do we convince you that a title is worth watching? Answering these questions is critical in helping our members discover great content, especially for unfamiliar titles.

One avenue to address this challenge is to consider the artwork or imagery we use to portray the titles. If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual “evidence” for why the title might be good for you. The artwork may highlight an actor that you recognize, capture an exciting moment like a car chase, or contain a dramatic scene that conveys the essence of a movie or TV show. If we present that perfect image on your homepage (and as they say: an image is worth a thousand words), then maybe, just maybe, you will give it a try. This is yet another way Netflix differs from traditional media offerings: we don’t have one product but over 100 million different products, one for each of our members, with personalized recommendations and personalized visuals.


A Netflix homepage without artwork. This is how historically our recommendation algorithms viewed a page.


In previous work, we discussed an effort to find the single perfect artwork for each title across all our members. Through multi-armed bandit algorithms, we hunted for the best artwork for a title, say Stranger Things, that would earn the most plays from the largest fraction of our members. However, given the enormous diversity in taste and preferences, wouldn’t it be better if we could find the best artwork for each of our members to highlight the aspects of a title that are specifically relevant to them?


Artwork for Stranger Things that each receive over 5% of impressions from our personalization algorithm. Different images cover a breadth of themes in the show to go beyond what any single image portrays.


As inspiration, let us explore scenarios where personalization of artwork would be meaningful. Consider the following examples where different members have different viewing histories. On the left are three titles a member watched in the past. To the right of the arrow is the artwork that a member would get for a particular movie that we recommend for them.


Let us consider trying to personalize the image we use to depict the movie Good Will Hunting. Here we might personalize this decision based on how much a member prefers different genres and themes. Someone who has watched many romantic movies may be interested in Good Will Hunting if we show the artwork containing Matt Damon and Minnie Driver, whereas a member who has watched many comedies might be drawn to the movie if we use the artwork containing Robin Williams, a well-known comedian.


In another scenario, let’s imagine how the different preferences for cast members might influence the personalization of the artwork for the movie Pulp Fiction. A member who watches many movies featuring Uma Thurman would likely respond positively to the artwork for Pulp Fiction that contains Uma. Meanwhile, a fan of John Travolta may be more interested in watching Pulp Fiction if the artwork features John.


Of course, not all the scenarios for personalizing artwork are this clear and obvious. So we don’t enumerate such hand-derived rules but instead rely on the data to tell us what signals to use. Overall, by personalizing artwork we help each title put its best foot forward for every member and thus improve our member experience.


Challenges

At Netflix, we embrace personalization and algorithmically adapt many aspects of our member experience, including the rows we select for the homepage, the titles we select for those rows, the galleries we display, the messages we send, and so forth. Each new aspect that we personalize has unique challenges; personalizing the artwork we display is no exception and presents different personalization challenges. One challenge of image personalization is that we can only select a single piece of artwork to represent each title in each place we present it. In contrast, typical recommendation settings let us present multiple selections to a member where we can subsequently learn about their preferences from the item a member selects. This means that image selection is a chicken-and-egg problem operating in a closed loop: if a member plays a title it can only come from the image that we decided to present to that member. What we seek to understand is when presenting a specific piece of artwork for a title influenced a member to play (or not to play) a title and when a member would have played a title (or not) regardless of which image we presented. Therefore artwork personalization sits on top of the traditional recommendation problem and the algorithms need to work in conjunction with each other. Of course, to properly learn how to personalize artwork we need to collect a lot of data to find signals that indicate when one piece of artwork is significantly better for a member.


Another challenge is to understand the impact of changing the artwork that we show a member for a title between sessions. Does changing artwork reduce recognizability of the title and make it difficult to visually locate the title again, for example if the member was interested before but had not yet watched it? Or does changing the artwork itself lead the member to reconsider it due to an improved selection? Clearly, if we find better artwork to present to a member we should probably use it; but continuous changes can also confuse people. Changing images also introduces an attribution problem, as it becomes unclear which image led a member to be interested in a title.


Next, there is the challenge of understanding how artwork performs in relation to other artwork we select in the same page or session. Maybe a bold close-up of the main character works for a title on a page because it stands out compared to the other artwork. But if every title had a similar image then the page as a whole may not seem as compelling. Looking at each piece of artwork in isolation may not be enough and we need to think about how to select a diverse set of images across titles on a page and across a session. Beyond the artwork for other titles, the effectiveness of the artwork for a title may depend on what other types of evidence and assets (e.g. synopses, trailers, etc.) we also display for that title. Thus, we may need a diverse selection where each can highlight complementary aspects of a title that may be compelling to a member.


To achieve effective personalization, we also need a good pool of artwork for each title. This means that we need several assets where each is engaging, informative and representative of a title to avoid “clickbait”. The set of images for a title also needs to be diverse enough to cover a wide potential audience interested in different aspects of the content. After all, how engaging and informative a piece of artwork is truly depends on the individual seeing it. Therefore, we need to have artwork that highlights not only different themes in a title but also different aesthetics. Our teams of artists and designers strive to create images that are diverse across many dimensions. During their creative process, they also take into consideration the personalization algorithms that will select the images.


Finally, there are engineering challenges to personalizing artwork at scale. One challenge is that our member experience is very visual and thus contains a lot of imagery. So using personalized selection for each asset means handling a peak of over 20 million requests per second with low latency. Such a system must be robust: failing to properly render the artwork in our UI significantly degrades the experience. Our personalization algorithm also needs to respond quickly when a title launches, which means rapidly learning to personalize in a cold-start situation. Then, after launch, the algorithm must continuously adapt, as the effectiveness of artwork may change over time as both the title evolves through its life cycle and member tastes evolve.


Contextual bandits approach

Much of the Netflix recommendation engine is powered by machine learning algorithms. Traditionally, we collect a batch of data on how our members use the service. Then we run a new machine learning algorithm on this batch of data. Next we test this new algorithm against the current production system through an A/B test. An A/B test helps us see if the new algorithm is better than our current production system by trying it out on a random subset of members. Members in group A get the current production experience while members in group B get the new algorithm. If members in group B have higher engagement with Netflix, then we roll out the new algorithm to the entire member population. Unfortunately, this batch approach incurs regret: many members over a long period of time did not benefit from the better experience. This is illustrated in the figure below.




To reduce this regret, we move away from batch machine learning and consider online machine learning. For artwork personalization, the specific online learning framework we use is contextual bandits. Rather than waiting to collect a full batch of data, waiting to learn a model, and then waiting for an A/B test to conclude, contextual bandits rapidly figure out the optimal personalized artwork selection for a title for each member and context. Briefly, contextual bandits are a class of online learning algorithms that trade off the cost of gathering training data required for learning an unbiased model on an ongoing basis with the benefits of applying the learned model to each member context. In our previous unpersonalized image selection work, we used non-contextual bandits where we found the winning image regardless of the context. For personalization, the member is the context as we expect different members to respond differently to the images.


A key property of contextual bandits is that they are designed to minimize regret. At a high level, the training data for a contextual bandit is obtained through the injection of controlled randomization in the learned model’s predictions. The randomization schemes can vary in complexity from simple epsilon-greedy formulations with uniform randomness to closed loop schemes that adaptively vary the degree of randomization as a function of model uncertainty. We broadly refer to this process as data exploration. The number of candidate artworks that are available for a title along with the size of the overall population for which the system will be deployed informs the choice of the data exploration strategy. With such exploration, we need to log information about the randomization for each artwork selection. This logging allows us to correct for skewed selection propensities and thereby perform offline model evaluation in an unbiased fashion, as described later.
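
The simplest scheme mentioned above, epsilon-greedy with uniform randomness, can be sketched as follows. This is only an illustration under stated assumptions, not the production system: the function name `epsilon_greedy_select`, the `scores` dictionary, and the image ids are all hypothetical. The point it shows is that the selection probability (the propensity) is computed and returned together with the choice, so that it can be logged.

```python
import random


def epsilon_greedy_select(scores, epsilon, rng=random):
    """Choose an image and report the probability with which it was chosen.

    scores  -- dict of image id -> predicted play probability
               (hypothetical model output)
    epsilon -- chance of ignoring the model and picking uniformly at random
    rng     -- randomness source; pass random.Random(seed) for reproducibility
    """
    image_ids = list(scores)
    best = max(image_ids, key=scores.get)
    # Explore with probability epsilon, otherwise exploit the model's top pick.
    chosen = rng.choice(image_ids) if rng.random() < epsilon else best
    # Propensity: the uniform exploration mass, plus the exploitation mass
    # when the chosen image is also the model's top pick.
    propensity = epsilon / len(image_ids)
    if chosen == best:
        propensity += 1.0 - epsilon
    return chosen, propensity
```

With three candidates and `epsilon = 0.3`, the top-ranked image is logged with a propensity of about 0.8 and each of the others with about 0.1. Those logged values are what make it possible to correct for skewed selection later.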


Exploration in contextual bandits typically has a cost (or regret) due to the fact that our artwork selection in a member session may not use the predicted best image for that session. What impact does this randomization have on the member experience (and consequently on our metrics)? With over a hundred million members, the regret incurred by exploration is typically very small and is amortized across our large member base with each member implicitly helping provide feedback on artwork for a small portion of the catalog. This makes the cost of exploration per member negligible, which is an important consideration when choosing contextual bandits to drive a key aspect of our member experience. Randomization and exploration with contextual bandits would be less suitable if the cost of exploration were high.

Under our online exploration scheme, we obtain a training dataset that records, for each (member, title, image) tuple, whether that selection resulted in a play of the title or not. Furthermore, we can control the exploration such that artwork selections do not change too often. This gives a cleaner attribution of the member’s engagement to specific artwork. We also carefully determine the label for each observation by looking at the quality of engagement to avoid learning a model that recommends “clickbait” images: ones that entice a member to start playing but ultimately result in low-quality engagement.
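
As a rough illustration of what one such training record might look like, here is a minimal sketch. The field names, the `quality_label` helper, and especially the 15-minute threshold are assumptions made for this example; the post does not say how quality of engagement is actually measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One logged (member, title, image) impression with its outcome."""
    member_id: str
    title_id: str
    image_id: str
    propensity: float  # probability with which the image was selected
    label: int         # 1 for a quality play, 0 otherwise


def quality_label(played, minutes_watched, min_minutes=15.0):
    """Return 1 only for plays that lasted long enough to count.

    The min_minutes threshold is a made-up stand-in for whatever
    engagement-quality rule is really used.
    """
    return int(played and minutes_watched >= min_minutes)
```

Under this rule, a play abandoned after two minutes gets the same label as no play at all, which is one simple way to keep a model from rewarding “clickbait” images.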


Model training

In this online learning setting, we train our contextual bandit model to select the best artwork for each member based on their context. We typically have up to a few dozen candidate artwork images per title. To learn the selection model, we can consider a simplification of the problem by ranking images for a member independently across titles. Even with this simplification we can still learn member image preferences across titles because, for every image candidate, we have some members who were presented with it and engaged with the title and some members who were presented with it and did not engage. These preferences can be modeled to predict for each (member, title, image) tuple, the probability that the member will enjoy a quality engagement. These can be supervised learning models or contextual bandit counterparts with Thompson Sampling, LinUCB, or Bayesian methods that intelligently balance making the best prediction with data exploration.
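
To make the idea of balancing prediction with exploration concrete, here is a deliberately simplified Thompson Sampling sketch. It assumes the member context has been reduced to a single discrete segment (for example "comedy" or "romance") and keeps an independent Beta posterior per (segment, image) pair. That is a much coarser model than a feature-based contextual bandit, and the class and method names are hypothetical.

```python
import random
from collections import defaultdict


class SegmentThompsonSampler:
    """Beta-Bernoulli Thompson Sampling keyed by (segment, image)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # [alpha, beta] per arm, starting from a uniform Beta(1, 1) prior.
        self._posterior = defaultdict(lambda: [1.0, 1.0])

    def select(self, segment, image_ids):
        # Draw one plausible take rate per image and show the highest draw.
        # Uncertain arms produce widely spread draws, so they still get shown.
        draws = {
            image_id: self._rng.betavariate(*self._posterior[(segment, image_id)])
            for image_id in image_ids
        }
        return max(draws, key=draws.get)

    def update(self, segment, image_id, quality_play):
        # A quality play increments alpha; anything else increments beta.
        alpha_beta = self._posterior[(segment, image_id)]
        alpha_beta[0 if quality_play else 1] += 1.0
```

Because each selection is a random draw from the posterior, exploration shrinks on its own as evidence accumulates for a segment, while a new image (still at the uniform prior) continues to receive impressions.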


Potential signals

In contextual bandits, the context is usually represented as a feature vector provided as input to the model. There are many signals we can use as features for this problem. In particular, we can consider many attributes of the member: the titles they’ve played, the genre of the titles, interactions of the member with the specific title, their country, their language preferences, the device that the member is using, the time of day and the day of week. Since our algorithm selects images in conjunction with our personalized recommendation engine, we can also use signals regarding what our various recommendation algorithms think of the title, irrespective of what image is used to represent it.
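
A minimal sketch of turning a few of these signals into a feature vector might look like the following. The genre and device vocabularies, the function name, and the choice of encodings are all assumptions for illustration; a real feature set would be far larger.

```python
import math

# Hypothetical, tiny vocabularies used only for this example.
GENRES = ("comedy", "romance", "action")
DEVICES = ("tv", "mobile", "web")


def build_context_vector(genre_play_counts, device, hour_of_day):
    """Encode a member context as a flat list of floats."""
    total = sum(genre_play_counts.get(genre, 0) for genre in GENRES)
    # Share of past plays per genre (all zeros for a brand-new member).
    genre_share = [
        genre_play_counts.get(genre, 0) / total if total else 0.0
        for genre in GENRES
    ]
    # One-hot device encoding; an unknown device maps to all zeros.
    device_one_hot = [1.0 if device == name else 0.0 for name in DEVICES]
    # Cyclic encoding so that 23:00 and 00:00 end up close together.
    angle = 2.0 * math.pi * hour_of_day / 24.0
    return genre_share + device_one_hot + [math.sin(angle), math.cos(angle)]
```

For example, `build_context_vector({"comedy": 3, "romance": 1}, "tv", 0)` yields an eight-element vector whose first two entries are 0.75 and 0.25.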


An important consideration is that some images are naturally better than others in the candidate pool. We observe the overall take rate for each image in our data exploration, which is simply the number of quality plays divided by the number of impressions. Our previous work on unpersonalized artwork selection used overall differences in take rates to determine the single best image to select for a whole population. In our new contextual personalized model, the overall take rates are still important and personalization still recovers selections that agree on average with the unpersonalized model’s ranking.
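
The take rate defined above is straightforward to compute from an impression log. In this sketch the log format, a sequence of `(image_id, quality_play)` pairs, is an assumption made for the example.

```python
from collections import Counter


def take_rates(impressions):
    """Return image id -> quality plays divided by impressions.

    impressions -- iterable of (image_id, quality_play) pairs, one per
                   impression (hypothetical log format)
    """
    shown = Counter()
    plays = Counter()
    for image_id, quality_play in impressions:
        shown[image_id] += 1
        plays[image_id] += int(quality_play)
    return {image_id: plays[image_id] / shown[image_id] for image_id in shown}
```

An unpersonalized selector would simply pick `max(rates, key=rates.get)` for everyone; the personalized model uses the same quantity as one signal among many.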


Image Selection

The optimal assignment of image artwork to a member is a selection problem to find the best candidate image from a title’s pool of available images. Once the model is trained as above, we use it to rank the images for each context. The model predicts the probability of play for a given image in a given member context. We sort a candidate set of images by these probabilities and pick the one with the highest probability. That is the image we present to that particular member.
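
The ranking step described here amounts to a sort followed by taking the first element. In the sketch below, `predict_play_probability` stands in for the trained model and is a hypothetical callable taking a context and an image id.

```python
def select_image(candidates, context, predict_play_probability):
    """Rank a title's candidate images for one member context.

    Returns (best_image, ranked_images), highest predicted probability first.
    predict_play_probability(context, image_id) is a placeholder for the
    trained model's scoring function.
    """
    if not candidates:
        raise ValueError("a title needs at least one candidate image")
    ranked = sorted(
        candidates,
        key=lambda image_id: predict_play_probability(context, image_id),
        reverse=True,
    )
    return ranked[0], ranked
```

Returning the full ranking alongside the winner is a convenience for this example; it makes it easy to fall back to the next image if the top one cannot be rendered.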


Performance evaluation

Offline

To evaluate our contextual bandit algorithms prior to deploying them online on real members, we can use an offline technique known as replay [1]. This method allows us to answer counterfactual questions based on the logged exploration data (Figure 1). In other words, we can compare offline, in an unbiased way, what would have happened in historical sessions under different scenarios if we had used different algorithms.


Figure 1: Simple example of calculating a replay metric from logged data. For each member, a random image was assigned (top row). The system logged the impression and whether the profile played the title (green circle) or not (red circle). The replay metric for a new model is calculated by matching the profiles where the random assignment and the model assignment are the same (black square) and computing the take fraction over that subset.


Replay allows us to see how members would have engaged with our titles if we had hypothetically presented images that were selected through a new algorithm rather than the algorithm used in production. For images, we are interested in several metrics, particularly the take fraction, as described above.
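
The replay calculation from Figure 1 can be sketched in a few lines. The sketch assumes, as in the figure, that the logged images were assigned uniformly at random, so a plain average over matching impressions is unbiased; with non-uniform exploration the logged propensities would be needed to reweight. The log format and function names are hypothetical.

```python
def replay_take_fraction(logged, policy):
    """Estimate a policy's take fraction from uniformly random logged data.

    logged -- iterable of (context, logged_image, played) tuples
    policy -- callable mapping a context to the image it would have shown
    Returns None when the policy never agrees with the logged assignment.
    """
    matches = 0
    plays = 0
    for context, logged_image, played in logged:
        # Keep only impressions where the policy agrees with what was shown.
        if policy(context) == logged_image:
            matches += 1
            plays += int(played)
    return plays / matches if matches else None
```

On a toy log where comedy-inclined profiles play after seeing one image and romance-inclined profiles after seeing another, a policy that chooses per profile type scores higher on this metric than a policy that always shows the same image.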


Figure 2 shows how the contextual bandit approach helps increase the average take fraction across the catalog compared to random selection or non-contextual bandits.


Figure 2: Average image take fraction (the higher the better) for different algorithms based on replay from logged image explore data. The Random (green) policy selects one image at random. The simple Bandit algorithm (yellow) selects the image with the highest take fraction. Contextual Bandit algorithms (blue and pink) use context to select different images for different members.

Figure 3: Example of contextual image selection based on the type of profile. Comedy refers to a profile that mostly watches comedy titles. Similarly, Romance watches mostly romantic titles. The contextual bandit selects the image of Robin Williams, a famous comedian, for comedy-inclined profiles while selecting an image of a kissing couple for profiles more inclined towards romance.


Online

After experimenting with many different models offline and finding ones that had a substantial increase in replay, we ultimately ran an A/B test to compare the most promising personalized contextual bandits against unpersonalized bandits. As we suspected, the personalization worked and generated a significant lift in our core metrics. We also saw a reasonable correlation between what we measured offline in replay and what we saw online with the models. The online results also produced some interesting insights. For example, the improvement of personalization was larger in cases where the member had no prior interaction with the title. This makes sense because we would expect that the artwork would be more important to someone when a title is less familiar.


Conclusion

With this approach, we’ve taken our first steps in personalizing the selection of artwork for our recommendations and across our service. This has resulted in a meaningful improvement in how our members discover new content… so we’ve rolled it out to everyone! This project is the first instance of personalizing not just what we recommend but also how we recommend to our members. But there are many opportunities to expand and improve this initial approach. These opportunities include developing algorithms to handle cold-start by personalizing new images and new titles as quickly as possible, for example by using techniques from computer vision. Another opportunity is extending this personalization approach across other types of artwork we use and other evidence that describes our titles such as synopses, metadata, and trailers. There is also an even broader problem: helping artists and designers figure out what new imagery we should add to the set to make a title even more compelling and personalizable.

If these types of challenges interest you, please let us know! We are always looking for great people to join our team, and, for these types of projects, we are especially excited by candidates with machine learning and/or computer vision expertise.



[1] L. Li, W. Chu, J. Langford, and X. Wang, “Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms,” in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, New York, NY, USA, 2011, pp. 297–306.