Hierarchical User Profiling for E-commerce Recommender Systems

Yulong Gu
Data Science Lab, JD.com
guyulongcs@gmail.com

Data Science Lab, JD.com
wangshuaiqiang1@jd.com




User profiling, which aims to model users' real-time interests in different granularity, is an essential issue for personalized recommendations in E-commerce. On one hand, items (i.e., products) are usually organized hierarchically in categories, and correspondingly users' interests are naturally hierarchical on different granularity of items and categories. On the other hand, multiple-granularity-oriented recommendations have become very popular in E-commerce sites, and such scenarios require hierarchical user profiling in different granularity as well. In this paper, we propose HUP, a Hierarchical User Profiling framework to solve the hierarchical user profiling problem in E-commerce recommender systems. In HUP, we provide a Pyramid Recurrent Neural Network, equipped with Behavior-LSTM, to formulate users' hierarchical real-time interests at multiple scales. Furthermore, instead of simply utilizing users' item-level behaviors (e.g., ratings or clicks) as conventional methods do, HUP harvests the sequential information of users' temporal finely-granular interactions (micro-behaviors, e.g., clicks on components of items like pictures or comments, browses with navigation of the search engines or recommendations) for modeling. Extensive experiments on two real-world E-commerce datasets demonstrate the significant performance gains of HUP against state-of-the-art methods on the hierarchical user profiling and recommendation problems. We release the codes and datasets at




CCS Concepts: • Information systems → Personalization; Recommender systems.




Keywords: User profiling; Recommender systems; Hierarchical user profiling; Pyramid Recurrent Neural Networks


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

WSDM '20, February 3-7, 2020, Houston, TX, USA
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-6822-3/20/02...$15.00
https://doi.org/10.1145/3336191.3371827




Figure 1: Hierarchical recommendations in Amazon

1 Introduction

In the era of the Internet, recommender systems are playing crucial roles in various applications such as E-commerce portals (e.g., Amazon, JD.com, Alibaba), social networking websites like Facebook, video-sharing sites like YouTube, visual discovery sites like Pinterest, and so on. In practice, user profiling [5, 11, 18, 24, 33, 38] is one of the most important phases in recommender systems. It yields profile vectors, which formally represent users' interests by deeply understanding their historical interactions, and can be used for candidate generation [31, 42], click-through rate prediction [4, 39, 40], conversion rate prediction [3, 16] and long-term user engagement optimization [34-37, 44-46].


Modeling users' hierarchical real-time interests is emerging as a crucial issue in E-commerce recommender systems. First, items (i.e., products) in E-commerce sites are typically organized in a hierarchical catalogue. Correspondingly, users' interests naturally lie hierarchically on multiple granularity of items and categories. Second, different granularity of recommendations (e.g., item, topic and category) have become very popular in E-commerce sites, and such scenarios require hierarchical user profiling in different granularity as well. For instance, Figure 1 illustrates a real example of hierarchical recommendations in Amazon. The left side of the figure recommends some items (mobile phones) to a user, while the right side shows a list of recommendations on the categories of "phone accessories", "chargers" and so on. Category recommendation can help the recommender system quickly figure out the main interest of the user and make better recommendations.


Existing user profiling methods mainly focus on item recommendations, usually based on users' item-level responses like ratings [20] or clicks [14]. Among existing methods, latent factor modeling is a popular branch, including matrix factorization [13, 20, 38], neural embedding [8, 10], etc. Generally, they learn a unified embedding for the target user to represent her interests in the items based on her historical behaviors. Recently, recurrent neural networks (RNNs) have achieved state-of-the-art performance in session-based recommendations [14, 29].


However, these methods have the following limitations. First, when facing different granularity of recommendation tasks, most of them usually need to run a similar algorithm multiple times on different granularity of item organizations, where each run builds users' certain-level profile vectors for the corresponding recommendation task, i.e., item-level profiles for item recommendations and category-level profiles for category recommendations. Correspondingly, the training process of each level's profile vectors is completely independent from the others. However, users' multiple-level interests are closely correlated. Figure 2 illustrates a user's hierarchical interests, including an item level and two category levels, with her historical behaviors. Resulting from the correlations between items and categories, improvement on one recommendation task might benefit the others. However, to the best of our knowledge, such an opportunity has not been explored in existing methods.


Second, only harvesting the signals of users' item-level interactions like ratings and clicks is insufficient. In most E-commerce portals, users provide finely-granular responses such as clicking and browsing different modules (e.g., comments and pictures) of items, adding to shopping carts, and purchases, which are referred to as "micro-behaviors" [30, 41]. For example, the bottom layer of Figure 2 presents a user's historical micro-behaviors in JD.com (one of the largest e-commerce sites in the world), including browsing a pair of Nike shoes from the homepage, searching and reading specifications of iPhone 8, browsing Google Pixel 2 from the promoting page, searching iPhone X, reading comments and adding it into the shopping cart for purchasing, etc. Obviously, in comparison with users' item-level responses, micro-behaviors provide more detailed information, and preliminary studies [30, 41] have demonstrated the advantage of modeling such detailed behaviors. However, to the best of our knowledge, none of the existing methods has leveraged such advantages to improve the performance of multiple-level user profiling.


Third, users' interests are generally dynamic and continuously shifting. Some state-of-the-art methods like Time-LSTM [43] usually incorporate time intervals to track the interest shifting. However, we argue that besides the time intervals, the types of behaviors and their dwell time are also extremely important. As shown in Figure 2, we know that iPhone X is preferred over the others, since various micro-behaviors are performed on iPhone X with long dwell time. We also observe that, triggered by making an order on iPhone X, the user's interest in mobile phones drops sharply. Neglecting to model behavior types and dwell time, Time-LSTM would struggle to capture users' detailed preferences and interest shifting.


To cope with these challenges, we present HUP, a hierarchical user profiling framework to precisely formulate users' real-time interests on multiple organizations of items, targeting significant performance gains in recommendation accuracy. In particular, it models users' multiple-level interests with a Pyramid Recurrent Neural Network, which typically consists of a micro layer, an item layer, and multiple category recurrent neural network layers. The micro layer harvests the detailed behavioral information and passes it to the higher layers, which abstract users' hierarchical interests on the corresponding levels of the item organizations simultaneously. Furthermore, to sensitively track users' real-time interests, we introduce Behavior-LSTM in each layer, where a behavior gate is designed to model the types and dwell time of behaviors. Extensive experiments on item recommendation and category recommendation tasks have been conducted on two large-scale real E-commerce datasets to demonstrate the effectiveness of our proposed approach.







To sum up, our major contributions are listed as follows:

• We formulate a novel hierarchical user profiling problem, which aims to precisely model users' multiple-level interests simultaneously in E-commerce recommender systems.

• We present HUP, which exploits a Pyramid Recurrent Neural Network for hierarchical user profiling based on users' historical micro-behaviors.

• We propose Behavior-LSTM, which utilizes a behavior gate to model the types and dwell time of behaviors for effectively formulating users' real-time interests.

• We conduct extensive experiments and show that our method greatly outperforms state-of-the-art baselines on both item recommendation and category recommendation tasks.



2 Related Work

2.1 User Profiling for Recommendations


Recommender systems [1] can recommend potentially interesting items to users to tackle the information overload problem. Existing works mainly fall into either content-based technology [26] or collaborative filtering [23]. In both of them, user profiling plays a critical role in formulating users' interests or characteristics [5] based on their behaviors in the past [18, 24, 33, 35, 38]. Classic collaborative filtering techniques like matrix factorization [20] learn users' static profiles from their rating preferences for estimating users' interests in the future [38]. Furthermore, evolutionary user profiling can learn users' dynamic profiles along time based on the time-changing factor model [19], vector autoregression [24], the dynamic sparse topic model [8], etc. However, these methods mainly focus on the item recommendation problem, where neither the sequential information of users' behaviors nor the hierarchy of the user profiles is considered.


2.2 RNN-based User Profiling


In recommender systems, recurrent neural networks (RNNs) have shown impressive advantages by modeling users' sequential behaviors [14, 15, 17, 29]. For example, Hidasi et al. [14] introduced the concept of session-based recommendations, and first proposed an RNN-based framework to process users' click sequences on items in a session. Tan et al. [29] further improved its performance by considering the data augmentation and temporal shift issues. Hidasi et al. [15] integrated content features extracted from images and text into parallel RNN architectures, which demonstrated significant performance improvements over baselines. Li et al. [22] proposed a neural attentive recommendation machine that can identify users' main purpose in their current session, targeting performance gains. Beyond behaviors within a session, Quadrana et al. [27] leveraged an additional GRU layer to model users' cross-session activities for session-based recommendations. Recently, it has been found that temporal information and users' finely-granular interactions are significantly helpful for recommendations. Wu et al. [32] leveraged timestamps of behaviors with a long short-term memory (LSTM) autoregressive method. Zhu et al. [43] proposed Time-LSTM, which uses time gates to model the time intervals between behaviors. Wan and McAuley [30] exploited the effectiveness of the relations among users' different types of behaviors in recommendations. Zhou et al. [41] trained a single-layer RNN model with micro-behaviors for product recommendation. However, this method only models users' interests in items and just exploits micro-behavior information as additional input, which might lead to inferior performance. Our method uses multi-layer Behavior-LSTM cells and attention to explicitly model the micro-behavior information, which can solve both the item recommendation and the hierarchical category recommendation problems.


In a word, most existing RNN-based methods fail to address the hierarchical user profiling problem. In addition, to the best of our knowledge, there are no explorations that leverage the types, dwell time and time intervals of behaviors simultaneously in an RNN framework for user profiling.




3 Problem Formulation

In this section, we first introduce the background, notations and definitions in this paper, and then formulate our problem formally.




Categories organize the products of E-commerce sites in different granularity. The hierarchy is generally a tree structure, where each lower-level category is an element of a higher-level one, and products are usually hung onto the finest categories as the leaf nodes of the tree. For example, the first-level category "Electronics" might include some second-level categories like "Telephone" and "Accessory", and "Mobile Phone" is a category in the third and finest level belonging to "Telephone".


Micro-behaviors are detailed unit interactions (e.g., reading the detailed comments, carting) of users with recommender systems. They can provide rich information for indicating users' timely interests, including the type of behavior that a user conducts on an item, and how long a user dwells on an item before moving to the next one [30, 41]. In this paper, we consider 10 types of micro-behaviors, which are shown in Table 1.


Table 1: List of micro-behaviors

Micro-behavior        Description
Home2Product          Browse the product from the homepage
ShopList2Product      Browse the product from the category page
Sale2Product          Browse the product from the sale page
Cart2Product          Browse the product from the carted page
SearchList2Product    Browse the product from the searched results
Detail_comments       Read the comments of the product
Detail_specification  Read the specification of the product
Detail_bottom         Read the bottom of the page of the product
Cart                  Add the product to the shopping cart
Order                 Make an order


Hierarchical User Profiling


Definition 3.1 (Hierarchical User Profiling). Hierarchical user profiling aims to generate the micro-level, item-level and hierarchical category-level profile vectors p^m_u, p^i_u and p^c_u = {p^{c(1)}_u, ..., p^{c(K)}_u} respectively, based on the target user u's micro-behaviors; these vectors represent u's interests in the corresponding granularity.






Definition 3.2 (Hierarchical Recommendations). Let U be a set of users, V be a set of items, and C^{(1)}, C^{(2)}, ..., C^{(K)} be the K-level hierarchy of categories. The hierarchical recommendation task aims to recommend a set of items V̂_u and K sets of categories Ĉ^{(1)}_u, ..., Ĉ^{(K)}_u to each target user u by maximizing the relevance between u and her recommendations in different granularity.

4 HUP: A Hierarchical User Profiling Framework

In this section, we introduce HUP, a hierarchical user profiling framework for hierarchical recommendations. As illustrated in Figure 3, HUP utilizes a Pyramid Recurrent Neural Network to extract users' hierarchical interests from micro-behaviors at multiple scales.


4.1 The Input
and Embedding Layers


Given a target user u, the input of our model is a sequence of her micro-behaviors X = ⟨x_1, x_2, ..., x_N⟩. The i-th element x_i = (t_i, v_i, c_i, b_i, d_i, g_i) indicates that u performs a micro-behavior of type b_i on the item v_i at time t_i, where v_i belongs to the multiple-level categories c_i = {c^{(1)}_i, ..., c^{(K)}_i}, the dwell time is d_i, and the time interval between x_i and x_{i+1} is g_i. Both the dwell time and the time interval are real numbers. As previous work did [41], we discretize each of them into several buckets for embedding. For each micro-behavior x_i, the embedding layer first uses embedding tables of items, categories, behavior types, dwell time buckets and time intervals to transform v_i, c_i, b_i, d_i and g_i into low-dimensional dense vectors (i.e., e_{v_i}, e_{c_i}, e_{b_i}, e_{d_i}, e_{g_i}) respectively, and then concatenates these vectors into a single embedding vector e_i. The embedding tables are initialized with random numbers.
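The embedding layer above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vocabularies, bucket edges and initialization range are hypothetical, while the embedding sizes follow the implementation details reported later (items 30, behaviors 5, categories 8, dwell time 5, time intervals 5).

```python
import random

random.seed(0)

# Embedding sizes as reported in the experimental setup (hypothetical vocabularies).
EMB_DIMS = {"item": 30, "behavior": 5, "category": 8, "dwell": 5, "interval": 5}

def make_table(vocab, dim):
    # One randomly initialized dense vector per id, mirroring HUP's embedding tables.
    return {v: [random.uniform(-0.1, 0.1) for _ in range(dim)] for v in vocab}

tables = {
    "item": make_table(["iPhoneX", "NikeShoes"], EMB_DIMS["item"]),
    "behavior": make_table(["SearchList2Product", "Cart"], EMB_DIMS["behavior"]),
    "category": make_table(["Electronics", "MobilePhone"], EMB_DIMS["category"]),
    "dwell": make_table(range(10), EMB_DIMS["dwell"]),        # bucketed dwell time
    "interval": make_table(range(10), EMB_DIMS["interval"]),  # bucketed time interval
}

def bucketize(seconds, edges=(1, 5, 10, 30, 60, 120, 300, 600, 1800)):
    # Discretize a real-valued duration into one of len(edges)+1 buckets (edges are hypothetical).
    return sum(seconds >= e for e in edges)

def embed_micro_behavior(item, behavior, category, dwell_s, interval_s):
    # Concatenate the five embeddings into the single input vector e_i.
    return (tables["item"][item] + tables["behavior"][behavior]
            + tables["category"][category]
            + tables["dwell"][bucketize(dwell_s)]
            + tables["interval"][bucketize(interval_s)])

e_i = embed_micro_behavior("iPhoneX", "Cart", "MobilePhone", 42.0, 3.0)
print(len(e_i))  # 30 + 5 + 8 + 5 + 5 = 53
```

In a trainable model these tables would be learned parameters; the dictionary lookup here only shows the lookup-and-concatenate structure of e_i.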

4.2 Pyramid
Recurrent Neural Networks


Most previous recurrent neural network (RNN)-based recommendation methods [5, 14, 15, 29, 41] use a single-layer RNN to generate user profile vectors, which might not be capable of capturing users' hierarchical interests at different levels. To solve this problem, inspired by the Spatial Pyramid Pooling network (SPP-net) [12], we propose a Pyramid Recurrent Neural Network, which contains a micro-level, an item-level and several category-level RNN layers to abstract users' hierarchical interests at multiple scales simultaneously.

The micro-level RNN layer aims to model users' finest-level interests. The input at time stamp i of this layer, x^M_i, comes from the embedding layer, and the output of this layer, Y^M, is forwarded to the item-level RNN layer for further calculations. The hidden state is updated after taking each micro-behavior as input. The formulations of the micro-level RNN layer are defined in Equation 1:

X^M = [x^M_i] = [e_i],  i = 1, 2, ..., N
Y^M = [y^M_i] = RNN^M(X^M),  i = 1, 2, ..., N    (1)

The item-level RNN layer models users' item-level interests. The input at time stamp i of this layer, x^I_i, is the concatenation of the item embedding e_{v_i} and the output of the micro-level layer. The hidden state is only updated after a user has transferred her focus from one item to another. Its output Y^I is forwarded to the category-level RNN layers. The formulations of the item-level RNN layer are defined in Equation 2:

X^I = [x^I_i] = [e_{v_i}; y^M_i]
Y^I = [y^I_i] = RNN^I(X^I)    (2)

The category-level RNN layers formulate users' category-level interests. In the K-th category layer (the finest granularity of categories), the input at time stamp i is x^{(K)}_{C,i}, which is the concatenation of the category embedding e^{(K)}_{c_i} and the output of the item-level RNN layer calculated on items under this category. For the other higher-level category layers, the input at time stamp i of the k-th level category layer is x^{(k)}_{C,i}, which is the concatenation of the category embedding e^{(k)}_{c_i} in this layer and the output of the (k+1)-th level (i.e., the adjacent finer) category layer. In each layer, the hidden state is only updated after a user has moved her focus from one category to another in this layer. The formulations of the category-level RNN layers are defined in Equation 3:

x^{(k)}_{C,i} = [e^{(K)}_{c_i}; y^I_i] if k = K,  and  x^{(k)}_{C,i} = [e^{(k)}_{c_i}; y^{(k+1)}_{C,i}] for k = 1, ..., K−1
Y^{(k)}_C = [y^{(k)}_{C,i}] = RNN^{(k)}_C(X^{(k)}_C),  k = 1, ..., K    (3)
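The update rule of the pyramid layers above (each layer steps its RNN only when the user's focus switches at that layer's granularity) can be sketched with a toy stand-in cell. The item ids, category ids, one-dimensional vectors and the blending cell are all hypothetical; they only illustrate the gating of state updates, not a real LSTM.

```python
def toy_rnn_step(state, x):
    # Stand-in for an LSTM step: blend the previous state with the new input.
    return [0.5 * s + 0.5 * v for s, v in zip(state, x)]

def pyramid_forward(events):
    # events: list of (item_id, category_id, vec), where vec is the micro-layer output.
    dim = len(events[0][2])
    item_state, cat_state = [0.0] * dim, [0.0] * dim
    prev_item = prev_cat = None
    item_updates = cat_updates = 0
    for item, cat, vec in events:
        if item != prev_item:       # item layer steps only when the item switches
            item_state = toy_rnn_step(item_state, vec)
            item_updates += 1
            prev_item = item
        if cat != prev_cat:         # category layer steps only when the category switches
            cat_state = toy_rnn_step(cat_state, item_state)
            cat_updates += 1
            prev_cat = cat
    return item_updates, cat_updates

events = [
    ("iPhoneX", "MobilePhone", [1.0]),  # new item, new category -> both layers update
    ("iPhoneX", "MobilePhone", [1.0]),  # same item -> neither layer updates
    ("Pixel2",  "MobilePhone", [1.0]),  # new item, same category -> item layer only
    ("NikeAir", "Shoes",       [1.0]),  # both switch -> both layers update
]
print(pyramid_forward(events))  # (3, 2)
```

This is why the higher layers run at progressively coarser time scales: four micro-behaviors produce three item-level steps but only two category-level steps.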



4.3 The Behavior-LSTM Cell


Generally, users' interests are dynamic and continuously shifting. Time-LSTM [43] is a state-of-the-art method that incorporates the time intervals between users' sequential purchases to address the interest shifting problem. However, it cannot model the behavior type and the dwell time information, which may lead to inferior performance. We here propose Behavior-LSTM, a novel RNN layer that provides an additional behavior gate to process the types and dwell time of behaviors, enabling HUP to track users' real-time interests more precisely. In particular, it is described in Figure 4 and formulated in Equation 4:

I_t = σ(W_I[h_{t−1}, x_t] + b_I)
F_t = σ(W_F[h_{t−1}, x_t] + b_F)
T_t = σ(W_T[x_t, Δ_t] + b_T)
A_t = σ(W_A[x_t, a_t] + b_A)
C̃_t = tanh(W_C[h_{t−1}, x_t] + b_C)
C_t = F_t ⊙ C_{t−1} + I_t ⊙ T_t ⊙ A_t ⊙ C̃_t
O_t = σ(W_O[h_{t−1}, x_t] + b_O)
h_t = O_t ⊙ tanh(C_t)    (4)



where I, F, T, A and O are the input, forget, time, behavior and output gates, C and h are the cell state and hidden state vectors, W_I, W_F, W_T, W_A, W_C and W_O are weight matrices, and b_I, b_F, b_T, b_A, b_C and b_O are the biases, respectively. The input of the Behavior-LSTM is a tuple (x_t, a_t, Δ_t), where x_t is the embedding vector of the input at time stamp t, a_t is the embedding vector of the behavior type or dwell time information, and Δ_t is the embedding vector of the time interval between the current behavior and the next one.


In the Behavior-LSTM, the time gate T estimates how much information should be maintained or passed to the next state, and the behavior gate A calculates the importance of the current behavior from the behavior's meta information. In particular, such meta information involves two aspects: the behavior types and users' dwell time. Note that the behavior gate only processes the types of micro-behaviors in the micro-level RNN layer. It is






Figure 3: The architecture of HUP. It uses a Pyramid Recurrent Neural Network, which consists of a micro layer, an item layer, and hierarchical category recurrent neural network layers, to extract users' hierarchical profiles at multiple scales. The profiles represent users' real-time interests in items and hierarchical categories, based on which the most relevant categories and items can be recommended to users.


Figure 4: The architecture of the Behavior-LSTM. It has a behavior gate A and a time gate T, where A models users' behavior information in micro-behaviors, and T captures the time intervals between users' micro-behaviors.


because most micro-behaviors are instant responses whose dwell time we cannot obtain, while their types are extremely important for modeling users' interests. In the item-level and hierarchical category-level RNN layers, this gate models the dwell time on the items or categories, because the dwell time varies significantly across items and categories and is very informative in presenting users' interests.
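A scalar (one-dimensional) sketch of the Behavior-LSTM step in Equation 4 follows. All weights and inputs are hypothetical constants; a real implementation would use weight matrices and learned parameters, so this only traces how the time gate T and behavior gate A scale the write into the cell state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def behavior_lstm_step(h_prev, c_prev, x, a, dt, w):
    # 1-d Behavior-LSTM step following Eq. 4; w holds scalar weights/biases.
    i = sigmoid(w["WI_h"] * h_prev + w["WI_x"] * x + w["bI"])   # input gate I_t
    f = sigmoid(w["WF_h"] * h_prev + w["WF_x"] * x + w["bF"])   # forget gate F_t
    t = sigmoid(w["WT_x"] * x + w["WT_d"] * dt + w["bT"])       # time gate T_t on (x_t, Δ_t)
    g = sigmoid(w["WA_x"] * x + w["WA_a"] * a + w["bA"])        # behavior gate A_t on (x_t, a_t)
    c_tilde = math.tanh(w["WC_h"] * h_prev + w["WC_x"] * x + w["bC"])
    c = f * c_prev + i * t * g * c_tilde    # time and behavior gates scale the new content
    o = sigmoid(w["WO_h"] * h_prev + w["WO_x"] * x + w["bO"])   # output gate O_t
    h = o * math.tanh(c)
    return h, c

# Hypothetical constant weights, just to run the recurrence.
w = {k: 0.5 for k in ["WI_h", "WI_x", "bI", "WF_h", "WF_x", "bF", "WT_x", "WT_d", "bT",
                      "WA_x", "WA_a", "bA", "WC_h", "WC_x", "bC", "WO_h", "WO_x", "bO"]}
h, c = 0.0, 0.0
for x, a, dt in [(1.0, 1.0, 0.2), (0.5, -1.0, 0.8)]:  # (input, behavior signal, interval)
    h, c = behavior_lstm_step(h, c, x, a, dt, w)
print(-1.0 < h < 1.0)  # hidden state stays bounded by tanh
```

Compared with a plain LSTM, the only structural change is the extra multiplicative factors T_t and A_t in the cell update, which down-weight writes from short-interval or low-importance behaviors.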


4.4 The
Attention Layers


The attention mechanism [2] is a common technique in deep learning. Usually, it is able to mitigate long-term dependency issues as well as provide interpretations, which is extremely important in real-world recommender systems. In particular, an attention layer takes the output sequence Y = [y_1, y_2, ..., y_T] of an RNN as input and returns a context vector s. Let y_i be a user's interests at time stamp i. The context vector of each attention layer is calculated as a weighted sum of the interest vectors over all time stamps, which is formulated formally in Equation 5:

s = Σ_{i=1}^{T} α_i y_i;  α_i = exp(e_i) / Σ_{k=1}^{T} exp(e_k);  e_i = f(y_i, y_T, a_i)    (5)



HUP has multiple attention layers, each directly following its corresponding RNN layer, and therefore referred to as the micro-, item- and category-level attention layers respectively. The context vectors from these attention layers are denoted as s^m, s^i and s^c = {s^{c(1)}, ..., s^{c(K)}} respectively. The attention signal a_i represents the type of micro-behaviors in the micro-level attention layer, and the dwell time in both the item- and category-level attention layers. f is an alignment model, which scores the importance of y_i based on the hidden state y_i, the last hidden state y_T and the attention signal a_i. To achieve rich expressive ability, we design the alignment model f as a two-layer feedforward neural network, which is jointly trained with the model.
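Equation 5 can be sketched as follows. The alignment model f here is a toy linear scorer standing in for the paper's jointly trained two-layer feedforward network, and the hidden states and attention signals are hypothetical scalars.

```python
import math

def attention(ys, a_signals):
    # ys: hidden states y_1..y_T; a_signals: attention signals a_i (behavior-type or
    # dwell-time information, here hypothetical scalars). Implements Eq. 5.
    y_T = ys[-1]

    def f(y_i, a_i):
        # Toy alignment model: a fixed linear mix of y_i, y_T and a_i.
        return 0.6 * y_i + 0.3 * y_T + 0.1 * a_i

    scores = [f(y, a) for y, a in zip(ys, a_signals)]
    m = max(scores)                         # numerically stable softmax over e_i
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    s = sum(al * y for al, y in zip(alphas, ys))  # context vector s = Σ α_i y_i
    return s, alphas

s, alphas = attention([0.2, 0.9, 0.4], [1.0, 2.0, 0.5])
print(abs(sum(alphas) - 1.0) < 1e-9)  # the weights form a distribution
```

The softmax guarantees the weights α_i sum to one, so s is a convex combination of the interest vectors; inspecting α_i is what makes the layer interpretable.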

4.5 The Fully
Connected Layers


The fully connected neural network layers transform users' context vectors from the attention layers into hierarchical user profiles. Specifically, they transform users' micro-level, item-level and category-level context vectors s^m, s^i and s^c = {s^{c(1)}, ..., s^{c(K)}} into real-time user profile vectors p^m, p^i and p^c = {p^{c(1)}, ..., p^{c(K)}} at the corresponding levels.






4.6 Loss


Deep learning models like convolutional neural networks [21] and recurrent neural networks [9] usually use softmax as the last layer for prediction. However, in real-world recommendation scenarios, the possible items can number in the millions or billions, and thus such calculations over all items are prohibitively expensive. Given a user u and her sequential activities X_u, we instead try to maximize the cosine similarity between the user's real-time profile vectors (i.e., p^m, p^i and p^c = {p^{c(1)}, ..., p^{c(K)}}) and the embeddings of the ground truths on which the target user will act at the next time stamp N+1 (i.e., the next item v_{N+1} for the micro- and item-level layers, or the next hierarchical categories c_{N+1} = {c^{(1)}_{N+1}, ..., c^{(K)}_{N+1}} for the category-level layers). A similar strategy has achieved success in recommendation systems

[, , ]. Let L^M_u, L^I_u and L^C_u = {L^{C(1)}_u, ..., L^{C(K)}_u} be the losses of the micro-level, item-level and category-level layers for the target user u. The loss of the micro-level layer, L^M_u, can be calculated as in Equation 6, where e_{v_{N+1}} is the embedding of the ground-truth item v_{N+1}. The losses of the item and category levels can be calculated similarly.

L^M_u = cosine_proximity(p^m, e_{v_{N+1}}) = − (p^m · e_{v_{N+1}}) / (∥p^m∥ ∥e_{v_{N+1}}∥)    (6)

The total loss L is the weighted sum of the losses of the micro-level, item-level and category-level layers over all users. Formally, it is defined as follows:

L = λ_{l_M} Σ_{u∈U} L^M_u + λ_{l_I} Σ_{u∈U} L^I_u + λ_{l_C} Σ_{u∈U} Σ_{k=1}^{K} L^{(k)}_{C_u}    (7)

where λ_{l_M}, λ_{l_I} and λ_{l_C} are the coefficients of the losses of the micro-level, item-level and multiple category-level layers respectively.
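Equations 6 and 7 can be sketched directly. The profile vectors, embeddings and per-level loss values below are hypothetical, and the coefficients default to 1 only for illustration.

```python
import math

def cosine_proximity_loss(p, e):
    # Eq. 6: negative cosine similarity between a profile vector p and the
    # ground-truth embedding e (plain Python lists here).
    dot = sum(a * b for a, b in zip(p, e))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in e))
    return -dot / norm

def total_loss(l_micro, l_item, l_cats, lam_m=1.0, lam_i=1.0, lam_c=1.0):
    # Eq. 7 for a single user: weighted sum of micro-, item- and category-level losses.
    return lam_m * l_micro + lam_i * l_item + lam_c * sum(l_cats)

lm = cosine_proximity_loss([1.0, 0.0], [1.0, 0.0])   # perfectly aligned vectors
print(lm)                                            # -1.0
print(total_loss(lm, -0.5, [-0.5, -0.25]))           # -2.25
```

Minimizing the negative cosine similarity pushes each profile vector toward the embedding of the item or category the user acts on next, without the softmax over the full item vocabulary.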




5 Experiments

5.1 Hierarchical Recommendations


We evaluate our proposed HUP method on two tasks: item recommendations and category recommendations. Given a target user u and a sequence of her micro-behaviors, HUP generates hierarchical profile vectors for u, which represent the user's interests in items and hierarchical categories respectively. At the same time, the embedding vectors of the items and multiple-level categories are learned by HUP during the training stage. The item recommendation process is as follows. At each recommendation stage, as previous work did [41], we first retrieve a set of candidate items, which are similar to at least one of the user's browsed items in terms of cosine similarity of embeddings. We then calculate the cosine similarity between each candidate item's embedding and the user's item-level profile vector p^i as the ranking score. Finally, we rank the candidate items and select the top items for recommendation. The category recommendations are performed in a similar manner.
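The ranking step above can be sketched as follows; the candidate items and two-dimensional embedding values are hypothetical stand-ins for the embeddings learned during training.

```python
import math

def cos(a, b):
    # cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recommend(profile, candidates, k=2):
    # Rank candidate items by cosine similarity with the item-level profile p^i
    # and return the top-k item ids.
    ranked = sorted(candidates, key=lambda it: cos(profile, candidates[it]), reverse=True)
    return ranked[:k]

candidates = {                 # hypothetical learned item embeddings
    "iPhoneX": [0.9, 0.1],
    "Pixel2":  [0.8, 0.3],
    "NikeAir": [0.1, 0.9],
}
profile = [1.0, 0.2]           # hypothetical item-level profile vector p^i
print(recommend(profile, candidates))  # ['iPhoneX', 'Pixel2']
```

Category recommendation replaces the item embeddings and p^i with the category embeddings and the corresponding category-level profile vector.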


5.2 Dataset


To evaluate the effectiveness of HUP, we utilize the benchmark "JD Micro Behaviors Datasets" [41], which are collected from the large E-commerce site JD.com. The datasets contain users' micro-behaviors in two product categories, "Appliances" and "Computers", where each line is a sequence of a user's micro-behaviors in a session.


The statistics of the datasets are shown in Table 2. In each dataset, we sort all the sessions in chronological order, and use 70%, 10% and 20% of the sessions as the training, validation and testing sets respectively. As previous work did [41], the last item and the corresponding finest category in each session are used as the ground truth.

Table 2: Statistics of the datasets

                           JD-Appliances   JD-Computers
Users                      6,166,916       3,191,573
Products                   169,856         419,388
Categories                 103             93
Number of micro-behaviors  176,483,033     88,766,833

5.3 Baselines

We conduct a comparative study of our approach HUP against the following methods, where the last three are state-of-the-art RNN-based methods that have demonstrated excellent performance recently.


• POP recommends the most popular items to each user. This simple method is a common mechanism in recommender systems, and has been proven comparable to some sophisticated recommender algorithms [6].


• BPR-MF implements matrix factorization with the Bayesian personalized ranking loss. It is one of the most popular methods for recommendation [13, 20, 28].


• Item-KNN is a popular item-based recommender algorithm that uses similarities between items for recommendations [7]. In particular, the similarity is calculated as sim(i, j) = Freq(ij) / (Freq(i) × Freq(j)), where Freq(i) is the number of sequences in which an item i shows up [7].

• Word2vec makes recommendations based on the embedding of the last item in the sequence [10], learned by Word2vec [25]. It has been proven effective for recommendation [10].


• Word2vec-avg makes recommendations based on the average embedding of all items
in the sequence [41].

• RIB [41] is a state-of-the-art method that uses an RNN and an attention
mechanism to model users' micro-behaviors for recommendation.


• Time-LSTM [43] integrates the time interval information between a user's item-level
behaviors into LSTM.

• S-HRNN [27] utilizes hierarchical GRUs to model users' interactions across sessions.

5.4 Evaluation Metrics


We use two widely used metrics, Recall@K and MRR@K [14, 27, 29, 41], to compare our model
with the baselines. For the item recommendation problem, as in previous work
[41], we use Recall@20 and MRR@20 for evaluation. For category recommendation,
we use Recall@5 and MRR@5 instead because users' interests in categories are
relatively stable. We implemented our framework HUP with Keras 2.2. The
embedding sizes of items, behaviors, categories, dwell time and time intervals
are set to 30, 5, 8, 5 and 5 respectively, the batch size is 128, and the hidden
size of the PRNN layers is 100.
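The two metrics can be computed as follows. This is a hedged sketch with our own function names; `ranked_lists` holds each test case's ranked recommendations and `truths` the ground-truth items:

```python
# Recall@K: fraction of test cases whose ground truth is in the top-K list.
# MRR@K: mean reciprocal rank of the ground truth, counting 0 when it
# falls outside the top K.

def recall_at_k(ranked_lists, truths, k):
    hits = sum(t in r[:k] for r, t in zip(ranked_lists, truths))
    return hits / len(truths)

def mrr_at_k(ranked_lists, truths, k):
    rr = 0.0
    for r, t in zip(ranked_lists, truths):
        if t in r[:k]:
            rr += 1.0 / (r[:k].index(t) + 1)  # rank is 1-based
    return rr / len(truths)
```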



WSDM '20, February 3-7, 2020, Houston, TX, USA





                JD-Appliances                             JD-Computers
                Item Rec            Category Rec          Item Rec            Category Rec
                Recall@20  MRR@20   Recall@5   MRR@5      Recall@20  MRR@20   Recall@5   MRR@5
POP             3.1        0.5      45.0       24.0       3.4        1.0      44.0       28.6
BPR-MF          13.1       3.1      55.4       35.0       11.3       3.0      70.1       42.9
Item-KNN        42.9       9.6      87.0       43.1       –          6.8      68.8       32.7
Word2vec        38.5       8.8      91.1       90.6       28.4       6.2      84.1       81.6
Word2vec-avg    38.7       13.1     86.7       80.0       24.4       7.1      81.0       71.5
RIB             47.6       14.3     92.9       91.2       28.6       7.6      88.0       83.0
Time-LSTM       49.4       18.9     93.4       91.3       32.8       10.9     88.7       83.9
S-HRNN          49.8       19.2     92.6       90.4       33.0       11.0     88.2       82.9
HUP             51.5*      20.5*    93.8*      91.6*      35.0*      12.0*    89.2*      84.4*
HUP-NoMicro     49.6       19.3     93.1       91.1       32.6       10.6     88.2       83.6
HUP-LSTM        50.1       19.6     93.2       91.3       32.7       10.7     88.3       83.6
HUP-TLSTM       50.9       20.0     93.5       91.3       33.8       11.3     88.8       84.0
HUP-NoAtt       50.3       19.7     93.6       91.5       33.8       11.4     89.1       84.2
HUP-Single      50.5       19.7     93.4       91.5       33.9       11.4     88.6       83.8

Table 3: Performance of different methods for category recommendation and item
recommendation on the two datasets. "*" indicates the statistically
significant improvements (i.e., two-sided t-test with p < 0.01) over both
the best baseline and all variants.

6 EXPERIMENTAL RESULTS

6.1 Comparison with Baselines


Table 3 shows the experimental results of different methods for the item and category
recommendation tasks on the Appliances and Computers datasets. We conducted
significance testing (t-test) on the improvements of our approaches over all
baselines; "*" denotes a statistically significant improvement with
p-value < 0.01. From the table we can find that:

• HUP significantly outperforms state-of-the-art methods for the two tasks on both
datasets. Specifically, for item recommendation, HUP outperforms the
state-of-the-art method by 3.4% and 6.1% in Recall@20 and 6.7% and 9.1% in MRR@20 on
the "Appliances" and "Computers" datasets respectively. For
category recommendation, our performance gains are relatively subtle, as this
problem is easier than item recommendation due to the denser dataset.


• The POP and BPR-MF methods perform the worst.

• The three RNN-based methods, RIB, Time-LSTM and S-HRNN,
significantly outperform the conventional baselines.

• By modeling the temporal information, Time-LSTM achieves
better performance than RIB.


6.2 Effectiveness of Components in HUP

To systematically validate the effectiveness of each component in HUP, we
implement the following variants of HUP, each eliminating a specific model
component:


• HUP-NoMicro. This variant does not use micro-behaviors for modeling. It only uses
the item-level and category-level RNN layers based on users' interactions with
items and categories.

• HUP-LSTM. This variant uses standard LSTM in the PRNN layers. The time-related mechanisms
and the types of micro-behaviors are absent in this method.

• HUP-TLSTM. This variant uses Time-LSTM [43] in the PRNN layers, where only the
time gates are used in the RNN layer to model the time intervals between
micro-behaviors.

• HUP-NoAtt. This variant removes the attention layers from HUP.

• HUP-Single. This variant solves each recommendation task independently, with
a single Behavior-LSTM layer and an attention layer in the framework.


The performance of the different variants is shown in the bottom
part of Table 3. From the table, we can see that the full version of
HUP outperforms all of the variants, and we find that:


(1) Pyramid Recurrent Neural Networks. Compared with the HUP-Single method, the performance
improvement demonstrates the effectiveness of PRNN.


(2) Micro-behaviors. The comparison with HUP-NoMicro demonstrates the importance of
micro-behaviors in HUP. We also notice that HUP-NoMicro obtains the worst
performance among the variants in all metrics.


(3) Temporal mechanisms. Evidenced by the performance loss of HUP-LSTM against HUP-TLSTM and
HUP, temporal information is necessary in modeling user interests. Furthermore,
sophisticated temporal mechanisms can yield improved performance. For
example, equipped with time gates, HUP-TLSTM achieves better performance
than HUP-LSTM; HUP outperforms HUP-TLSTM by further using behavior gates and
time mechanisms in the attention layers.


(4) Attention layers. As demonstrated in our experiments, attention layers significantly
improve the performance of HUP over the HUP-NoAtt variant. Meanwhile,
attention mechanisms also help interpret and visualize the recommendation
results, as we will show in the case study section.


6.3 Trade-off
in Loss Function


As formulated in Equation (7), the loss function is composed of
three components. According to our experiments, HUP achieves
the best performance when λlM = 0, because the micro-level RNN layer is useful for
predicting the next micro-behavior based on the historical ones, but does not
directly affect the predictions of the items and categories. However, this layer
can be used to interpret the real-time effectiveness of the historical
micro-behaviors for each user. Without loss of generality, we set λlC = 1 − λlI
and 0 ≤ λlI ≤ 1, so we only need to tune λlI in the loss function for a
trade-off between the item and category recommendation tasks. Figure 6 shows the
performance of HUP with respect to different values of λlI on the
"Appliances" dataset. The curves on the "Computers" dataset
are similar and thus omitted due to the page limit. From the figure we can see
that the metrics for item recommendation decline as λlI gets lower, while the
trends are completely the opposite for category recommendation.

To balance the performance of the user profiles at different levels, we set λlI
and λlC both to 0.5 in the experiments.
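With λlM = 0 and λlC = 1 − λlI as in the trade-off described above, Equation (7) reduces to a single knob. The following is a minimal sketch in our own notation, not the authors' implementation:

```python
# Weighted three-part loss: lambda_i * item loss + lambda_c * category loss
# + lambda_m * micro-behavior loss, with lambda_c tied to lambda_i so that
# one parameter trades item-level against category-level accuracy.

def total_loss(loss_item, loss_category, loss_micro,
               lambda_i=0.5, lambda_m=0.0):
    lambda_c = 1.0 - lambda_i  # keep item/category weights summing to 1
    return (lambda_i * loss_item
            + lambda_c * loss_category
            + lambda_m * loss_micro)
```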


6.4 Case Study


Figure 5 demonstrates a real case from the "Appliances" dataset to explain how
HUP works. The last 12 micro-behaviors, on 7 items from 6 categories, are listed
in the figure. The last item is the ground truth, which spans 2
micro-behaviors. The right side of the figure visualizes the attention weights
of these micro-behaviors from our proposed HUP, HUP-TLSTM (a variant of HUP)
and RIB (a state-of-the-art baseline). From the figure we can see that:


(1) The attention weights of the micro-behaviors "Cart" and
"Search2Product" are higher than the others for all methods, which means
these two micro-behaviors are important for modeling user interests.


(2) The time interval between the browsing behaviors on the item "Bear Electric
Kettle" and the next one is 36 seconds. As the attention weights show,
HUP-TLSTM and HUP pay much less attention to the first 4 items than RIB does,
resulting from the time gates. This illustrates their ability to forget
historical behaviors that happened a long time ago by modeling the time interval
information.

(3) The time interval between the browsing behaviors on "Yogurt Maker and Ice Cream
Machine" and "Bear Egg Cooker" is merely 2 seconds, as is the interval
between "Bear Egg Cooker" and the next item (the ground truth). Both are
very short. HUP-TLSTM retains such history information and
still pays much attention to these two items due to the short time
intervals. However, HUP notices that the user has already added these
two items to the cart. It thus reduces the importance of these two items and their
categories, and instead returns an item from a related category (Yogurt Maker and
Toaster Bundle).









7 CONCLUSION

In this paper, we investigate the hierarchical user profiling problem, aiming to model
users' real-time interests in different granularity. It is crucial for
multiple-level recommendation tasks, such as item, category, topic and theme
recommendations. We hence propose HUP, a hierarchical user profiling
framework, which leverages a Pyramid Recurrent Neural Networks to abstract
users' interests in different granularity simultaneously from users'
micro-behaviors. To better model users' real-time interests, we design
Behavior-LSTM cells to integrate the meta information of behaviors (e.g., the
type, dwell time and time interval information) into HUP. Extensive experiments
on two real-world E-commerce datasets verify the effectiveness of our method
for both item and category recommendation tasks.


Benefiting from its effectiveness and flexibility, our framework can be widely used to
recommend items (e.g., movies, music, news) and the corresponding categories (e.g.,
science fiction films, rock music, breaking news) in various web services (e.g.,
video or music sharing sites, social networks).