
Sequential Scenario-Specific Meta Learner for Online Recommendation

Zhengxiao Du¹, Xiaowei Wang², Hongxia Yang², Jingren Zhou², Jie Tang¹
¹ Department of Computer Science and Technology, Tsinghua University
² DAMO Academy, Alibaba Group
[email protected], {daemon.wxw,yang.yhx,jingren.zhou}@alibaba-inc.com, [email protected]

ABSTRACT
Cold-start problems are long-standing challenges for practical recommendations. Most existing recommendation algorithms rely on extensive observed data and are brittle to recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta learning. Our approach is based on the insight that good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine scenario-specific learning with model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely the Scenario-specific Sequential Meta learner (or s2Meta). By doing so, our meta-learner produces a generic initial model by aggregating contextual information from a variety of prediction tasks, while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model can achieve significant gains over the state of the art for cold-start problems in online recommendation¹. The framework is deployed at the Guess You Like session on the front page of Mobile Taobao; an illustration video can be watched at the link².

CCS CONCEPTS
• Information systems → Recommender systems; • Computing methodologies → Neural networks.

KEYWORDS
Recommender systems, Personalized ranking, Neural network, Meta learning, Few-shot learning

ACM Reference Format:
Zhengxiao Du, Xiaowei Wang, Hongxia Yang, Jingren Zhou, Jie Tang. 2019. Sequential Scenario-Specific Meta Learner for Online Recommendation. In The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'19), August 4–8, 2019, Anchorage, AK, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3292500.3330726

¹ The source code is available at https://github.com/THUDM/ScenarioMeta
² https://youtu.be/TNHLZqWnQwc


1 INTRODUCTION
Personalized recommendation is an important method for information retrieval and content discovery in today's information-rich environment. Personalized recommender systems, where the recommendation is generated according to users' past behaviors or profiles, have been proven effective in domains including E-Commerce [29], social networking services [27], and video-sharing websites [10], among many others. Traditionally, a personalized recommender system can be seen as a mapping U × I → R, where U is the user set and I is the item set. The mapping result can be a real value for explicit ratings or a binary value for implicit feedback [18, 26, 47]. This setting usually assumes that the behavior pattern of the same user is relatively stationary across different contexts, which is not true in many practical tasks [31, 36]. For example, during the Singles Day Promotion (Double 11) period, the largest online shopping festival in China, consumers sometimes shop impulsively, lured by deep discounts. In such scenarios, the contextual information of Double 11 is quite critical. It has also been shown that including contextual information leads to better predictive models and better quality of recommendations [1, 35].

Though context-aware recommender systems have been proven effective [3], they face several challenges. First, a large portion of scenarios in a system is actually long-tailed, without enough user feedback. Moreover, the life cycle of a scenario can be quite short. In Taobao, most promotions end within a few days or even a few hours after being launched, without enough time to collect sufficient user feedback for training. Training scenario-specific recommenders with limited observations is therefore a practical requirement. Most existing recommendation algorithms rely on extensive observed data and are brittle to new products and/or consumers [51, 53]. Although the cold-start problem can be tackled by cross-domain recommender systems that transfer domain knowledge, they still require a large number of shared samples across domains [21, 32, 43]. Second, when training a predictive model on a new scenario, the hyperparameters often have a great influence on performance, and optimal hyperparameters in different scenarios may differ significantly [5, 50]. Finding the right combination of hyperparameters usually requires great human effort along with sufficient observations.

This paper addresses the problem of cold-start scenario recommendation with recent progress in few-shot learning [44, 52, 55] and meta-learning [4, 14, 34, 39]. Our approach is to build a meta learner that learns how to instantiate a recommender with good generalization capacity from limited training data. The framework, which generates a scenario-specific recommender by a sequential learning process, is called the Scenario-specific Sequential Meta learner (or s2Meta). The sequential process, which resembles the


traditional machine learning process controlled by human experts, consists of three steps: the meta learner automatically initializes the recommender to be broadly suitable to many scenarios, finetunes its parameters with a flexible updating strategy, and stops learning in time to avoid overfitting. The policy of the meta learner is learned on a variety of existing scenarios in the system and can guide the quick acquisition of knowledge in new scenarios. By automating the learning process in each scenario, we can also reduce the human effort needed to train the predictive model on new scenarios that continuously appear in recommender systems.
Organization. The remainder of this paper is organized as follows: in Section 2, we review related work. Section 3 introduces the problem definition and meta learning settings. Section 4 gives a detailed description of the proposed s2Meta framework. Experimental results on large-scale public datasets and the newly released Taobao theme recommendation dataset are shown in Section 5. Finally, Section 6 concludes the paper.

2 RELATED WORK
In this section, we go over related work on context-aware recommendation, cross-domain recommendation, and meta learning.

2.1 Context-Aware Recommendation
The importance of contextual information has been recognized by researchers and practitioners in many disciplines, including E-Commerce personalization [35], information retrieval [23], data mining [6] and marketing [36], among many others [31, 46]. Relevant contextual information does matter in recommender systems, and it is important to take account of contextual information such as time, location, or acquaintances' influence [1, 2, 35]. Compared to traditional recommender systems that make predictions based on the information of users and items, context-aware recommender systems [1, 3] make predictions in the space U × I × C, where C is the context space. The contextual information can be observable (e.g., time, location, etc.) or unobservable (e.g., users' intentions). The latter case is related to session-based recommendation [19] or sequence-aware recommendation [38]. Even if the contextual information is fully observable, the weight of different types of information is entirely domain-dependent and quite tricky to tune for cold-start scenarios.

2.2 Cross-Domain Recommendation
Traditional recommender systems suggest items belonging to a single domain, and this is not perceived as a limitation but as a focus on a particular market [10, 26, 27]. Nowadays, however, users provide feedback for items of different types and express their opinions on different social media and with different providers. Providers also wish to cross-sell various categories of products and services, especially to new users. Unlike traditional recommender systems that work on homogeneous users and items, cross-domain recommender systems [8, 13], closely related to transfer learning [49], try to combine information from heterogeneous users and/or items. In [30, 43], traditional matrix factorization [26] or factorization machines [40] are extended with interaction information from an auxiliary domain to inform recommendations in a target domain. In [32], a multi-layer perceptron is used to capture the nonlinear mapping function across different domains. In [21], a novel neural network, CoNet, is proposed, with an architecture designed for knowledge transfer across domains. However, all these methods require a large amount of interaction data from the users or items shared between the source and target domains. In [56], the authors propose to align features in two domains without shared users or items via semantic relatedness. However, they assume that the relatedness of features in the source and target domains can be inferred from domain prior knowledge.

2.3 Meta Learning
Meta learning [7], or learning to learn, which aims to learn a learner that can efficiently complete different learning tasks, has gained great success in few-shot learning settings [14, 39, 44, 52]. The information learned across all tasks guides the quick acquisition of knowledge within each separate learning task. Previous work on meta learning can be divided into two groups. One is the metric approach, which learns a similarity metric between new instances and instances in the training set; examples include the Siamese Network [25], Matching Network [52], and Prototypical Network [44]. The other is the parameter approach, which directly predicts or updates the parameters of the classifier according to the training data; examples include MAML [14], Meta-LSTM [39] and Meta Networks [33]. Most methods follow the framework in [52], trying to solve a group of learning tasks of the same pattern, e.g., learning to recognize objects of different categories with only one example per category [28]. Previous work on applying learning to learn in recommender systems mainly focuses on algorithm selection [11–13], leaving other parts of the learning process less explored. In [51], meta learning is used to solve the user cold-start problem in Tweet recommendation. However, a neural network is used there to directly output the parameters of the recommender, which limits the representation ability of the model.

3 PRELIMINARY
In this section, we formally define the problem of scenario-aware recommendation and introduce the meta learning settings. We summarize the notations in Table 1.

3.1 Problem Formulation
A scenario-based recommender system is represented as {U, I, C}, where U is the user set, I is the item set and C is the scenario set. A scenario c ∈ C is defined as a common context in which users interact with items, and (u, i) ∈ H_c represents that user u interacted with item i in scenario c. Each scenario c is connected with a recommender function f_c: U × I → R, where f_c(u, i) is the ranking score of item i for user u in scenario c. For each scenario c, we have access to a training set D_c^train = {(u_k, i_k)}_{k=1}^{N_c}, where (u_k, i_k) ∈ H_c. Our task is to recommend top-n items for each user u in scenario c and maximize the probability of the user's follow-up behaviors, e.g., clicks or conversions. Note that, compared to previous work on cross-domain recommender systems that requires a large number of training samples in each domain, our work studies the case where the size of D_c^train is limited. The latter usually


Figure 1: The framework of the meta learner and recommender. In each scenario, the recommender is initialized by the initializer and updated by the update controller. After the learning process is stopped by the stop controller, the loss of the final recommender on the test set is computed, and the parameters of the meta learner are updated by the meta-gradient.

happens in practice, e.g., new promotion scenarios come out with very sparse related user-item instances.

Table 1: Notations

Notation                    Definition or Description
U, I, C                     the user set, item set and scenario set
m = |U|, n = |I|            numbers of users and items
c                           a recommendation scenario
H_c ⊂ U × I                 the interaction set of c
T_c                         the learning task connected with c
D_c^train, D_c^test         the training set and testing set of T_c
f_c: U × I → R              the recommender function for c
θ_c                         parameters of f_c
U, I                        embedding matrices for users and items
M                           meta learner
T_meta-train, T_meta-test   meta-training set and meta-testing set
ω = {ω_R, ω_u, ω_s}         parameters of the meta learner
ω_R                         shared initial parameters for f_c
ω_u, ω_s                    parameters of update and stop controllers
α, β                        input gate and forget gate
p^(t)                       stop probability at step t

3.2 Scenario-Specific Meta Learning Settings
The recommendation task in scenario c can be considered as a learning task whose training set is D_c^train. Our goal is to learn a meta learner M that, given D_c^train, predicts the parameters θ_c of f_c.

Similar to the standard few-shot learning settings [39, 52], we assume access to a set of training tasks as the meta-training set, denoted as T_meta-train. Each training task T_c ∈ T_meta-train corresponds to a scenario c and has its training/testing pairs: T_c = {D_c^train, D_c^test}. D_c^test is the set of pairwise testing instances: D_c^test := {(u, i, i^-) | (u, i) ∈ H_c ∧ (u, i) ∉ D_c^train ∧ (u, i^-) ∉ H_c}. The meta-training set can be easily built from previous scenarios in the recommender system by randomly dividing the user-item interactions in each scenario into a training set and a testing set. The parameters ω of the meta learner are optimized w.r.t. the meta-training objective:

    min_ω E_{T_c} [ L_ω(D_c^test | D_c^train) ],    (1)

where L_ω(D_c^test | D_c^train) is the loss on the testing set D_c^test given the training set D_c^train and meta learner parameters ω. Specifically, we use the hinge loss as the loss function:

    L_ω(D_c^test | D_c^train) = (1 / |D_c^test|) Σ_{(u_k, i_k, i_k^-) ∈ D_c^test} ℓ(u_k, i_k, i_k^-; θ_c),    (2)

    ℓ(u, i, i^-; θ_c) = max(0, γ − f(u, i; θ_c) + f(u, i^-; θ_c)),    (3)

where the margin γ is set to 1 and θ_c = M(D_c^train; ω). θ_c is generated via a sequential process, which we call scenario-specific learning. During scenario-specific learning, the meta learner initializes θ_c and updates it via gradient descent for a flexible number of steps. After the learning, we dump the training set and use the recommender f_c for further tasks in scenario c. During evaluation, a different set of learning tasks is used, called the meta-testing set


T_meta-test, whose scenarios are unseen during meta-training, i.e., T_meta-train ∩ T_meta-test = ∅.
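To make Equations (2) and (3) concrete, the following is a minimal PyTorch sketch of the pairwise hinge loss; the recommender callable f stands in for the network defined in Section 4, and the batching convention (1-D id tensors for the (u, i, i^-) triples) is an assumption for illustration.

```python
import torch

def hinge_loss(f, users, pos_items, neg_items, margin=1.0):
    """Pairwise hinge loss of Equation (3), averaged over the set (Equation (2)).

    f: a callable returning ranking scores f(u, i; theta_c) for id batches.
    users, pos_items, neg_items: 1-D LongTensors of the (u, i, i^-) triples.
    """
    pos_scores = f(users, pos_items)  # f(u, i; theta_c)
    neg_scores = f(users, neg_items)  # f(u, i^-; theta_c)
    # max(0, gamma - f(u, i) + f(u, i^-)), averaged over all triples
    return torch.clamp(margin - pos_scores + neg_scores, min=0.0).mean()
```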

4 THE PROPOSED FRAMEWORK
In this section, we detail the modules of the recommender network and the meta learner for scenario-specific learning.

4.1 Recommender Network
We apply a feedforward neural network as the recommender f_c, which consists of three modules, to take inputs (u, i) and generate the corresponding outputs f_c(u, i).

Embedding Module (u, i → e_u, e_i): This module consists of two embedding matrices U ∈ R^{m×d} and I ∈ R^{n×d} for users and items respectively. The embeddings can be generated from user/item attributes and/or general interactions without contextual information, with methods from collaborative filtering [26, 41] or network embedding [15, 37]. The user u and item i are first mapped to one-hot vectors x_u ∈ {0, 1}^m and x_i ∈ {0, 1}^n, where only the element corresponding to the user/item id is 1 and all others are 0. The one-hot vectors are then transformed into continuous representations by the embedding matrices: e_u = U x_u and e_i = I x_i.

Hidden Layer Module (e_u, e_i → z_ui): This module is the central part of the recommender. Similar to the deep recommendation models in [9, 10], the user and item embeddings are concatenated as e_ui = [e_u, e_i]. e_ui is then mapped by L hidden layers to a continuous representation of the user-item interaction, z_ui = φ_L(··· (φ_1(e_ui)) ···). The l-th layer can be represented as:

    φ_l(z) = ReLU(W_l z + b_l),    (4)

where W_l and b_l are the weight matrix and the bias vector of the l-th layer.

Output Module (z_ui → f_c(u, i)): This module computes the recommendation score f_c(u, i) based on the mapped representation of the user-item interaction from the last hidden layer. This is achieved by a linear layer: f_c(u, i) = w^T z_ui, where w is the weight of the output layer.

The recommender parameters θ_c, which include {(W_l, b_l)_{l=1}^L, w}, are learned during scenario-specific learning. Note that the embedding matrices U and I are not included, to improve the recommender's generalization ability for users and items unobserved in D_c^train. In other words, the embedding matrices are shared across different scenarios and kept fixed in scenario-specific learning.
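For reference, here is a minimal PyTorch sketch of the three-module recommender described above: frozen shared embeddings, L ReLU hidden layers (Equation (4)), and a linear output layer. The hidden sizes are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class Recommender(nn.Module):
    """f_c: embedding module -> hidden layer module -> output module."""
    def __init__(self, user_emb, item_emb, hidden_dims=(64, 32)):
        super().__init__()
        # Embedding module: shared across scenarios and kept frozen
        # during scenario-specific learning.
        self.user_emb = nn.Embedding.from_pretrained(user_emb, freeze=True)
        self.item_emb = nn.Embedding.from_pretrained(item_emb, freeze=True)
        dims = [user_emb.size(1) + item_emb.size(1), *hidden_dims]
        # Hidden layer module: phi_l(z) = ReLU(W_l z + b_l), Equation (4).
        self.hidden = nn.ModuleList(nn.Linear(a, b) for a, b in zip(dims, dims[1:]))
        # Output module: f_c(u, i) = w^T z_ui.
        self.out = nn.Linear(dims[-1], 1, bias=False)

    def forward(self, users, items):
        z = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        for layer in self.hidden:
            z = torch.relu(layer(z))
        return self.out(z).squeeze(-1)
```

Only the hidden layers and the output weight correspond to the scenario-specific parameters θ_c; the two embedding tables stay fixed.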

4.2 s2Meta
In scenario-specific learning, the recommender parameters θ_c are learned from the training set D_c^train. We summarize the following three challenges in this learning:

• How should the parameters θ_c be initialized? Randomly initialized parameters can take a long time to converge and often lead to overfitting in the few-shot setting.
• How should the parameters θ_c be updated w.r.t. the loss function? Traditional optimization algorithms rely on carefully tuned hyperparameters to converge to a good solution, and optimal hyperparameters in different scenarios may vary a lot.
• When should the learning process stop? In the few-shot setting, learning too much from a small training set can lead to overfitting and hurt generalization performance.

Algorithm 1 Training Algorithm of Meta Learner
Input: Meta-training set T_meta-train, loss function L
1:  ω_u, ω_s, ω_R ← random initialization
2:  for d ← 1, K do            ▷ K is the number of meta-training steps
3:      D_c^train, D_c^test ← random scenario from T_meta-train
4:      θ_c^(0) ← ω_R, T ← 0
5:      for t ← 1, T_max do
6:          B^(t) ← random batch from D_c^train
7:          L^(t) ← L(B^(t); θ_c^(t−1))
8:          p^(t) ← M_s(L^(t), ||∇_{θ_c^(t−1)} L^(t)||_2; ω_s)   ▷ Equation (9)
9:          s^(t) ∼ Bernoulli(p^(t))   ▷ randomly decide to stop or not
10:         if s^(t) = 1 then
11:             break
12:         end if
13:         θ_c^(t) ← update θ_c^(t−1) according to Equations (7) and (8)
14:         T ← T + 1
15:     end for
16:     L_test ← L(D_c^test; θ_c^(T))
17:     update ω_u, ω_R using ∇_{ω_u} L_test, ∇_{ω_R} L_test
18:     dω_s ← 0
19:     for j ← 1, T do
20:         dω_s ← dω_s + (L_test − L(D_c^test; θ_c^(j))) ∇_{ω_s} ln(1 − p^(j))
21:     end for
22:     update ω_s using dω_s   ▷ Equation (11)
23: end for

These challenges are often addressed manually by experts in traditional machine learning settings. Instead, we propose s2Meta, which can automatically learn to control the learning process end to end, including parameter initialization, the update strategy and the early-stop policy. In the following, we introduce how the meta learner controls these three parts of scenario-specific learning and how the meta learner is trained on the meta-training set T_meta-train.

4.2.1 Parameter Initialization. At the beginning of scenario-specific learning, the recommender parameters are initialized as θ_c^(0). Traditionally, the parameters of a neural network are initialized by random sampling from a normal or uniform distribution. Given enough training data, randomly initialized parameters can usually converge to a good local optimum, but this may take a long time. In the few-shot setting, however, random initialization combined with limited training data can lead to serious overfitting, which hurts the ability of the trained recommender to generalize well. Instead, following [14], we initialize the recommender parameters from global initial values shared across different scenarios. These initial values are treated as part of the meta learner's parameters, denoted as ω_R. Suitable initial parameters may not perform well on a specific scenario, but can adapt quickly to new scenarios given a small amount of training data.
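In implementation terms, this initialization amounts to copying the shared values ω_R into the scenario's recommender before any scenario-specific update; a minimal sketch, assuming the learnable tensors are kept in a name-keyed dict:

```python
def init_recommender_params(omega_R):
    """theta_c^(0) <- omega_R: start every scenario from the shared initial
    values learned by the meta learner (assumed: a dict of name -> Tensor)."""
    return {name: tensor.clone() for name, tensor in omega_R.items()}
```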

4.2.2 Update Strategy. At each step t, a batch B^(t) = {(u_k, i_k, i_k^-)}_{k=1}^N of fixed size N is sampled from D_c^train, where (u_k, i_k) ∈ D_c^train and i_k^- is sampled from I such that (u_k, i_k^-) ∉ D_c^train. Then the previous parameters θ_c^(t−1) are updated to θ_c^(t) according to L^(t), the loss on B^(t):

    L^(t) = (1 / |B^(t)|) Σ_{(u_k, i_k, i_k^-) ∈ B^(t)} ℓ(u_k, i_k, i_k^-; θ_c^(t−1)),    (5)

where ℓ is the loss function in Equation (3).

The most common method to update the parameters of a neural network is stochastic gradient descent (SGD) [42]. In SGD, the parameters θ_c are updated as:

    θ_c^(t) = θ_c^(t−1) − α ∇_{θ_c^(t−1)} L^(t),    (6)

where α is the learning rate. There are many variations of SGD, such as Adam [24] and RMSprop [48], both of which adjust the learning rate dynamically.

Although hand-crafted optimization algorithms have gained success in training deep networks, they rely on a large amount of training data and carefully selected hyperparameters to converge to a good solution. In few-shot learning, an inappropriate learning rate can easily lead to being stuck in a poor local optimum. Moreover, optimal learning rates in different scenarios can differ significantly. Instead, we extend the idea of learning the optimization algorithm in [39] to learn an update controller for the parameters in θ_c, implementing a more flexible update strategy than hand-crafted algorithms. The update strategy for θ_c is:

    θ_c^(t) = β^(t) ⊙ θ_c^(t−1) − α^(t) ⊙ ∇_{θ_c^(t−1)} L^(t),    (7)

where α^(t) is the input gate, similar to the learning rate in SGD; and β^(t) is the forget gate, which can help the meta learner quickly "forget" the previous parameters to leave a poor local optimum.

Since the optimization process is a sequential process in which historical information matters, we use an LSTM [20] to encode historical information and issue the input gates and forget gates:

    h_u^(t), c_u^(t) = LSTM([∇_{θ_c^(t−1)} L^(t), L^(t), θ_c^(t−1)], h_u^(t−1), c_u^(t−1)),
    β^(t) = σ(W_F h_u^(t) + b_F),
    α^(t) = σ(W_I h_u^(t) + b_I),    (8)

where LSTM(·, ·, ·) represents one step forward in a standard LSTM, h_u^(t) and c_u^(t) are the hidden state and cell state of the update controller at step t, and σ(·) is the sigmoid function. The parameters of the update controller, denoted as ω_u, include {W_F, b_F, W_I, b_I} as well as the LSTM parameters. Different parameters in θ_c correspond to different update controllers. In this way, different learning strategies can be applied.
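The following is a minimal PyTorch sketch of one update controller (Equations (7)–(8)). Following the coordinatewise treatment of the learned optimizer in [39], every coordinate of the parameter tensor is treated as one element of the LSTM's batch; the hidden size and the exact per-coordinate input features are assumptions for illustration.

```python
import torch
import torch.nn as nn

class UpdateController(nn.Module):
    """Learned update rule theta <- beta * theta - alpha * grad (Equation (7))."""
    def __init__(self, hidden_size=20):
        super().__init__()
        # Per-coordinate input: [gradient, loss, parameter] (Equation (8)).
        self.cell = nn.LSTMCell(3, hidden_size)
        self.input_gate = nn.Linear(hidden_size, 1)   # produces alpha^(t)
        self.forget_gate = nn.Linear(hidden_size, 1)  # produces beta^(t)

    def forward(self, theta, grad, loss, state):
        # state: (h, c), each of shape (theta.numel(), hidden_size),
        # initialized to zeros at the start of scenario-specific learning.
        n = theta.numel()
        x = torch.stack([grad.view(-1), loss.expand(n), theta.view(-1)], dim=1)
        h, c = self.cell(x, state)  # encode the optimization history
        alpha = torch.sigmoid(self.input_gate(h)).view_as(theta)
        beta = torch.sigmoid(self.forget_gate(h)).view_as(theta)
        return beta * theta - alpha * grad, (h, c)
```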

4.2.3 Early-Stop Policy. In machine learning, overfitting is one of the critical problems that restrict the performance of models. When the model's representation ability exceeds the complexity of the problem, which is often the case for neural networks, the model may fit the sampling variance and random noise in the training data and perform poorly on the testing set. Regularization tricks like the L2 regularizer or dropout [45] are often applied to limit the model's complexity. Another common trick is early stopping: the learning process is stopped when the training loss stops descending or the performance on a validation set begins to drop.

In the few-shot setting, as we found, regularizers cannot prevent overfitting to the small training set. Also, as the size of the training set is too small, a validation set split from the training set cannot provide a precise estimate of generalization ability. To overcome the drawbacks of hand-crafted stopping rules, we propose to learn the stop policy with a neural network, which we call the stop controller M_s. To balance exploitation and exploration, we apply a stochastic stop policy, in which at step t the learning process stops with probability p^(t), predicted by M_s. Similar to the update controller, M_s is an LSTM that encodes historical information:

    h_s^(t), c_s^(t) = LSTM([L^(t), ||∇_{θ_c^(t−1)} L^(t)||_2], h_s^(t−1), c_s^(t−1)),
    p^(t) = σ(W_s h_s^(t) + b_s),    (9)

where ||∇_{θ_c^(t−1)} L^(t)||_2 is the L2-norm of the gradients at step t, and h_s^(t) and c_s^(t) are the hidden state and cell state of the stop controller at step t. The parameters of the stop controller, denoted as ω_s, include {W_s, b_s} as well as the LSTM parameters.
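A matching sketch of the stop controller M_s (Equation (9)): it consumes the scalar training loss and gradient norm at each step and emits the stop probability p^(t). The hidden size is again an assumed illustrative value.

```python
import torch
import torch.nn as nn

class StopController(nn.Module):
    """M_s: predicts the stop probability p^(t) from the (loss, grad-norm) history."""
    def __init__(self, hidden_size=20):
        super().__init__()
        self.cell = nn.LSTMCell(2, hidden_size)  # input: [L^(t), ||grad L^(t)||_2]
        self.prob = nn.Linear(hidden_size, 1)

    def forward(self, loss, grad_norm, state):
        x = torch.stack([loss, grad_norm]).view(1, 2)
        h, c = self.cell(x, state)
        p = torch.sigmoid(self.prob(h)).squeeze()  # p^(t) in (0, 1)
        return p, (h, c)

# As in Algorithm 1, learning stops when s^(t) ~ Bernoulli(p^(t)) equals 1:
#   p, state = stop_controller(loss.detach(), grad_norm.detach(), state)
#   if torch.bernoulli(p) == 1: break
```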

4.2.4 Training of Meta Learner. The objective of the meta learner is to minimize the expected loss on the testing set after scenario-specific learning on the training set, as described in Equations (1) to (3). The parameters ω of the meta learner include the shared initial parameters ω_R, the parameters of the update controllers ω_u, and the parameters of the stop controller ω_s.

The gradients of ω_R and ω_u with respect to the meta-testing loss L_ω(D_c^test | D_c^train), which we call the meta-gradient, can be computed with back-propagation. However, the meta-gradient may involve higher-order derivatives, which are quite expensive to compute when T is large. For this reason, MAML [14] takes only a one-step gradient descent. Instead, our model can benefit from flexible multi-step gradient descent: we ignore higher-order derivatives in the meta-gradient by treating the gradient of the recommender parameters, ∇_{θ_c^(t−1)} L^(t), as independent of ω_R and ω_u. With these gradients, we can optimize ω_R and ω_u with normal SGD.

Since the relation between ω_s and θ_c is discrete and stochastic, it is impossible to take direct gradient descent on ω_s. We use stochastic policy gradient to optimize ω_s. Given one learning trajectory based on stop controller parameters ω_s: θ_c^(0), θ_c^(1), θ_c^(2), ..., θ_c^(T), we define the immediate reward at step t as the loss decrease of a one-step update: r^(t) = L(D_c^test; θ_c^(t−1)) − L(D_c^test; θ_c^(t)). Then the accumulative reward at t is:

    Q^(t) = Σ_{i=t}^T r^(i) = L(D_c^test; θ_c^(t−1)) − L(D_c^test; θ_c^(T)),    (10)

which is the loss decrease from step t to the end of learning. According to the REINFORCE algorithm [54], we can update ω_s as:

    ω_s ← ω_s + γ Σ_{t=1}^T Q^(t) ∇_{ω_s} ln M_s(L^(t), ||∇_{θ_c^(t−1)} L^(t)||_2; ω_s),    (11)


where γ is the learning rate for ω_s. The detailed training algorithm for the s2Meta learner is described in Algorithm 1.
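Below is a minimal sketch of the policy-gradient step for ω_s in Equations (10)–(11), written as PyTorch gradient ascent. It assumes the per-step test losses were recorded (detached) along the trajectory, that the stop probabilities p^(t) stayed attached to ω_s's computation graph, and that the log-probability of the action actually taken at each intermediate step (continuing, i.e. s^(t) = 0) is ln(1 − p^(t)), as used in line 20 of Algorithm 1.

```python
import torch

def reinforce_update_omega_s(optimizer, stop_probs, test_losses):
    """One REINFORCE update of the stop controller parameters omega_s.

    stop_probs:  [p^(1), ..., p^(T)], still attached to omega_s's graph.
    test_losses: [L(D_c^test; theta_c^(0)), ..., L(D_c^test; theta_c^(T))], detached.
    """
    final_loss = test_losses[-1]
    objective = 0.0
    for t, p in enumerate(stop_probs, start=1):
        # Q^(t): loss decrease from step t to the end (Equation (10)).
        q = test_losses[t - 1] - final_loss
        objective = objective + q * torch.log(1.0 - p)
    # Gradient ascent on the expected reward (Equation (11)).
    optimizer.zero_grad()
    (-objective).backward()
    optimizer.step()
```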

5 EXPERIMENT
In this section, we detail the experiments to evaluate the proposed s2Meta on few-shot scenario-aware recommendation, including details of the datasets, competitors, experimental settings and comparison results. We also delve deeper to analyze how s2Meta helps the recommender network achieve better results, with process sensitivity analyses. Finally, we analyze how the architecture of the recommender influences performance.

5.1 Experimental Settings
5.1.1 Datasets. Currently available open datasets for context-aware recommender systems are mostly of small scale with very sparse contextual information [22]. We evaluate our method on two public datasets and one newly released large scenario-based dataset from Taobao³. The first dataset is the Amazon Review dataset [17], which contains product reviews and metadata from Amazon. We keep the reviews with ratings no less than three as the positive user-item interactions and take interactions in different categories as different scenarios. Our second dataset is the Movielens-20M dataset [16], with rating scores from the movie recommendation service Movielens. Following [18], we transform the explicit ratings into implicit data, where all the movies a user has rated are taken as positive items for that user. The tags of movies are used as scenarios. In each dataset, we select scenarios with fewer than 1,000 but more than 100 items/movies as few-shot tasks with enough interactions for evaluation. Our third dataset is from the click log of Cloud Theme, which is a crucial recommendation procedure in the Taobao app. Different themes correspond to different scenarios of purchase, e.g., "what to take when traveling", "how to dress up for a party", "things to prepare when a baby is coming". In each scenario, a collection of items in related categories is displayed, according to the scenario as well as the user's interests. The dataset includes more than 1.4 million clicks from 355 different scenarios in a 6-day promotion season, along with the one-month purchase history of users before the promotion started. Table 2 summarizes the statistics of the three datasets.
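As a small illustration of the few-shot scenario selection described above (keeping scenarios with more than 100 but fewer than 1,000 items), here is a pandas sketch; the column names are assumptions about a preprocessed interaction log, not the released datasets' schemas.

```python
import pandas as pd

def few_shot_scenarios(interactions: pd.DataFrame) -> list:
    """interactions: one row per positive (user, item, scenario) event,
    e.g. an Amazon review with rating >= 3 or a rated Movielens movie."""
    items_per_scenario = interactions.groupby("scenario")["item"].nunique()
    mask = (items_per_scenario > 100) & (items_per_scenario < 1000)
    return items_per_scenario[mask].index.tolist()
```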

5.1.2 Baselines. We select state-of-the-art baselines for item recommendation in the same domain or across domains, which can be broadly divided into the following categories:

Heuristic: ItemPop: Items are ranked according to their popularity in specific scenarios, judged by the number of interactions in the corresponding training sets. This is a non-personalized baseline in item recommendation [41].

General Domain: NeuMF: Neural Collaborative Filtering [18] is a state-of-the-art item recommendation method. It combines traditional MF and an MLP in a neural network to predict user-item interactions. This method is not proposed for cross-domain recommendation, and we train a single model on all the scenarios.

³ Downloadable from https://tianchi.aliyun.com/dataset/dataDetail?dataId=9716.

Shallow Cross-Domain: (1) CDCF: Cross-Domain Collaborative Filtering [30] is a simple method based on Factorization Machines [40] for cross-domain recommendation. By factorizing the interaction data of shared users in the source domain, it allows extra information to improve recommendation in a target domain. (2) CMF: Collective Matrix Factorization [43] is a matrix factorization method for cross-domain recommendation. It jointly learns low-rank representations for a collection of matrices by sharing common entity factors, which enables knowledge transfer across domains. We use it as a shallow cross-domain baseline, with users partially overlapping in different domains.

Deep Cross-Domain: (1) EMCDR: The Embedding and Mapping framework for Cross-Domain Recommendation [32] consists of two steps. Step 1 learns the user and item embeddings in each domain with latent factor models. Step 2 learns the embedding mapping between two domains with a multilayer perceptron from the shared users/items in the two domains. (2) CoNet: Collaborative Cross Networks [21] enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa.

Table 2: Statistics of the datasets. #Inter. denotes the number of user-item interactions and #Scen. denotes the number of scenarios we use as few-shot tasks.

Dataset    #Users    #Items     #Inter.     #Scen.
Amazon     766,337   492,505    17,523,124  1,289
Movielens  138,493   27,278     20,000,263  306
Taobao     775,603   1,452,525  5,717,835   355

More details on the experimental settings can be found in Appendix A.

5.2 Performance Comparison
In this subsection, we report the comparison results and summarize insights. Table 3 shows the results on the three datasets in terms of top-N recall [47]. We can see that s2Meta achieves the best results across all three datasets, compared to both shallow cross-domain models (CMF and CDCF) and deep cross-domain models (EMCDR and CoNet). By involving the extracted scenario information, s2Meta also performs better than the general recommendation method (NeuMF) and the heuristic method ItemPop.

For the Amazon dataset, s2Meta gives 9.41% improvement on average compared with the best baseline (EMCDR), and achieves 16.47% improvement in terms of Recall@50 compared to the non-scenario-aware NeuMF, which demonstrates the benefit of incorporating scenario information. Among the cross-domain baselines, neural cross-domain methods are slightly better than the shallow cross-domain methods.

For the Movielens dataset, s2Meta achieves 2.87% improvement on average compared to the best baseline (CoNet). A phenomenon common to Amazon and Movielens is that domain-specific methods can perform better than the remaining competitors, showing that the


Table 3: The top-N recall results on test scenarios.

                     Amazon                         Movielens                      Taobao
Method    Recall@10  Recall@20  Recall@50  Recall@10  Recall@20  Recall@50  Recall@20  Recall@50  Recall@100
NeuMF     24.55      35.65      55.19      31.67      51.30      84.98      25.64      42.31      58.84
ItemPop   26.86      32.65      50.42      39.65      54.32      78.12      18.25      28.57      32.44
CDCF      10.27      16.72      32.79      29.19      41.04      65.75      9.81       20.93      32.14
CMF       27.64      38.31      55.24      29.80      46.91      74.37      9.16       16.47      29.31
EMCDR     31.71      42.14      58.73      43.55      60.89      83.54      20.43      31.52      45.67
CoNet     30.17      41.57      56.06      46.62      63.61      87.06      20.27      31.48      44.53
s2Meta    34.39      46.53      64.28      47.79      66.02      89.07      27.11      44.10      59.98
Improve   8.35       10.42      9.45       2.51       3.79       2.31       5.73       4.23       1.90

user interests are relatively concentrated within a specific scenario. Also, we can find that the deep cross-domain methods outperform shallow cross-domain methods by improvements of over 10%. The performance of NeuMF, which is not designed for cross-domain recommendation, is also superior to CMF and CDCF, showing the power of deep learning in recommendation.

For the Taobao dataset, s2Meta achieves 3.95% performance lift on average. On this dataset, NeuMF performs best among all the baselines. The reason might be that this dataset comes from the click log of a real-world recommender, and clicks often contain more noise than purchases or ratings. Baselines that adapt to the scenario in a naive way may hurt performance by overfitting in the few-shot setting. Even so, s2Meta can still outperform NeuMF with the flexible learning strategy learned by the proposed meta learner.

Note that the relative improvements from taking account of scenario information vary among the three datasets. For Amazon, most scenario-specific baselines perform better than the general-domain baseline, while for the Taobao dataset, most scenario-specific baselines cannot outperform NeuMF. This implies that the usefulness of scenario information varies, and s2Meta can dynamically adapt to different datasets' requirements.

5.3 Process Sensitivity Analysis
In this subsection, we analyze how s2Meta works to control the scenario-specific learning process. Due to the space limit, we only illustrate results from Amazon; the performance patterns are the same on the other two datasets.

5.3.1 Impact of Different Parts. First, we analyze the contributions of the three parts of the meta learner described in Section 4.2. Specifically, we compare the performances of the following variations:

• Complete: The complete meta learner that controls parameter initialization, update strategy, and early-stop policy.
• RandInit: Parameters of the recommender are randomly initialized. The update strategy and early-stop policy remain unchanged.
• FixedLr: Parameters of the recommender are updated by standard SGD with a fixed learning rate of 0.01. Parameter initialization and early-stop policy remain unchanged.
• FixedStep: The number of scenario-specific learning steps is fixed at 20, while parameter initialization and update strategy remain consistent with the complete method.

The performance comparison is listed in Table 4. We can see that the complete model significantly outperforms the three weakened variations, indicating that all three parts of the meta learner help to improve the final performance. Among the three variations, random initialization hurts performance the most. When the parameters are randomly initialized, the recommender has to learn from a random point each time, so the learning takes more steps and is more easily stuck in a local optimum. The effect of removing the early-stop policy is relatively small. The reason might be that when the learning process runs too long, the update controller can still produce low input gates to avoid overfitting.

Table 4: Impact of different parts of the meta learner on the Amazon dataset

Method     Recall@10  Recall@20  Recall@50
RandInit   33.12      45.44      63.35
FixedLr    33.56      45.90      63.86
FixedStep  33.84      46.13      64.02
Complete   34.39      46.53      64.28

5.3.2 Case Study: the learning process learned by the meta learner. To further analyze the learning process the meta learner learns automatically, we select a typical scenario on Amazon and visualize its learning process. Figure 2 shows the training loss, the stop probability (predicted by the stop controller), the recall on the test set (not visible to the meta learner), and the input and forget gates (predicted by the update controller). From the leftmost panel, we can see that as the training loss ceases to drop, the stop probability rises dramatically, finally leading to the end of the learning process. In fact, we can find that the recall on the test set has become stable and further training might lead to overfitting. From the visualization of input gates and forget gates, different update strategies are applied for different layers and different parameters. Across layers, we find that for the last hidden layer, the input gates remain low and the forget gates remain high, which indicates its parameters


0 2 4 6 8

performance

training lossstop probrecall

0 2 4 6 8

input gate (weight)layer1layer3layer2

0 2 4 6 8

forget gate (weight)

layer1layer3layer2

0 2 4 6 8

input gate (bias)layer1layer3layer2

0 2 4 6 8

forget gate (bias)

layer1layer3layer2

Figure 2: The visualization of the learning process. Left most is the training loss, stop probability and recall on the testingset during the learning process. The other four figures are average input gates and forget gates of weights and biases in thehidden layers.


Figure 3: Impact of the recommender architecture on the Amazon (left), Movielens (middle), and Taobao (right) datasets.

Among different parameters, the weight matrices change mainly at the beginning and the bias vectors mainly at the end. This strategy might help to quickly adjust the parameters by updating a part of them at a time; updating the bias vectors then serves as a minor complement to the already-updated weight matrices.
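The gate dynamics above can be read as a coordinate-wise gated update of the recommender parameters. A rough sketch, assuming the meta-LSTM-style rule of Ravi and Larochelle [39] (the exact parameterization of the update controller is given in Section 4.2):

    import torch

    def gated_step(theta, grad, input_gate, forget_gate):
        # All arguments are tensors with the shape of the parameter.
        # A low input gate combined with a high forget gate (the last hidden
        # layer in Figure 2) leaves the parameter almost untouched, while the
        # opposite pattern lets the scenario-specific gradient rewrite it.
        return forget_gate * theta + input_gate * (-grad)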

5.4 Analysis of Recommender Architecture
In this subsection, we analyze how the architecture of the recommender influences the performance. We compare the following two architectures for the recommender (a minimal PyTorch sketch of both follows the list):
• Mapping Module: The user and item embeddings are mapped by two multilayer perceptrons respectively, and the final score is computed as the dot product of the mapped user and item embeddings. This is the architecture of EMCDR.
• Interaction Module: The user and item embeddings are concatenated, and a multilayer perceptron computes the final score from the concatenated vector. This is the architecture used by NeuMF and CoNet.
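A minimal PyTorch sketch of the two scoring architectures (a single hidden layer for brevity; the actual recommender uses three hidden layers, see Appendix A.2):

    import torch
    import torch.nn as nn

    class MappingModule(nn.Module):
        """EMCDR-style: map user and item embeddings independently, then dot-product."""
        def __init__(self, dim, hidden):
            super().__init__()
            self.user_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            self.item_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

        def forward(self, user_emb, item_emb):
            # Mapped item embeddings do not depend on the user, so they can be
            # pre-computed offline (see the efficiency note below).
            return (self.user_mlp(user_emb) * self.item_mlp(item_emb)).sum(dim=-1)

    class InteractionModule(nn.Module):
        """NeuMF/CoNet-style: concatenate embeddings and score with an MLP."""
        def __init__(self, dim, hidden):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, user_emb, item_emb):
            return self.mlp(torch.cat([user_emb, item_emb], dim=-1)).squeeze(-1)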

We compare the performance of the two architectures with the same meta learner. The results on the three datasets are shown in Figure 3. In general, the interaction module performs better than the mapping module, because the interaction module can represent complicated interactions between users and items, while the mapping module only maps the user and item embeddings independently. On Taobao, the relative improvement of the interaction module reaches 6.71%. However, on Amazon, the mapping module slightly outperforms the interaction module. This implies that the optimal recommender architecture may differ among datasets. Note also that in practice the mapping module can be more efficient: given the recommender, the mapped embeddings can be pre-computed and the matching can be accelerated with advanced data structures.

6 CONCLUSION
In this work, we explored few-shot learning for recommendation in the scenario-specific setting. We proposed a novel sequential scenario-specific framework for recommender systems using meta learning, to solve the cold-start problem in recommendation scenarios. s2Meta can automatically control the learning process to converge to a good solution and avoid overfitting, which has been a critical issue in few-shot learning. Experiments on real-world datasets demonstrate the effectiveness of the proposed method in comparison with shallow/deep and general/scenario-specific baselines. In the future, we will automate the architecture design of recommenders. In our experiments, we found that the performance of the same architecture can differ across datasets; by learning to choose the optimal recommender architecture, the performance of s2Meta can be further improved.

ACKNOWLEDGMENTS
Jie Tang and Hongxia Yang are the corresponding authors of this paper. The work is supported by the NSFC for Distinguished Young Scholar (61825602), the Tsinghua University Initiative Scientific Research Program, and a research fund supported by Alibaba.

REFERENCES
[1] Gediminas Adomavicius, Ramesh Sankaranarayanan, Shahana Sen, and Alexander Tuzhilin. 2005. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Trans. Inf. Syst. 23, 1 (2005), 103–145.
[2] Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. Knowl. Data Eng. 17, 6 (2005), 734–749.
[3] Gediminas Adomavicius and Alexander Tuzhilin. 2015. Context-Aware Recommender Systems. In Recommender Systems Handbook. 191–226.
[4] Marcin Andrychowicz, Misha Denil, Sergio Gomez Colmenarejo, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. 2016. Learning to learn by gradient descent by gradient descent. In NIPS. 3981–3989.
[5] Parivash Ashrafi, Yi Sun, Neil Davey, Rod Adams, Marc B. Brown, Maria Prapopoulou, and Gary P. Moss. 2015. The importance of hyperparameters selection within small datasets. In IJCNN. 1–8.
[6] Michael J. A. Berry and Gordon Linoff. 1997. Data mining techniques - for marketing, sales, and customer support. Wiley.
[7] Pavel Brazdil, Christophe Giraud-Carrier, Carlos Soares, and Ricardo Vilalta. 2008. Metalearning: Applications to Data Mining (1 ed.). Springer Publishing Company, Incorporated.
[8] Iván Cantador, Ignacio Fernández-Tobías, Shlomo Berkovsky, and Paolo Cremonesi. 2015. Cross-Domain Recommender Systems. In Recommender Systems Handbook. 919–959.
[9] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. In RecSys Workshop. 7–10.
[10] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In RecSys. 191–198.
[11] Tiago Cunha, Carlos Soares, and André C. P. L. F. de Carvalho. 2018. Metalearning and Recommender Systems. Inf. Sci. 423, C (Jan. 2018), 128–144.
[12] Tiago Cunha, Carlos Soares, and André C. P. L. F. de Carvalho. 2016. Selecting Collaborative Filtering Algorithms Using Metalearning. In ECML/PKDD. 393–409.
[13] Michael D. Ekstrand and John Riedl. 2012. When recommenders fail: predicting recommender failure for algorithm selection and combination. In RecSys. 233–236.
[14] Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML. 1126–1135.
[15] William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NIPS. 1025–1035.
[16] F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens Datasets: History and Context. TiiS 5, 4 (2016), 19:1–19:19.
[17] Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In WWW. 507–517.
[18] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW. 173–182.
[19] Balázs Hidasi and Alexandros Karatzoglou. 2018. Recurrent Neural Networks with Top-k Gains for Session-based Recommendations. In CIKM. 843–852.
[20] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735–1780.
[21] Guangneng Hu, Yu Zhang, and Qiang Yang. 2018. CoNet: Collaborative Cross Networks for Cross-Domain Recommendation. In CIKM. 667–676.
[22] Sergio Ilarri, Raquel Trillo Lado, and Ramón Hermoso. 2018. Datasets for Context-Aware Recommender Systems: Current Context and Possible Directions. In ICDE Workshops. 25–28.
[23] Peter Ingwersen and Kalervo Järvelin. 2005. Information Retrieval in Context: IRiX. SIGIR Forum 39, 2 (Dec. 2005), 31–39.
[24] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
[25] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. 2015. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, Vol. 2.
[26] Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42, 8 (2009), 30–37.
[27] Su Mon Kywe, Ee-Peng Lim, and Feida Zhu. 2012. A Survey of Recommender Systems in Twitter. In SocInfo. 420–433.
[28] Fei-Fei Li, Robert Fergus, and Pietro Perona. 2006. One-Shot Learning of Object Categories. IEEE Trans. Pattern Anal. Mach. Intell. 28, 4 (2006), 594–611.
[29] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing 7, 1 (2003), 76–80.
[30] Babak Loni, Yue Shi, Martha Larson, and Alan Hanjalic. 2014. Cross-Domain Collaborative Filtering with Factorization Machines. In ECIR. 656–661.
[31] Denis A. Lussier and Richard W. Olshavsky. 1979. Task Complexity and Contingent Processing in Brand Choice. Journal of Consumer Research 6, 2 (1979), 154–165.
[32] Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-Domain Recommendation: An Embedding and Mapping Approach. In IJCAI. 2464–2470.
[33] Tsendsuren Munkhdalai and Hong Yu. 2017. Meta Networks. In ICML. 2554–2563.
[34] Alex Nichol, Joshua Achiam, and John Schulman. 2018. On First-Order Meta-Learning Algorithms. CoRR abs/1803.02999 (2018).
[35] Cosimo Palmisano, Alexander Tuzhilin, and Michele Gorgoglione. 2008. Using Context to Improve Predictive Modeling of Customers in Personalization Applications. IEEE Trans. Knowl. Data Eng. 20, 11 (2008), 1535–1549.
[36] John W. Payne, James R. Bettman, Eloise Coupey, and Eric J. Johnson. 1992. A constructive process view of decision making: Multiple strategies in judgment and choice. Acta Psychologica 80, 1 (1992), 107–141.
[37] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: online learning of social representations. In SIGKDD. 701–710.
[38] Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-aware Recommender Systems. In UMAP. 373–374.
[39] Sachin Ravi and Hugo Larochelle. 2017. Optimization as a model for few-shot learning. In ICLR.
[40] Steffen Rendle. 2010. Factorization Machines. In ICDM. 995–1000.
[41] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. 452–461.
[42] Herbert Robbins and Sutton Monro. 1951. A Stochastic Approximation Method. Ann. Math. Statist. 22, 3 (1951), 400–407.
[43] Ajit Paul Singh and Geoffrey J. Gordon. 2008. Relational learning via collective matrix factorization. In SIGKDD. 650–658.
[44] Jake Snell, Kevin Swersky, and Richard S. Zemel. 2017. Prototypical Networks for Few-shot Learning. In NIPS. 4080–4090.
[45] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR 15, 1 (Jan. 2014), 1929–1958.
[46] Kostas Stefanidis, Evaggelia Pitoura, and Panos Vassiliadis. 2007. A context-aware preference database system. Int. J. Pervasive Computing and Communications 3, 4 (2007), 439–460.
[47] Jiaxi Tang and Ke Wang. 2018. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In WSDM. 565–573.
[48] Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. Technical Report.
[49] Lisa Torrey and Jude Shavlik. 2009. Transfer Learning.
[50] Jan N. van Rijn and Frank Hutter. 2018. Hyperparameter Importance Across Datasets. In SIGKDD. 2367–2376.
[51] Manasi Vartak, Arvind Thiagarajan, Conrado Miranda, Jeshua Bratman, and Hugo Larochelle. 2017. A Meta-Learning Perspective on Cold-Start Recommendations for Items. In NIPS. 6907–6917.
[52] Oriol Vinyals, Charles Blundell, Tim Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. 2016. Matching Networks for One Shot Learning. In NIPS. 3630–3638.
[53] Maksims Volkovs, Guang Wei Yu, and Tomi Poutanen. 2017. DropoutNet: Addressing Cold Start in Recommender Systems. In NIPS. 4964–4973.
[54] Ronald J. Williams. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8 (1992), 229–256.
[55] Wenhan Xiong, Mo Yu, Shiyu Chang, Xiaoxiao Guo, and William Yang Wang. 2018. One-Shot Relational Learning for Knowledge Graphs. In EMNLP. 1980–1990.
[56] Deqing Yang, Jingrui He, Huazheng Qin, Yanghua Xiao, and Wei Wang. 2015. A Graph-based Recommendation across Heterogeneous Domains. In CIKM. 463–472.


A APPENDIX
In this section, we first give the deployment description and implementation notes of our proposed models. The implementation notes and parameter configurations of the compared methods are then given. Finally, we introduce the experiment settings for the three datasets.

A.1 Deployment
Deployment is at the Guess You Like session, the front page of the Mobile Taobao; the illustration video can be watched at https://youtu.be/TNHLZqWnQwc.

A.2 Implementation Notes
The architecture of the recommender network is designed as three hidden layers and one output layer. The embedding size is set to 128 on Amazon and Taobao Themes, and 64 on Movielens. The number of hidden units in each layer is half of that in the previous layer. The hidden size of the update LSTM is set to 16, and the hidden size of the stop LSTM is set to 18. For the input of the update controllers and the stop controller, we use the preprocessing trick from [4]:

$$x \rightarrow \begin{cases} \left(\dfrac{\log(|x|)}{p},\ \operatorname{sgn}(x)\right) & \text{if } |x| \geq e^{-p} \\ \left(-1,\ e^{p}x\right) & \text{otherwise} \end{cases} \qquad (12)$$

where p = 10 in our experiment.
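A sketch of this preprocessing in PyTorch (our transcription of Eq. (12); the released code may organize it differently):

    import math
    import torch

    def preprocess(x, p=10.0):
        # Map each scalar to a 2-dim feature: log-scaled magnitude plus sign
        # for large values, and a (-1, scaled value) pair for tiny ones.
        big = x.abs() >= math.exp(-p)
        first = torch.where(big, x.abs().clamp_min(1e-38).log() / p,
                            torch.full_like(x, -1.0))
        second = torch.where(big, x.sign(), x * math.exp(p))
        return torch.stack((first, second), dim=-1)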

s2Meta is trained by standard SGD with learning rate 0.0001 and weight decay 1e-5, implemented with PyTorch 1.0 (https://pytorch.org/) in Python 3.6, and runs on a single Linux server with 8 NVIDIA GeForce GTX 1080 GPUs.

A.3 Compared Methods
A.3.1 Code. The code of NeuMF is provided by the authors (http://github.com/hexiangnan/neural_collaborative_filtering); we simply adapt the input and evaluation modules to our experimental settings. For CMF, we adopt the implementation of LIBMF (https://www.csie.ntu.edu.tw/~cjlin/libmf/). For CDCF, we adopt the official libFM implementation (http://www.libfm.org). The code of EMCDR and CoNet is not published, so we implement them with PyTorch.

A.3.2 Parameter Configuration. We run a grid search over the hyperparameters of CDCF, CMF, EMCDR, and CoNet on the meta-training set. The search space for CDCF and CMF is learning rate (0.1, 0.03, 0.001), learning step (10, 100, 200), and factor dimension (8, 16, 32). The search space for EMCDR is learning rate (0.1, 0.01, 0.001, 0.0001) and learning step (10, 100, 1000). The search space for CoNet is learning rate (0.01, 0.001), learning step (10, 50, 100, 500), and sparse ratio λ (0.1, 0.01, 0.001). For NeuMF, we use the pre-training suggested by the authors, with the embedding size of the MF model set to 8 and the MLP layers set to [64, 32, 16, 8].
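These grids translate directly into an exhaustive search. A sketch of the procedure (train_and_eval is a placeholder that should train the baseline on the meta-training set and return a validation metric such as Recall@20; here it returns a dummy score so the snippet runs as-is):

    from itertools import product

    def train_and_eval(lr, steps):
        # Placeholder objective; replace with real training + evaluation.
        return -abs(lr - 0.01) - abs(steps - 100) / 1000.0

    # EMCDR search space from the paragraph above; the other baselines
    # follow the same pattern with their respective grids.
    grid = {"lr": [0.1, 0.01, 0.001, 0.0001], "steps": [10, 100, 1000]}
    configs = [dict(zip(grid, values)) for values in product(*grid.values())]
    best = max(configs, key=lambda cfg: train_and_eval(**cfg))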

A.4 Experiment Setting
As we mention in Section 5.1.1, it is challenging to find large-scale public datasets for evaluating context-aware recommender systems. We therefore evaluate our method on two public datasets in the cross-domain setting, plus one scenario-based dataset from Taobao.

On the Amazon Review dataset (http://jmcauley.ucsd.edu/data/amazon), we take user purchases in different leaf categories as different scenarios. To maximize the difference between the source domain and the target domain, and to fully evaluate the model's ability to learn user behaviors in new scenarios, we select leaf categories from different first-order categories for the source domain, meta-training set, and meta-testing set. The specific split of first-order categories is displayed in Table 5.

On the Movielens-20M dataset (https://grouplens.org/datasets/movielens/), we take the movie tags in the Tag Genome included in the dataset as scenarios. The Tag Genome is a data structure that contains tag relevance scores for movies, computed from user-contributed content. Tags with relevance scores higher than 0.8 for a movie are considered tags of that movie. We use the movies without tags as the source domain, and randomly divide the tags into the meta-training set and meta-testing set.

On both datasets, we select the scenarios with 100-1000 items/movies as the few-shot tasks with enough interactions for evaluation. Unlike traditional cross-domain settings in which the interactions on different domains are simply concatenated, we are more interested in cold-start scenarios, on which most users have no previous interactions. Therefore, on each scenario, 64 interactions of shared users are used as $\mathcal{D}_c^{train}$ and all others are used for evaluation. We use all the interactions on the source domain to generate the user embeddings, and the interactions of the users only appearing on the scenario to generate the item embeddings, with Matrix Factorization [26]. For the cross-domain baselines, each scenario is considered as a target domain: the training data comprises all the interactions on the source domain, the few-shot interactions of shared users $\mathcal{D}_c^{train}$, and all the interactions of the users only appearing on the target domain. Therefore, all the compared methods use the same training data and evaluation set, and the fairness of the comparison is guaranteed.
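A minimal sketch of the per-scenario split, assuming interactions are (user, item) pairs, the shared users are known, and the 64 training interactions are sampled at random (function and variable names are ours):

    import random

    def split_scenario(shared_interactions, other_interactions, n_train=64, seed=0):
        # n_train interactions of shared users form D_c^train; everything
        # else on the scenario is held out for evaluation.
        rng = random.Random(seed)
        pool = list(shared_interactions)
        rng.shuffle(pool)
        return pool[:n_train], pool[n_train:] + list(other_interactions)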

On the Taobao Themes dataset (https://tianchi.aliyun.com/dataset/dataDetail?dataId=9716), we take the different themes as different scenarios, and all clicks are considered positive user-item interactions. We use the one-month purchase history of users before the click log as the source domain, and randomly divide the scenarios into the meta-training set and meta-testing set. As this dataset is quite sparse, we use the general user and item embeddings generated from attributes and interactions in Taobao by GraphSage [15]. These embeddings are also used by NeuMF, EMCDR, and CoNet.

Table 5: Dataset split on Amazon

Domain          First-order Category
Source Domain   Books; Electronics; Movies and TV; Clothing, Shoes and Jewelry; Home and Kitchen; Sports and Outdoors; Health and Personal Care; Toys and Games; Apps for Android; Grocery and Gourmet Food
Meta-training   Beauty; CDs and Vinyl; Kindle Store; Tools and Home Improvement; Pet Supplies
Meta-testing    Digital Music; Musical Instruments; Video Games; Cell Phones and Accessories; Baby; Automotive; Office Products; Amazon Instant Video; Patio, Lawn and Garden
