
Hierarchical Reinforcement Learning for Course Recommendation in MOOCs

Jing Zhang1,2, Bowen Hao1,2, Bo Chen1,2, Cuiping Li1,2, Hong Chen1,2, Jimeng Sun3

1 Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China
2 Information School, Renmin University of China

3 Computational Science and Engineering at College of Computing, Georgia Institute of Technology
{zhang-jing, jeremyhao, bochen, licuiping, chong}@ruc.edu.cn, [email protected]

Abstract

The proliferation of massive open online courses (MOOCs) demands an effective way of personalized course recommendation. The recent attention-based recommendation models can distinguish the effects of different historical courses when recommending different target courses. However, when a user has interests in many different courses, the attention mechanism will perform poorly as the effects of the contributing courses are diluted by diverse historical courses. To address such a challenge, we propose a hierarchical reinforcement learning algorithm to revise the user profiles and tune the course recommendation model on the revised profiles. Systematically, we evaluate the proposed model on a real dataset consisting of 1,302 courses, 82,535 users and 458,454 user enrolled behaviors, which were collected from XuetangX, one of the largest MOOCs platforms in China. Experimental results show that the proposed model significantly outperforms the state-of-the-art recommendation models (improving 5.02% to 18.95% in terms of HR@10).

Introduction

Nowadays, massive open online courses, or MOOCs, are attracting widespread interest as an alternative education model. Many MOOCs platforms such as Coursera, edX and Udacity have been built, providing low-cost opportunities for anyone to access a massive number of courses from top universities worldwide. The proliferation of heterogeneous courses in MOOCs platforms demands an effective way of personalized course recommendation for their users.

The problem can be formalized as follows: given the set of historical courses enrolled by a user before time t, we aim at recommending the most relevant courses that will be enrolled by the user at time t + 1. We can view the historical enrolled courses as a user’s profile, and the key factor of recommendation is to accurately characterize and model the user’s preference from her profile. Many state-of-the-art algorithms have been proposed to model users’ preferences in different ways. For example, when ignoring the order of the historical courses, we can adopt the factored item similarity model (FISM) (Kabbur, Ning, and Karypis 2013) to represent each course as an embedding vector and average the embeddings of all the historical courses to represent a user’s preference.

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: A motivating example of course recommendation. The scores on top of the historical courses are the attention coefficients calculated by NAIS and the scores on top of the target courses are the recommendation probabilities predicted by NAIS (He et al. 2018). The goal of this paper is to remove the courses with few contributions in a prediction as much as possible.

To capture the order of the courses, we can input a temporal sequence of the historical courses into the gated recurrent unit (GRU) model (Hidasi et al. 2016) and output the last embedding vector as the user preference. However, the model fidelity is limited by the assumption that all the historical courses play the same role at estimating the similarity between the user profile and the target course. To distinguish the effects of different courses, attention-based models such as neural attentive item similarity (NAIS) (He et al. 2018) and neural attentive session-based recommendation (NASR) (Li et al. 2017) can be used to estimate an attention coefficient for each historical course as its importance in recommending the target course.

Although existing attention-based models improve the recommendation performance, unsolved challenges remain. First, when a user has enrolled diverse courses, the effects of the courses that indeed reflect the user’s interest in the target course will be diluted by many irrelevant courses. For example, Figure 1 illustrates a recommendation result calculated by NAIS (He et al. 2018). The score on top of each historical course represents the calculated attention coefficient [1]. The real target course “Big Data Systems” is not successfully recommended in the top 10 ranked courses.

[1] The sum of the attention coefficients is normalized to be larger than 1 to lessen the punishment of active users (cf. He et al. 2018 for details).


Although the major contributing historical courses like “Data Structure”, “Operation System” and “Programming Foundation” are assigned relatively high attention coefficients, their effects are discounted by many other categories of courses, such as psychology, physics and mathematics, after aggregating all the historical courses by their attentions. Second, even if no historical course can contribute to predicting a random target course, each historical course will still be rigidly assigned an attention coefficient, which may cause the random target course to be ranked before the real target one, as demonstrated by the random course “Financial Management” in Figure 1. In summary, the noisy historical courses that make small or even no contributions may disturb the prediction results significantly, even if they are assigned small attention coefficients.

To deal with the above issues, we propose to revise user profiles by removing the noisy courses instead of assigning an attention coefficient to each of them. The key challenge is that we do not have explicit, supervised information about which courses in the history are noise and should be removed. We propose a hierarchical reinforcement learning algorithm to solve it. Specifically, we formalize the revising of a user profile as a hierarchical sequential decision process. A high-level task and a low-level task are performed to remove the noisy courses, under the supervision of the feedback from the environment, which consists of the dataset and a pre-trained basic recommendation model. Essentially, the profile reviser and the basic recommendation model are jointly trained. Our contributions include:

• We propose a novel model for course recommendation in MOOCs, which consists of a profile reviser and a basic recommendation model. With joint training of the two models, we can effectively remove the noisy courses in user profiles.

• We propose a hierarchical reinforcement learning algorithm to revise the user profiles, which enables the model to remove the noisy courses without explicit annotations.

• We collect a dataset, consisting of 1,302 courses, 82,535 users and 458,454 user enrolled behaviors, from XuetangX, one of the largest MOOCs platforms in China, to evaluate the proposed model. Experimental results show that the proposed model significantly outperforms the state-of-the-art baselines (improving 5.02% to 18.95% in terms of HR@10).

MOOC Data

We collect the dataset from XuetangX [2], one of the largest MOOCs platforms in China. We unify the same courses offered in different years, such as “Data Structure(2017)” and “Data Structure(2018)”, into one course, and only select the users who enrolled at least three courses from October 1st, 2016 to March 31st, 2018. The resulting dataset consists of 1,302 courses belonging to 23 categories, 82,535 users and 458,454 user-course pairs. We also collect the duration of each video in a course watched by a user. Before training the model, we conduct a series of analyses to investigate why we need to revise the user profiles.

[2] http://www.xuetangx.com
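For illustration, this preprocessing step can be sketched as follows, assuming the raw log is an iterable of (user id, course name) pairs within the observation window; the input shape and the year-suffix pattern are our assumptions, not details given in the paper:

```python
import re
from collections import Counter

def preprocess(enrollments):
    """Unify per-year course editions and keep users with >= 3 enrollments.

    enrollments: iterable of (user_id, course_name) pairs, e.g.
    ("u42", "Data Structure(2017)"). The input shape is assumed.
    """
    # "Data Structure(2017)" and "Data Structure(2018)" -> "Data Structure"
    unify = lambda name: re.sub(r"\(\d{4}\)$", "", name).strip()
    pairs = {(u, unify(c)) for u, c in enrollments}
    # Keep only users with at least three distinct enrolled courses.
    counts = Counter(u for u, _ in pairs)
    return {(u, c) for u, c in pairs if counts[u] >= 3}
```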

Figure 2: Data distributions. (a) #Courses or #Categories distribution; (b) #Categories/#Courses distribution; (c) Average recommendation probability; (d) Effort distribution.

Figure 2(a) presents the distribution of the number of courses enrolled by a user in the top and the distribution of the number of categories of the enrolled courses in the bottom. We then calculate the ratio between the category number and the course number as the category ratio of a profile, representing the attentiveness of a user. A bigger category ratio indicates the user is more distractive, while a smaller category ratio indicates the user is more attentive. Figure 2(b) shows the distribution of the category ratio. From these figures, we can see that although a large number of users enrolled a small number of courses and categories, the ratio between them is relatively evenly distributed. We further average the probabilities of recommending a real target course calculated by NAIS (He et al. 2018) over the user profiles with the same category ratio and present the probability distribution over category ratio in Figure 2(c). We can see that the probability decreases as the category ratio increases. In summary, all these analyses indicate that a large number of users enrolled diverse courses, and the recommendation performance based on these diverse profiles is impacted. Thus, we have to study how to revise the user profiles. In addition, we calculate the ratio between the watch duration and the total duration of a video as the watch ratio, and use the maximal watch ratio over all the videos in a course to represent the effort taken by the user in the course. We present the effort distribution of user enrolled courses in Figure 2(d) and the filtered effort distribution (i.e., effort larger than 0.01) in the embedded subfigure, which indicates that users take clearly different amounts of effort in different courses. This phenomenon can guide the design of the agent policy later.
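For concreteness, the two per-profile statistics used above can be computed as follows; this is a sketch with hypothetical data structures (category_of maps a course to its category, watch_log maps each video of a course to its watched and total durations):

```python
def category_ratio(profile, category_of):
    """Attentiveness of a user: #categories / #courses of the profile."""
    categories = {category_of[c] for c in profile}
    return len(categories) / len(profile)

def effort(watch_log):
    """Effort taken in a course: the maximal watch ratio over its videos.

    watch_log: dict mapping video id -> (watched duration, total duration).
    """
    return max(watched / total for watched, total in watch_log.values())
```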

Background: Recommendation Models

Problem Formulation

Let $U = \{u_1, \cdots, u_{|U|}\}$ be a set of users and $C = \{c_1, \cdots, c_{|C|}\}$ be a set of courses in the MOOCs platform.


For each user $u$, given her profile, i.e., the historical enrolled courses $E_u := (e^u_1, \cdots, e^u_{t_u})$ with $e^u_t \in C$, we aim at recommending the courses $u$ would enroll at the next time $t_u + 1$. We deal with the relative time instead of the absolute time, the same as (Rendle, Freudenthaler, and Schmidt-Thieme 2010).

The Basic Recommendation Model

The key factor of recommendation is to accurately characterize a user's preference according to her profile $E_u$. The general idea is: we represent each historical course $e^u_t$ as a real-valued low-dimensional embedding vector $\mathbf{p}^u_t$, and aggregate the embeddings of all the historical courses $\mathbf{p}^u_1, \ldots, \mathbf{p}^u_{t_u}$ to represent user $u$'s preference $\mathbf{q}_u$. If we also represent a target course $c_i$ as an embedding vector $\mathbf{p}_i$, the probability of recommending course $c_i$ to user $u$, i.e., $P(y=1|E_u, c_i)$, can be calculated as:

$$P(y = 1 | E_u, c_i) = \sigma(\mathbf{q}_u^{\top}\mathbf{p}_i), \quad (1)$$

where $y = 1$ indicates that $c_i$ is recommended to user $u$ and $\sigma$ is the sigmoid function to transform the input into a probability. Then the key issue is how to obtain the aggregated embedding $\mathbf{q}_u$. One straightforward way is to average the embeddings of all the historical courses, i.e., $\mathbf{q}_u = \frac{1}{t_u}\sum_{t=1}^{t_u}\mathbf{p}^u_t$.

However, equally treating all the courses' contributions may impact the representation of a user's real interest in a target course. Thus, as NAIS (He et al. 2018) does, we can adopt the attention mechanism to estimate an attention coefficient $a^{ui}_t$ for each historical course $e^u_t$ when recommending $c_i$. Specifically, we parameterize the attention coefficient $a^{ui}_t$ as a function with $\mathbf{p}^u_t$ and $\mathbf{p}_i$ as inputs and then aggregate the embeddings according to their attentions:

$$\mathbf{q}_u = \sum_{t=1}^{t_u} a^{ui}_t \mathbf{p}^u_t, \quad a^{ui}_t = f(\mathbf{p}^u_t, \mathbf{p}_i), \quad (2)$$

where $f$ can be instantiated by a multi-layer perceptron on the concatenation or the element-wise product of the two embeddings $\mathbf{p}^u_t$ and $\mathbf{p}_i$.

We can also adopt NASR (Li et al. 2017), an attentive recurrent neural network, to capture the order of the historical courses. Specifically, at each time $t$, NASR outputs a hidden vector $\mathbf{h}^u_t$ to represent a user's preference up to time $t$, based on both the course enrolled at time $t$ and all the previous courses before $t$. Then the same attention mechanism is applied to the hidden vectors of all the timestamps.
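To make Eq. (1) and Eq. (2) concrete, below is a minimal NumPy sketch of the attention-based scoring, with $f$ instantiated as a one-hidden-layer MLP on the element-wise product and a plain softmax over the logits; NAIS additionally smooths the softmax denominator for active users, which is omitted here, and all parameter names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recommend_prob(P_hist, p_i, W, b, h):
    """P(y=1 | E_u, c_i) for one user profile and one target course.

    P_hist : (t_u, d) embeddings p^u_t of the historical courses
    p_i    : (d,)     embedding of the target course c_i
    W, b, h: attention MLP parameters, assumed shapes (d, k), (k,), (k,)
    """
    # f(p^u_t, p_i): MLP on the element-wise product of the two embeddings
    logits = np.tanh((P_hist * p_i) @ W + b) @ h      # (t_u,)
    a = np.exp(logits) / np.exp(logits).sum()         # attention coefficients a^ui_t
    q_u = (a[:, None] * P_hist).sum(axis=0)           # aggregated preference q_u, Eq. (2)
    return sigmoid(q_u @ p_i)                         # Eq. (1)
```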

The Proposed Model

In this section, we first give an overview of the proposed model, then introduce a hierarchical reinforcement learning algorithm to revise user profiles, and finally explain the training process of the entire model.

Figure 3: The overall framework of the proposed model.

Overview

diluted by the irrelevant ones when users enrolled many diverse courses. To deal with this issue, we propose a model to revise the user profiles by removing the noisy courses from the history, and to recommend courses based on the revised profiles. The key challenge is how to determine which historical courses are noise without direct supervision, i.e., how to identify the courses that disturb the recommendation performance. We propose a hierarchical reinforcement learning algorithm to solve this. Specifically, we formalize the revision of a user profile as a hierarchical sequential decision process carried out by an agent. Following a revising policy, a high-level and a low-level task are performed to revise the profile. After the whole profile of a user is revised, the agent receives a delayed reward from the environment, based on which it updates its policy. The environment can be viewed as the dataset together with a pre-trained basic recommendation model as introduced in the previous section. After the policy is updated, the basic recommendation model is re-trained on the profiles revised by the agent. Essentially, the profile reviser and the recommendation model are jointly trained. Figure 3 illustrates the framework of the proposed model.

Profile Reviser

As mentioned before, the profile reviser aims to remove the noisy courses that contribute little to a prediction. Inspired by the theory of hierarchical abstract machines (Parr and Russell 1998), we cast the task of the profile reviser as a hierarchical Markov Decision Process (MDP). Generally speaking, we decompose the overall task MDP M into two kinds of subtasks M^h and M^l, where M^h is the high-level abstract task in the hierarchy, whose solution solves the entire MDP M, and M^l is the low-level primitive task in the hierarchy. Each kind of task is defined as a 4-tuple MDP (S, A, T, R), where S is a set of states, A is a set of actions, T is a transition model mapping S × A × S into probabilities in [0,1], and R is a reward function mapping S × A × S into real-valued rewards. We formulate our task as one high-level task and one low-level task. Specifically, given a sequence of historical courses E^u := (e^u_1, ..., e^u_{t_u}) of user u and the target course c_i, the agent performs a high-level task of one binary action that determines whether to revise the whole profile E^u or not. If it decides to revise E^u, the agent performs a low-level task of multiple actions, determining whether to remove each historical course e^u_t ∈ E^u or not. After the low-level task is finished, the overall task is finished. If the high-level task decides to make no revision, the low-level task


will not be executed and the overall task finishes directly.

We formulate the profile reviser as two-level MDPs because some user profiles are already discriminative and can be correctly predicted by the basic recommendation model. As presented in Figure 2(b), about 30% of the users are relatively attentive (i.e., #Categories/#Courses < 0.4) and the corresponding average probabilities of recommending the real target courses are high (i.e., larger than 0.6). We can simply keep those profiles as they are and only revise the indiscriminative ones. Out of this consideration, we design a high-level task that decides whether to revise the whole profile of a user or not, and a low-level task that decides which courses in the profile should be removed.

Note that for both the high-level and the low-level task, a given state and action transition to a determined next state with probability 1. We now introduce the details of how the state, action and reward are designed for the two-level tasks.
State. The high-level task takes an action according to the state of the whole profile E^u, and the low-level task takes a sequence of actions according to the state of each course e^u_t ∈ E^u. We define different state features for the two tasks.

• Low-level task: When determining whether to remove a historical course e^u_t ∈ E^u, we define the state features s^l_t as the cosine similarity between the embedding vectors of the current historical course e^u_t and the target course c_i, the element-wise product between them, and the averages of these two features over all the reserved historical courses, where the embedding vector of a course can be provided by a pre-trained basic recommendation model. We also define the effort taken in a course as an additional state feature, since the effort users take varies significantly across courses (Cf. Figure 2(d)) and may indicate the contribution of e^u_t to c_i beyond the similarity-based features. For simplicity and clarity, we omit the superscript u in all the notations about the state features.

• High-level task: When determining whether to revise a whole profile E^u, we define the state features s^h as the average cosine similarity between the embedding vectors of each historical course in E^u and the target course, and the average element-wise product between them. We also define an additional state feature as the probability P(y = 1|E^u, c_i) of recommending c_i to user u output by the basic recommendation model. This probability reflects how credible it is that the course c_i would be recommended based on the profile E^u: the lower the recommendation probability, the more effort should be taken to revise E^u. Note that we train the profile reviser only on positive instances, i.e., a user profile paired with a real target course, since negative instances with random target courses can hardly guide the agent to select the courses contributing to the target course. Thus P(y = 0|E^u, c_i) for a negative instance is not calculated. A sketch of both state constructions is given after this list.
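To make the feature construction concrete, here is a minimal numpy sketch of both state builders. It assumes the course and target embeddings come from the pre-trained basic recommendation model; the function names and the scalar `effort` input (e.g., a normalized activity count) are illustrative, not the paper's implementation.

```python
import numpy as np

def _cos(a, b):
    # Cosine similarity with a small epsilon against zero vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def low_level_state(course_emb, target_emb, kept_embs, effort):
    """s^l_t: cosine sim and element-wise product with the target c_i,
    their averages over the courses kept so far, plus the effort on e_t."""
    sim = _cos(course_emb, target_emb)
    prod = course_emb * target_emb
    if kept_embs:
        avg_sim = float(np.mean([_cos(e, target_emb) for e in kept_embs]))
        avg_prod = np.mean([e * target_emb for e in kept_embs], axis=0)
    else:
        avg_sim, avg_prod = 0.0, np.zeros_like(prod)
    return np.concatenate([[sim], prod, [avg_sim], avg_prod, [effort]])

def high_level_state(profile_embs, target_emb, rec_prob):
    """s^h: average cosine sim and element-wise product over the whole
    profile, plus the basic model's probability P(y=1|E_u, c_i)."""
    avg_sim = float(np.mean([_cos(e, target_emb) for e in profile_embs]))
    avg_prod = np.mean([e * target_emb for e in profile_embs], axis=0)
    return np.concatenate([[avg_sim], avg_prod, [rec_prob]])
```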

Action and Policy. We define the high-level action a^h ∈ {0, 1} as a binary value representing whether to revise the whole profile of a user or not, and a low-level action a^l_t ∈ {0, 1} as a binary value representing whether to remove the historical course e^u_t or not. A low-level action a^l_t is performed according to the following policy function:

H^l_t = ReLU(W^l_1 s^l_t + b^l),    (3)
π(s^l_t, a^l_t) = P(a^l_t | s^l_t, Θ^l) = a^l_t σ(W^l_2 H^l_t) + (1 − a^l_t)(1 − σ(W^l_2 H^l_t)),

where W^l_1 ∈ R^{d^l_1 × d^l_2}, W^l_2 ∈ R^{d^l_2 × 1} and b^l ∈ R^{d^l_2} are the parameters to be learned, with d^l_1 the number of state features and d^l_2 the dimension of the hidden layer. H^l_t represents the embedding of the input state, and we denote Θ^l = {W^l_1, W^l_2, b^l}. The sigmoid function σ transforms its input into a probability. The high-level action is performed according to a similar policy function with different parameters Θ^h = {W^h_1, W^h_2, b^h}.
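The policy of Eq. (3) is a one-hidden-layer network over the state features. Below is a minimal numpy sketch; the class name, initialization scale and seed are illustrative choices, with d^l_2 = 8 matching the experiments. The same class with its own parameters Θ^h serves the high-level policy.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Policy:
    """One-hidden-layer policy of Eq. (3); d1 = number of state features."""
    def __init__(self, d1, d2=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(d1, d2))
        self.b = np.zeros(d2)
        self.W2 = rng.normal(scale=0.1, size=(d2, 1))

    def prob(self, s, a):
        # H = ReLU(W1 s + b); pi(s, a) = a*sigma(W2 H) + (1-a)*(1-sigma(W2 H))
        H = np.maximum(0.0, s @ self.W1 + self.b)
        p1 = float(sigmoid(H @ self.W2))  # probability of action 1 (remove/revise)
        return p1 if a == 1 else 1.0 - p1

    def sample(self, s, rng=None):
        rng = rng or np.random.default_rng()
        return int(rng.random() < self.prob(s, 1))
```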

Reward. The reward is a signal indicating whether the performed actions are reasonable. We assume that every action in the low-level task receives a delayed reward after the last action a^l_{t_u} is performed for the last course e^u_{t_u} ∈ E^u. In other words, the immediate reward of a low-level action is zero except for the last one. Thus, we define the reward for each low-level action as:

R(a^l_t, s^l_t) = log p(Ê^u, c_i) − log p(E^u, c_i)  if t = t_u;  0 otherwise,

where p(E^u, c_i) abbreviates p(y = 1|E^u, c_i) and Ê^u is the revised profile, a subset of E^u. For the special case Ê^u = ∅, i.e., all the historical courses are removed, we randomly select a course from the original set E^u. The reward is thus the difference between the log-likelihoods after and before the profile is revised; a positive difference indicates a positive utility gained by the revised profile.

If the high-level task chooses the revising action, it calls the low-level task and receives the same delayed reward R(a^l_t, s^l_t) after the last low-level action is performed. Otherwise, it keeps the original profile and obtains a zero reward, as log p(E^u, c_i) is unchanged.

In addition, we define an internal reward G(a^l_t, s^l_t) that is used only inside the low-level task to speed up its local learning and does not propagate to the high-level task (Ghavamzadeh and Mahadevan 2003). Specifically, we first calculate the average cosine similarity between each historical course and the target course after and before the profile is revised, and then use the difference between the two as the internal reward G(a^l_t, s^l_t). The internal reward encourages the agent to select the courses most relevant to the target course. Finally, we sum G(a^l_t, s^l_t) and R(a^l_t, s^l_t) as the reward for the low-level task.
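Both reward terms reduce to simple differences; here is a small sketch under the definitions above, assuming the basic model supplies the log-likelihoods and the inputs are lists of cosine similarities to the target course:

```python
import numpy as np

def terminal_reward(log_p_revised, log_p_original, sims_revised, sims_original):
    """R + G for the terminal low-level action; earlier actions receive 0.
    R = log p(revised profile, c_i) - log p(original profile, c_i);
    G = mean cosine similarity to c_i after revision minus before."""
    R = log_p_revised - log_p_original
    G = float(np.mean(sims_revised) - np.mean(sims_original))
    return R + G
```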

Objective Function. We aim at finding the optimal parameters of the policy function defined in Eq. (3) that maximize the expected reward, i.e.,

Θ* = argmax_Θ Σ_τ P_Θ(τ) R(τ),    (4)

where Θ represents either Θ^h or Θ^l, τ is a sequence of sampled actions and transited states, P_Θ(τ) denotes the probability of sampling τ under Θ, and R(τ) is the reward of the sampled sequence τ.


Algorithm 1: The Overall Training Process
  Pre-train the basic recommendation model;
  Pre-train the profile reviser by running Algorithm 2 with the basic recommendation model fixed;
  Jointly train the two models together by running Algorithm 2;

Algorithm 2: The Hierarchical Reinforcement Learning
  Input: Training data {E^1, E^2, ..., E^{|U|}}, a pre-trained basic recommendation model and a profile reviser, parameterized by Φ_0 and Θ_0 respectively
  Initialize Θ = Θ_0, Φ = Φ_0;
  for episode l = 1 to L do
      foreach E^u := (e^u_1, ..., e^u_{t_u}) and c_i do
          Sample a high-level action a^h with Θ^h;
          if a^h = 0 then
              R(s^h, a^h) = 0;
          else
              Sample a sequence of low-level actions {a^l_1, a^l_2, ..., a^l_{t_u}} with Θ^l;
              Compute R(a^l_{t_u}, s^l_{t_u}) and G(a^l_{t_u}, s^l_{t_u});
              Compute gradients by Eq. (5) and (6);
          end
      end
      Update Θ by the gradients;
      Update Φ in the basic recommendation model;
  end

The sampled sequence τ can be {s^l_1, a^l_1, s^l_2, ..., s^l_t, a^l_t, s^l_{t+1}, ...} for the low-level task and {s^h, a^h} for the high-level task. Since there are too many possible action-state trajectories for the entire sequences of the two tasks, we adopt the policy gradient theorem (Sutton et al. 2000) and the Monte Carlo policy gradient method (Williams 1992) to sample M action-state trajectories, based on which we calculate the gradient of the parameters of the low-level policy function:

∇Θ = (1/M) Σ_{m=1}^{M} Σ_{t=1}^{t_u} ∇_Θ log π_Θ(s^m_t, a^m_t) (R(a^m_t, s^m_t) + G(a^m_t, s^m_t)),    (5)

where the reward R(a^m_t, s^m_t) + G(a^m_t, s^m_t) for each action-state pair in sequence τ^(m) is assigned the same value, equal to the terminal reward R(a^m_{t_u}, s^m_{t_u}) + G(a^m_{t_u}, s^m_{t_u}). The gradient of the high-level policy function is:

∇Θ = (1/M) Σ_{m=1}^{M} ∇_Θ log π_Θ(s^m, a^m) R(a^m, s^m),    (6)

where the reward R(a^m, s^m) is assigned as R(a^m_{t_u}, s^m_{t_u}) when a^m = 1, and 0 otherwise. We omit the superscripts l and h in Eq. (5) and (6) for simplicity.
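As a sketch, the gradient of Eq. (5) can be computed analytically for the policy of Eq. (3): with logit z = W_2 · ReLU(W_1 s + b), we have ∇_z log π(s, a) = a − σ(z), and the remaining factors follow by backpropagation. The trajectory container format below is hypothetical; `policy` carries W1, b, W2 as in the earlier Policy sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_pi_grad(policy, s, a):
    """Analytic gradient of log pi(s, a) for the one-hidden-layer policy."""
    pre = s @ policy.W1 + policy.b             # pre-activation, shape (d2,)
    H = np.maximum(0.0, pre)                   # ReLU hidden state
    delta = a - float(sigmoid(H @ policy.W2))  # d log pi / d z = a - sigma(z)
    gW2 = delta * H[:, None]
    dH = delta * policy.W2[:, 0] * (pre > 0)   # backprop through ReLU
    return np.outer(s, dH), dH, gW2            # gradients for W1, b, W2

def reinforce_update(policy, trajectories, lr=0.001):
    """Eq. (5): Monte Carlo gradient ascent over M sampled trajectories; every
    step of a trajectory shares the terminal reward R + G. Each trajectory is
    a (states, actions, terminal_reward) tuple in this hypothetical format."""
    M = len(trajectories)
    for states, actions, reward in trajectories:
        for s, a in zip(states, actions):
            gW1, gb, gW2 = log_pi_grad(policy, s, a)
            policy.W1 += lr * reward * gW1 / M
            policy.b += lr * reward * gb / M
            policy.W2 += lr * reward * gW2 / M
```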

Model Training

The profile reviser and the basic recommendation model are interleaved, so we train them jointly. The training process is shown in Algorithm 1: we first pre-train the basic recommendation model on the original dataset; then we fix the parameters of the basic recommendation model and pre-train the profile reviser to automatically revise the user profiles; finally, we jointly train the two models together. Following the settings of (Feng et al. 2018), to obtain stable updates each parameter is updated as a linear combination of its old value and the newly computed value, i.e., Θ_new = λΘ_new + (1 − λ)Θ_old, where λ ≪ 1. The time complexity is O(L · N · t_u · M), where L is the number of episodes, N is the number of instances, t_u is the average number of historical courses per user and M is the Monte Carlo sampling time.
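The stabilized update is a one-liner; a sketch with the paper's delayed coefficient as the default:

```python
def delayed_update(theta_new, theta_old, lam=0.0005):
    """Stabilized update Theta_new = lam*Theta_new + (1-lam)*Theta_old, lam << 1
    (lam = 0.0005 in the joint-training stage); works on numpy arrays."""
    return lam * theta_new + (1.0 - lam) * theta_old
```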

Experiments

Experimental Settings
Settings. The dataset is introduced in the section on MOOC data. We select the enrolled behaviors from October 1st, 2016 to December 30th, 2017 as the training set, and those from January 1st, 2018 to March 31st, 2018 as the test set. Each instance in the training or test set is a sequence of historical enrolled courses paired with a target course. During training, for each sequence in the training data, we hold out the last course as the target course and treat the rest as the historical courses. For each positive instance, we construct 4 negative instances by replacing the target course with each of 4 randomly sampled courses. During testing, we treat each enrolled course in the test set as the target course, and the corresponding courses of the same user in the training set as the historical courses. Each positive instance in the test set is paired with 99 randomly sampled negative instances (He et al. 2018).
Baseline Methods. The comparison methods include:

BPR (Rendle et al. 2009): optimizes a pairwise ranking loss for the recommendation task in a Bayesian way.

MLP (He et al. 2017): applies a multi-layer perceptron (MLP) on a pair of user and course embeddings to learn the probability of recommending the course to the user.

FM (Rendle 2012): is a principled approach that can easily incorporate any heuristic features; for fair comparison, we only use the embeddings of users and courses.

FISM (Kabbur, Ning, and Karypis 2013): is an item-to-item collaborative filtering algorithm that conducts recommendation based on the average embedding of all the historical courses and the embedding of the target course.

NAIS (He et al. 2018): is also an item-to-item collaborative filtering algorithm, but distinguishes the weights of different historical courses by an attention mechanism.

GRU (Hidasi et al. 2016): is a gated recurrent unit model that receives a sequence of historical courses as input and outputs the last hidden vector as the representation of a user's preference.

NASR (Li et al. 2017): is an improved GRU model that estimates an attention coefficient for each historical course based on the corresponding hidden vector output by GRU.

HRL+NAIS: is the proposed model that adopts NAIS as the basic recommendation model, jointly trained with the hierarchical reinforcement learning (HRL) based profile reviser.

HRL+NASR: is also the proposed model, but adopts NASR as the basic recommendation model.

Table 1: Recommendation performance (%).
Methods      HR@5    HR@10   NDCG@5   NDCG@10
BPR          46.82   60.73   34.16    38.65
MLP          52.16   66.29   40.39    44.41
FM           46.01   61.07   35.28    40.15
FISM         52.73   65.64   40.00    44.98
GRU          52.07   68.63   38.92    46.30
NAIS         56.42   69.05   43.73    47.82
NASR         54.64   69.48   42.39    47.33
HRL+NAIS     64.59   79.68   45.74    50.69
HRL+NASR     59.05   74.50   47.51    52.73

Evaluation Metrics. We evaluate all methods with the widely used metrics Hit Ratio of top-K items (HR@K) and Normalized Discounted Cumulative Gain of top-K items (NDCG@K), where HR@K is a recall-based metric measuring the percentage of ground-truth instances successfully recommended in the top K, and NDCG@K is a precision-based metric accounting for the predicted position of the ground-truth instance (Huang et al. 2018; He et al. 2018; Rendle, Freudenthaler, and Schmidt-Thieme 2010). We set K to 5 and 10, calculate the metrics over every 100 instances (1 positive plus 99 negatives), and report the average score over all users.
Implementation Details. We implement the model in TensorFlow and run the code on an Enterprise Linux server with 40 Intel(R) Xeon(R) E5-2630 CPU cores and 512G memory, and 1 NVIDIA TITAN V GPU (12G memory). For the profile reviser, the sampling time M is set to 3, and the learning rate is 0.001/0.0005 at the pre-training and joint-training stages respectively. In the policy function, the dimensions of the hidden layers d^l_2 and d^h_2 are both set to 8. For the basic recommender, the dimension of the course embeddings is set to 16, the learning rate is 0.01 at both the pre-training and joint-training stages, and the mini-batch size is 256. The delayed coefficient λ for joint training is 0.0005. The code is online now³.

Performance Analysis
Overall Prediction Performance. Table 1 shows the overall performance of all the comparison methods. The proposed model performs clearly better than the baselines (improving HR@10 by 5.02% to 18.95%). The user-to-item collaborative filtering methods BPR, MLP and FM perform worst, because most users in our dataset enrolled only a few courses (fewer than 10, as shown in Figure 2(a)), so the embeddings of many users cannot be sufficiently inferred from the sparse data. Among the item-to-item collaborative filtering methods, FISM and GRU perform worse than the others, as they treat all the historical enrolled courses equally and thus their preference representation ability is limited.

³ https://github.com/jerryhao66/HRL

[Figure 4: Recommendation performance of model variants, in % over HR@5, HR@10, NDCG@5 and NDCG@10: (a) the one-level RL+NAIS vs. HRL+NAIS; (b) Greedy+NAIS.]

NAIS and NASR distinguish the effects of different historical courses by assigning them different attention coefficients. However, the useless courses dilute the effects of the useful ones when users enrolled many diverse courses. The proposed methods, HRL+NAIS and HRL+NASR, perform best, as they explicitly remove the noisy courses instead of assigning soft attention coefficients, which separates the useful courses from the useless ones much more sharply.

For the proposed methods, processing one episode of profile updates requires 50-80 seconds and the recommender update requires 20-30 seconds. The best recommendation performance on the test set is reached after about 20 episodes of recommender pre-training, 20 episodes of profile reviser pre-training and 5 episodes of joint training through the data, i.e., 30-45 minutes of joint training in total.
Compared with One-level RL. We compare the proposed HRL with a one-level RL algorithm, which only uses the low-level task to directly decide whether to remove each course or not. The comparison results in Figure 4(a) show that HRL outperforms the one-level RL. We find that the average #Categories/#Courses of the profiles revised by HRL is 0.73, while that of the one-level RL is 0.75, which indicates that the profiles revised by the proposed HRL are more consistent (the larger the value, the more diverse a profile). This is because HRL uses an additional high-level task that decides to keep the consistent profiles and revise the diverse ones. To verify whether the high-level task takes effect, we further check the difference between the profiles kept and the profiles revised by the high-level task's decision. The average #Categories/#Courses of the kept profiles is 0.57, and that of the revised profiles is 0.69, which indicates that the high-level task tends to keep the more consistent profiles while revising the more diverse ones.
Compared with Greedy Revision. We compare the proposed HRL with a greedy revision algorithm, which first decides to revise the whole profile E^u if log P(y = 1|E^u, c_i) < µ1, and then removes a course e^u_t ∈ E^u if its cosine similarity with c_i is less than µ2. In Figure 4(b), we tune µ1 from -2.5 to 0 with interval 0.5 and µ2 from -0.1 to 0.1 with interval 0.04, obtaining the best HR@10 of 77.44% at µ1 = -1 and µ2 = 0.1, which is 2.27% lower than HRL+NAIS. Note that the best performance is obtained when the number of remaining courses is almost the same as that of HRL+NAIS.
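A sketch of this greedy baseline, assuming the profile is given as a list of course embedding vectors; the fallback for an emptied profile is our own assumption (the paper specifies it only for HRL):

```python
import numpy as np

def greedy_revise(profile_embs, target_emb, log_p, mu1=-1.0, mu2=0.1):
    """Greedy baseline: revise the whole profile only if log P(y=1|E_u, c_i) < mu1,
    then drop any course whose cosine similarity to the target is below mu2
    (mu1=-1, mu2=0.1 gave the best HR@10 in the grid search above)."""
    if log_p >= mu1:
        return profile_embs                    # keep the profile unchanged
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    kept = [e for e in profile_embs if cos(e, target_emb) >= mu2]
    return kept if kept else profile_embs[:1]  # assumed fallback: avoid an empty profile
```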


Table 2: Case studies of the profiles revised by HRL+NAIS and the attention coefficients learned by NAIS.

Case 1 (target course: Web Development)
  HRL+NAIS: Crisis Negotiation, Social Civilization, Web Technology, C++ Program
  NAIS: Crisis Negotiation (29.61), Social Civilization (29.09), Web Technology (28.32), C++ Program (28.12)
Case 2 (target course: Biology)
  HRL+NAIS: Modern Biology, Medical Mystery, Biomedical Imaging, R Program
  NAIS: Modern Biology (37.79), Medical Mystery (37.96), Biomedical Imaging (37.62), R Program (37.84)
Case 3 (target course: Life Aesthetics)
  HRL+NAIS: Web Technology, Art Classics, National Unity Theory, Philosophy
  NAIS: Web Technology (38.32), Art Classics (35.87), National Unity Theory (40.63), Philosophy (43.69)

Compared with Attention Coefficients. We inspected several of the profiles revised by the proposed HRL+NAIS; Table 2 shows three cases together with the corresponding attention coefficients learned by NAIS. The cases show that HRL+NAIS can cleanly remove the noisy courses in a profile that are irrelevant to the target course. In contrast, although NAIS assigns high attention to the contributing historical courses, the attention of some irrelevant courses is not significantly different, or is even higher than that of the relevant ones; hence the effects of the truly contributing courses are discounted after aggregating all the historical courses by their attentions. As a result, the performance of the recommendation model trained on the discriminative revised profiles is improved.

Related Work

Collaborative filtering (CF) is widely used for recommendation. User-to-item CF, such as matrix factorization (Koren, Bell, and Volinsky 2009), Bayesian personalized ranking (BPR) (Rendle et al. 2009) and the factorization machine (FM) (Rendle 2012), performs recommendation based on both user and item embeddings. These shallow models have been extended to deep neural network models (He et al. 2017; Guo et al. 2017; Zhang, Du, and Wang 2016). User-to-item CF suffers from the sparsity of user profiles. In contrast, item-to-item CF does not need to estimate user embeddings and is heavily adopted in industrial applications (Davidson et al. 2010; Smith and Linden 2017). Early item-to-item CF used heuristic metrics such as the Pearson coefficient or cosine similarity to estimate item similarities (Sarwar et al. 2001), followed by a machine learning method that calculates item similarity as the dot product of item embeddings (Kabbur, Ning, and Karypis 2013). Sequence-based models such as RNN (Tan, Xu, and Liu 2016) and GRU (Hidasi et al. 2016) were proposed to capture the temporal factor, and attention-based models such as NAIS (He et al. 2018) and NASR (Li et al. 2017) were further proposed to distinguish the effects of different items. Several studies have been conducted on MOOCs platforms, such as learning behavior analysis (Anderson et al. 2014; Qiu et al. 2016; Qi et al. 2018) and course recommendation (Jing and Tang 2017). They focus on extracting features from multi-modal data sources beyond users' enrollment behaviors, so a direct comparison with their methods would be unfair.

Recently, researchers have adopted reinforcement learning to solve many kinds of problems, such as relation classification (Feng et al. 2018), text classification (Zhang, Huang, and Zhao 2018), information extraction (Narasimhan, Yala, and Barzilay 2016), question answering (Wang et al. 2018b) and treatment recommendation (Wang et al. 2018a). Inspired by these successful attempts, we propose a hierarchical reinforcement learning algorithm for course recommendation. Hierarchical reinforcement learning aims at decomposing a complex task into multiple small tasks to reduce the complexity of decision making (Barto and Mahadevan 2003); variants include option-based HRL, which formulates abstract knowledge and actions as options (Sutton, Precup, and Singh 1999), and hierarchical abstract machines (HAMs), which decompose high-level activities into low-level activities (Parr and Russell 1998). We formalize our problem following the theory of HAMs.

Conclusion

We present the first attempt to solve the problem of course recommendation on MOOCs platforms with a hierarchical reinforcement learning model. The model jointly trains a profile reviser and a basic recommendation model, which enables the recommendation model to be trained on user profiles revised by the profile reviser. With the designed two-level tasks, the agent in the hierarchical reinforcement learning model can effectively remove the noisy courses and retain the courses that truly contribute to the target course.

We will try the proposed model in other domains. For example, people usually watch diverse movies, read diverse books and purchase diverse products; in those scenarios, one can likewise imagine the need for selecting the most contributing historical items from users' diverse profiles, which poses the same challenge as recommendation in MOOCs. In the future, we will also explore how to connect the courses in MOOCs to external entities or knowledge, such as academic papers and researchers (Tang et al. 2008), to enable more accurate course recommendation in MOOCs.

ACKNOWLEDGMENTS

This work is supported by the National Key R&D Program of China (No. 2018YFB1004401), NSFC under grants No. 61532021, 61772537, 61772536, 61702522, the Research Funds of Renmin University of China (15XNLQ06) and the Research Funds of Online Education (2017ZD205).

*Cuiping Li is the corresponding author.


References
Anderson, A.; Huttenlocher, D.; Kleinberg, J.; and Leskovec, J. 2014. Engaging with massive online courses. In WWW'14, 687–698.
Barto, A. G., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13(1-2):41–77.
Davidson, J.; Liebald, B.; Liu, J.; Nandy, P.; Van Vleet, T.; Gargi, U.; Gupta, S.; He, Y.; Lambert, M.; Livingston, B.; et al. 2010. The YouTube video recommendation system. In RecSys'10, 293–296.
Feng, J.; Huang, M.; Zhao, L.; Yang, Y.; and Zhu, X. 2018. Reinforcement learning for relation classification from noisy data. In AAAI'18, 5779–5786.
Ghavamzadeh, M., and Mahadevan, S. 2003. Hierarchical policy gradient algorithms. Computer Science Department Faculty Publication Series 173.
Guo, H.; Tang, R.; Ye, Y.; Li, Z.; and He, X. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. In IJCAI'17, 1725–1731.
He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; and Chua, T.-S. 2017. Neural collaborative filtering. In WWW'17, 173–182.
He, X.; He, Z.; Song, J.; Liu, Z.; Jiang, Y.-G.; and Chua, T.-S. 2018. NAIS: Neural attentive item similarity model for recommendation. IEEE Transactions on Knowledge and Data Engineering.
Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; and Tikk, D. 2016. Session-based recommendations with recurrent neural networks. In ICLR'16.
Huang, J.; Zhao, W. X.; Dou, H.; Wen, J.-R.; and Chang, E. Y. 2018. Improving sequential recommendation with knowledge-enhanced memory networks. In SIGIR'18, 505–514.
Jing, X., and Tang, J. 2017. Guess you like: course recommendation in MOOCs. In WI'17, 783–789.
Kabbur, S.; Ning, X.; and Karypis, G. 2013. FISM: factored item similarity models for top-N recommender systems. In SIGKDD'13, 659–667.
Koren, Y.; Bell, R.; and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. Computer (8):30–37.
Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; and Ma, J. 2017. Neural attentive session-based recommendation. In CIKM'17, 1419–1428.
Narasimhan, K.; Yala, A.; and Barzilay, R. 2016. Improving information extraction by acquiring external evidence with reinforcement learning. In EMNLP'16.
Parr, R., and Russell, S. J. 1998. Reinforcement learning with hierarchies of machines. In NIPS'98, 1043–1049.
Qi, Y.; Wu, Q.; Wang, H.; and Sun, M. 2018. Bandit learning with implicit feedback. In NIPS'18.
Qiu, J.; Tang, J.; Liu, T. X.; Gong, J.; Zhang, C.; Zhang, Q.; and Xue, Y. 2016. Modeling and predicting learning behavior in MOOCs. In WSDM'16, 93–102.
Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI'09, 452–461.
Rendle, S.; Freudenthaler, C.; and Schmidt-Thieme, L. 2010. Factorizing personalized Markov chains for next-basket recommendation. In WWW'10, 811–820.
Rendle, S. 2012. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology 3(3):57.
Sarwar, B.; Karypis, G.; Konstan, J.; and Riedl, J. 2001. Item-based collaborative filtering recommendation algorithms. In WWW'01, 285–295.
Smith, B., and Linden, G. 2017. Two decades of recommender systems at Amazon.com. IEEE Internet Computing 21(3):12–18.
Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In NIPS'00, 1057–1063.
Sutton, R. S.; Precup, D.; and Singh, S. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1-2):181–211.
Tan, Y. K.; Xu, X.; and Liu, Y. 2016. Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 17–22.
Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; and Su, Z. 2008. ArnetMiner: extraction and mining of academic social networks. In SIGKDD'08, 990–998.
Wang, L.; Zhang, W.; He, X.; and Zha, H. 2018a. Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation. In SIGKDD'18, 2447–2456.
Wang, Z.; Liu, J.; Xiao, X.; Lyu, Y.; and Wu, T. 2018b. Joint training of candidate extraction and answer selection for reading comprehension. In ACL'18, 1715–1724.
Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256.
Zhang, W.; Du, T.; and Wang, J. 2016. Deep learning over multi-field categorical data. In ECIR'16, 45–57.
Zhang, T.; Huang, M.; and Zhao, L. 2018. Learning structured representation for text classification via reinforcement learning. In AAAI'18, 6053–6060.