
Graph Neural Networks in Recommender Systems: A Survey

SHIWEN WU, Peking University
FEI SUN, Alibaba Group
WENTAO ZHANG, Peking University
BIN CUI, Peking University

With the explosive growth of online information, recommender systems play a key role in alleviating information overload. Due to the important application value of recommender systems, there has been a continuous stream of new work in this field. In recent years, graph neural network (GNN) techniques, which can naturally integrate node information and topological structure, have gained considerable interest. Owing to their outstanding performance in graph data learning, GNN techniques have been widely applied in many fields. In recommender systems, the main challenge is to learn effective user/item representations from their interactions and side information (if any). Since most of this information essentially has a graph structure and GNN is superior at representation learning, the field of utilizing GNN in recommender systems is flourishing. This article aims to provide a comprehensive review of recent research efforts on GNN-based recommender systems. Specifically, we provide a taxonomy of GNN-based recommendation models and state new perspectives pertaining to the development of this field.

CCS Concepts: • Information Systems → Recommender systems.

Additional Key Words and Phrases: Recommender System; Graph Neural Network; Survey

ACM Reference Format: Shiwen Wu, Fei Sun, Wentao Zhang, and Bin Cui. 2021. Graph Neural Networks in Recommender Systems: A Survey. J. ACM 37, 4, Article 111 (April 2021), 34 pages. https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION
With the rapid development of e-commerce and social media platforms, recommender systems have become indispensable tools for many businesses [13, 145, 153]. They take various forms depending on the industry, such as product suggestions on online e-commerce websites (e.g., Amazon and Taobao) or playlist generators for video and music services (e.g., YouTube, Netflix, and Spotify). Users rely on recommender systems to alleviate the information overload problem and explore what they are interested in from the vast sea of items (e.g., products, movies, news, or restaurants). To achieve this goal, accurately modeling users' preferences from their historical interactions (e.g., click, watch, read, and purchase) lies at the heart of an effective recommender system.

Broadly speaking, in the past decades, the mainstream modeling paradigm in recommender systems has evolved from neighborhood methods [3, 35, 64, 89] to representation learning based frameworks [13, 50, 51, 90, 110]. Item-based neighborhood methods [3, 64, 89] directly recommend to users the items that are similar to those they have interacted with historically.

Authors' addresses: S. Wu, W. Zhang, B. Cui, Peking University; emails: [email protected]; [email protected]; [email protected]; F. Sun, Alibaba Inc.; [email protected]. Wentao Zhang and Bin Cui are the corresponding authors.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2021 Association for Computing Machinery.
0004-5411/2021/4-ART111 $15.00
https://doi.org/10.1145/1122445.1122456


In a sense, they represent users' preferences by directly using their historically interacted items. Early item-based neighborhood approaches have achieved great success in real-world applications because of their simplicity, efficiency, and effectiveness.

An alternative approach is representation learning based methods, which try to encode both users and items as continuous vectors (also known as embeddings) in a shared space, thus making them directly comparable. Representation based models have sparked a surge of interest since the Netflix Prize competition [4] demonstrated that matrix factorization models are superior to classic neighborhood methods for recommendation. After that, various methods have been proposed to learn the representations of users and items to better estimate users' preferences for items, from matrix factorization [50, 51] to deep learning models [13, 33, 90, 153]. Nowadays, deep learning models have become the dominant methodology for recommender systems in both academic research and industrial applications, due to their ability to effectively capture the non-linear and non-trivial user-item relationships and to easily incorporate abundant data sources, e.g., contextual, textual, and visual information.

Among all these deep learning algorithms, GNN is undoubtedly the most attractive technique because of its superior ability in learning on graph-structured data, which is fundamental for recommender systems [104, 145]. For example, the interaction data in a recommendation application can be represented by a bipartite graph between user and item nodes, with the observed interactions represented by links. Even the item transitions in users' behavior sequences can be constructed as graphs. The benefit of formulating recommendation as a task on graphs becomes especially evident when incorporating structured external information, e.g., the social relationships among users [17, 132] and the knowledge graph related to items [113, 150]. In this way, GNN provides a unified perspective to model the abundant heterogeneous data in recommender systems.

Nevertheless, providing a unified framework to model the abundant data in recommendation applications is only part of the reason for the widespread adoption of GNN in recommender systems. Another reason is that, different from traditional methods that only implicitly capture the collaborative signals (i.e., using user-item interactions as the supervision signals for model training), GNN can naturally and explicitly encode the crucial collaborative signal (i.e., the topological structure) to improve the user and item representations. In fact, using collaborative signals to improve representation learning in recommender systems is not a new idea that originated with GNN [23, 43, 49, 140, 156]. Early efforts, such as SVD++ [49] and FISM [43], have already demonstrated the effectiveness of the interacted items in user representation learning. From the perspective of the user-item interaction graph, these previous works can be seen as using one-hop neighbors to improve user representation learning. The advantage of GNN is that it provides powerful and systematic tools to explore multi-hop relationships, which have been proven beneficial to recommender systems [31, 119, 145].

With these advantages, GNN has achieved remarkable success in recommender systems in the past few years. In academic research, many works demonstrate that GNN-based models outperform previous methods and achieve new state-of-the-art results on public benchmark datasets [31, 119, 162]. Meanwhile, plenty of variants have been proposed and applied to various recommendation tasks, e.g., session-based recommendation [82, 136], points-of-interest (POI) recommendation [6, 62], group recommendation [34, 117], and bundle recommendation [7]. In industry, GNN has also been deployed in web-scale recommender systems to produce high-quality recommendation results [16, 81, 145]. For example, Pinterest developed and deployed a random-walk-based Graph Convolutional Network (GCN) model named PinSage on a graph with 3 billion nodes and 18 billion edges, and gained substantial improvements in user engagement in online A/B tests.


Contribution of this survey. Given the impressive pace at which GNN-based recommendation models are growing, we believe it is important to summarize and describe all the representative methods in one unified and comprehensible framework. There exist comprehensive surveys on recommender systems [84, 153] or graph neural networks [139, 161]. However, there is a lack of work that thoroughly summarizes the literature on the advances of GNN-based recommendation and discusses the open issues and future directions in this field. To the best of our knowledge, this survey is the first work to fill this gap. Researchers and practitioners interested in recommender systems can thus gain a general understanding of the latest developments in the field of GNN-based recommendation. The key contributions of this survey are summarized as follows:

• New taxonomy. We propose a systematic classification schema to organize the existing GNN-based recommendation models. We first categorize the existing works into general recommendation and sequential recommendation based on the task they deal with. Then, we further categorize the existing models in these two tasks into three categories: only interaction information, social network enhanced, and knowledge graph enhanced, respectively. With this taxonomy, one can easily step into this field and distinguish between different models.

• Comprehensive review. For each category, we identify the main issues to be addressed. Moreover, we briefly introduce the representative models and illustrate how they address these issues.

• Future research. We discuss the limitations of current methods and propose six potential future directions.

The remainder of this article is organized as follows: Section 2 introduces the preliminaries for recommender systems and graph neural networks, and presents the classification framework. Section 3 and Section 4 summarize the main issues in each category and how existing works tackle these challenges for general recommendation and sequential recommendation, respectively. Section 5 introduces other recommendation tasks that apply GNN. Section 6 discusses the challenges and points out future directions in this field. Finally, we conclude the survey in Section 7.

2 BACKGROUND AND CATEGORIZATION
Before diving into the details of this survey, we give a brief introduction to recommender systems and GNN techniques. We also discuss the motivation for utilizing GNN techniques in recommender systems. Furthermore, we propose a new taxonomy to classify the existing GNN-based models. For easy reading, we summarize the notations that will be used throughout the paper in Table 1.

2.1 Recommender Systems
Recommender systems infer users' preferences from user-item interactions or static features, and further recommend items that users might be interested in [1]. Recommendation has been a popular research area for decades because of its great application value and because the challenges in this field are still not well addressed. Research on recommender systems can usually be classified into two essential types of tasks, i.e., modeling users' static preferences from paired interactions, and modeling users' dynamic preferences from sequential behaviors.

General recommendation¹ usually assumes that users have static preferences and models them based on either implicit (e.g., clicks, reads, or purchases) or explicit (i.e., ratings) feedback. A common paradigm for general recommendation models is to reconstruct users' historical interactions from the representations of users and items. Specifically, we formulate the problem as follows:
¹ In fact, this task is usually called collaborative filtering. Here, we name it general recommendation to distinguish it from sequential recommendation, which will be introduced later.


Table 1. Key notations used in this paper

Notation | Description
U / I | The set of users / items
u | Hidden vector of user u
i | Hidden vector of item i
R = {r_{u,i}} | Interactions between users and items
G_S | Social relationships between users
G_KG | Knowledge graph
E_KG = {e_i} | The set of entities in the knowledge graph
R_KG = {r_{e_i,e_j}} | The set of relations in the knowledge graph
A | Adjacency matrix of the graph
A_in / A_out | In- and out-adjacency matrices of a directed graph
N_v | Neighborhood set of node v
h_v^(l) | Hidden state of the node embedding at layer l
n_v^(l) | Aggregated vector of node v's neighbors at layer l
h_u^* | Final representation of user u
h_i^* | Final representation of item i
h_u^S | Final representation of user u in the social space
h_u^I | Final representation of user u in the item space
W^(l) | Transformation matrix at layer l
W_r^(l) | Transformation matrix of relation r at layer l
b^(l) | Bias term at layer l
⊕ | Vector concatenation
⊙ | Element-wise multiplication operation

Given the user set U, the item set I, and the observed interactions between users and items R : U × I, for any user u ∈ U, general recommendation estimates her/his preference for any item i ∈ I from the learnt user representation h_u^* and item representation h_i^*, i.e.,

y_{u,i} = f(h_u^*, h_i^*), \quad (1)

where the score function f(·) can be a dot product, cosine similarity, a multi-layer perceptron, etc., and y_{u,i} denotes the preference score of user u for item i, which is usually presented as a probability.
There exist abundant works focusing on the general recommendation task.

Most early studies consider user-item interactions in the form of a matrix and formulate recommendation as a matrix completion task [52]. In this way, Matrix Factorization (MF) projects users and items into a shared vector space to reconstruct the whole user-item interaction matrix, i.e., estimating a user's preferences for unseen items [52, 54, 87]. Recently, deep learning has dramatically revolutionized recommender systems. One line of such research seeks to improve the recommendation performance by integrating auxiliary information with the power of deep learning, e.g., text [47, 110] and images [44, 115]. Another line of research tries to employ more powerful model architectures to take the place of conventional matrix factorization, e.g., multi-layer perceptrons (MLP) [33] and auto-encoders [90, 138].


For a comprehensive introduction to deep learning based recommender systems, we refer the readers to the survey [153].
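To make Eq. (1) concrete, the following minimal sketch scores a user-item pair with a dot product followed by a sigmoid. It is written in PyTorch; the sizes, the randomly initialized embedding tables, and the names user_emb, item_emb, and score are illustrative placeholders rather than anything defined in the surveyed papers.

```python
# A minimal sketch of the scoring function in Eq. (1): user/item embeddings
# (here randomly initialized) play the roles of h*_u and h*_i, and the
# preference score is their dot product squashed into (0, 1).
import torch

num_users, num_items, dim = 1000, 5000, 64
user_emb = torch.nn.Embedding(num_users, dim)   # h*_u for every user
item_emb = torch.nn.Embedding(num_items, dim)   # h*_i for every item

def score(u_ids, i_ids):
    h_u = user_emb(u_ids)                       # [batch, dim]
    h_i = item_emb(i_ids)                       # [batch, dim]
    return torch.sigmoid((h_u * h_i).sum(-1))   # y_{u,i} as a probability

y = score(torch.tensor([0, 1]), torch.tensor([42, 7]))
```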

Sequential recommendation, in addition to general recommendation, is another mainstream research direction in recommender systems, which assumes that a user's preferences are dynamic and evolving. This type of research seeks to predict the successive item(s) that a user is likely to interact with by exploring the sequential patterns in her/his historical interactions. According to whether the users are anonymous or not and whether the behaviors are segmented into sessions, works in this field can be further divided into sequential recommendation and session-based recommendation. Session-based recommendation can be viewed as a sub-type of sequential recommendation with anonymity and session assumptions [84]. In this survey, we do not distinguish them and refer to them collectively by the broader term sequential recommendation for simplicity, since our main focus is the contribution of GNN to recommendation and the differences between them are negligible for the application of GNN.

In sequential recommendation, users' historical interactions are organized into sequences in chronological order. We represent the interaction sequence of user u as s_u = [i_{s,1}, i_{s,2}, ..., i_{s,n}], where i_{s,t} denotes the item that user u interacts with at time step t and n denotes the length of the interaction sequence. The task of sequential recommendation is to predict the item that user u is most likely to interact with at the next time step n+1, given her/his historical interaction sequence and auxiliary information if available. It can be formulated as follows:

i^*_{s,n+1} = \arg\max_{i \in \mathcal{I}} P(i_{s,n+1} = i \mid s_u). \quad (2)

The main challenge of sequential recommendation is to learn an efficient sequence representation that reflects the user's current preferences. Early works adopt Markov Chains (MC) [29, 86] to capture item-to-item transitions, based on the assumption that the most recently clicked item reflects the user's dynamic preference. Owing to the advantage of Recurrent Neural Networks (RNN) in sequence modeling, some works employ RNN units to capture sequential patterns [36, 102]. To further enhance the session representation, the attention mechanism is leveraged to integrate the whole sequence in addition to the most recent item [57, 68]. Inspired by the success of the Transformer [128] in NLP tasks, SASRec [45] and BERT4Rec [96] leverage the self-attention technique to model item interactions, which allows more flexible item-to-item transitions. With the emergence of GNN, utilizing GNN to capture complex transition patterns of items has become popular in sequential recommendation.
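A hedged sketch of the prediction step in Eq. (2) is shown below. How the sequence representation is produced depends on the chosen model (MC, RNN, self-attention, or GNN), so it is left as a random placeholder named h_s here.

```python
# A minimal sketch of Eq. (2): given some learned sequence representation h_s,
# score every candidate item and pick the most probable next item.
import torch

num_items, dim = 5000, 64
item_emb = torch.nn.Embedding(num_items, dim)
h_s = torch.randn(dim)                          # placeholder sequence representation

logits = item_emb.weight @ h_s                  # score every candidate item
probs = torch.softmax(logits, dim=0)            # P(i_{s,n+1} = i | s_u)
next_item = torch.argmax(probs).item()          # i*_{s,n+1}
```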

2.2 Graph Neural Network Techniques
Graph data is widely used to represent complex relationships between objects, e.g., social graphs and knowledge graphs. Since the success of deep learning, many studies have aimed to analyze graph data with neural networks, which has led to the rapid development of the field of GNN. Recently, systems based on variants of GNN have demonstrated ground-breaking performance on many tasks related to graph data, such as physical systems [2, 88], protein structure [19], and knowledge graphs [27].

The main idea of GNN is to iteratively aggregate feature information from neighbors and integrate the aggregated information with the current central node representation during the propagation process [139, 161]. From the perspective of network architecture, a GNN stacks multiple propagation layers, which consist of aggregation and update operations. Note that, in this paper, "aggregation" refers to collecting information from the neighborhood as the aggregated representation of the neighbors; "update" means integrating the central node representation and the aggregated representation into the latest central node representation; "propagation" means the combination of "aggregation" and "update".


In the aggregation step, existing works either treat each neighbor equally with the mean-pooling operation [28, 60], or differentiate the importance of neighbors with the attention mechanism [105]. In the update step, the representation of the central node and the aggregated neighborhood are integrated into the updated representation of the central node. In order to adapt to different scenarios, various strategies have been proposed to better integrate the two representations, such as the GRU mechanism [60], concatenation with nonlinear transformation [28], and the sum operation [105].

According to the architecture design, GNN models can be categorized into recurrent GNN (RecGNN), convolutional GNN (ConvGNN), spatial-temporal GNN (STGNN), and graph autoencoder (GAE) [139]. RecGNN aims to learn high-level node representations with recurrent neural structures, that is, the same set of parameters is applied recurrently over nodes. ConvGNN stacks multiple propagation layers with different parameters in each layer. As ConvGNN is more flexible and convenient to combine with other neural networks, it has gained great popularity in recent years. Most of the existing works in GNN-based recommendation adopt ConvGNN to simulate the propagation process. STGNN is designed for spatial-temporal graphs and can capture both spatial and temporal dependencies of a graph simultaneously. GAEs are widely used to learn graph embeddings in an unsupervised learning framework. To learn more about GNN techniques, we refer the readers to the surveys [139, 161]. Here, we briefly summarize four typical GNN frameworks which are widely adopted in the field of recommendation.

• GCN [48] approximates the first-order eigendecomposition of the graph Laplacian to iteratively aggregate information from neighbors. Concretely, it updates the embeddings by

H^{(l+1)} = \delta\big(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\big),

where \delta(\cdot) is a nonlinear activation function, such as ReLU, W^{(l)} is the learnable transformation matrix for layer l, \tilde{A} = A + I is the adjacency matrix of the undirected graph with added self-connections, and \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}.

• GraphSage [28] samples a fixed-size neighborhood for each node, proposes mean/sum/max-pooling aggregators, and adopts a concatenation operation for the update,

n_v^{(l)} = \mathrm{AGGREGATE}_l\big(\{h_u^{(l)}, \forall u \in \mathcal{N}_v\}\big), \qquad h_v^{(l+1)} = \delta\big(W^{(l)} \cdot [h_v^{(l)} \oplus n_v^{(l)}]\big),

where \mathrm{AGGREGATE}_l denotes the aggregation function at the l-th layer, \delta(\cdot) is a nonlinear activation function, and W^{(l)} is the learnable transformation matrix.

• GAT [105] assumes that the influence of neighbors on the central node is neither identical nor pre-determined by the graph structure; thus it differentiates the contributions of neighbors by leveraging the attention mechanism and updates the vector of each node by attending over its neighbors,

\alpha_{vj} = \frac{\exp\big(\mathrm{LeakyReLU}\big(a^{\top} [W^{(l)} h_v^{(l)} \oplus W^{(l)} h_j^{(l)}]\big)\big)}{\sum_{k \in \mathcal{N}_v} \exp\big(\mathrm{LeakyReLU}\big(a^{\top} [W^{(l)} h_v^{(l)} \oplus W^{(l)} h_k^{(l)}]\big)\big)}, \qquad h_v^{(l+1)} = \delta\Big(\sum_{j \in \mathcal{N}_v} \alpha_{vj} W^{(l)} h_j^{(l)}\Big),

where a is a learnable parameter and W^{(l)} transforms the node representations at the l-th propagation layer.


[Figure 1 illustrates four representative graph structures: (a) a user-item bipartite graph, shown together with the corresponding 4-user × 5-item interaction matrix; (b) a sequence graph built from the user behavior sequence i1 → i2 → i3 → i2 → i4; (c) social relationships between users u1–u4; (d) a knowledge graph around the movie Toy Story, with director John Lasseter, genre cartoon, actors Woody, Buzz Lightyear, and Rex, and producer Walt Disney.]

Fig. 1. Representative graph structures in recommender systems.

• GGNN [60] is a typical RecGNN method, which adopts a gated recurrent unit (GRU) in the update step,

n_v^{(l)} = \frac{1}{|\mathcal{N}_v|} \sum_{j \in \mathcal{N}_v} h_j^{(l)}, \qquad h_v^{(l+1)} = \mathrm{GRU}\big(h_v^{(l)}, n_v^{(l)}\big).

The advantage is that GGNN ensures convergence, but it can be problematic for large graphs, since GGNN runs the recurrent function several times over all nodes [139].
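To ground the aggregation/update terminology above, here is a minimal, framework-agnostic sketch of a single propagation layer that uses mean aggregation with a GraphSage-style concatenation update. The class name, shapes, and the toy adjacency matrix are illustrative and do not come from any particular surveyed model.

```python
# A minimal sketch of one GNN propagation layer: "aggregation" = mean of
# neighbor embeddings, "update" = nonlinear transform of [node ⊕ neighbors].
import torch
import torch.nn as nn

class MeanConcatLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):
        # h:   [num_nodes, dim] node embeddings at layer l
        # adj: [num_nodes, num_nodes] binary adjacency matrix (no self-loops)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        n = adj @ h / deg                                     # aggregation step
        return torch.relu(self.W(torch.cat([h, n], dim=-1)))  # update step

h = torch.randn(6, 16)
adj = (torch.rand(6, 6) > 0.5).float()
adj = ((adj + adj.t()) > 0).float().fill_diagonal_(0)         # symmetric toy graph
h_next = MeanConcatLayer(16)(h, adj)
```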

2.3 Why Graph Neural Network for Recommendation
In the past few years, many works on GNN-based recommendation have been proposed. Before diving into the details of the latest developments, it is beneficial to understand the motivations for applying GNN to recommender systems.

The most intuitive reason is that GNN techniques have been demonstrated to be powerful in representation learning for graph data in various domains [161], and most of the data in recommendation essentially has a graph structure, as shown in Figure 1. For general recommendation, the interaction data can be represented by a bipartite graph (as shown in Figure 1a) between user and item nodes, where a link represents an interaction between the corresponding user and item. For sequential recommendation, a sequence of items can be transformed into a sequence graph, where each item can be connected with one or more subsequent items. Figure 1b shows an example of a sequence graph where there is an edge between consecutive items. Compared to the original sequence data, the sequence graph allows more flexible item-to-item relationships. Beyond that, some side information also has a natural graph structure, such as social relationships and knowledge graphs, as shown in Figures 1c and 1d.

Traditionally, for general recommendation, researchers try to learn the user/item representations from paired interactions.


In terms of sequential recommendation, researchers employ sequence models to discover the user's dynamic preferences from sequential behavior and predict what she/he is likely to interact with next. Because different types of data in recommendation have their own characteristics, a variety of models have been proposed to effectively learn their patterns for better recommendation results, which poses a big challenge for model design. Considering the information in recommendation from the perspective of the graph, a unified GNN framework can be utilized to address all these tasks. The task of general recommendation is to learn effective node representations, i.e., user/item representations, and to further predict user preferences. The task of sequential recommendation is to learn an informative graph representation, i.e., the sequence representation. Both node representations and graph representations can be learned through GNN. Besides, it is more convenient and flexible to incorporate additional information (if available), compared to a non-graph perspective.

Moreover, for general recommendation, GNN can explicitly encode the crucial collaborative signal of user-item interactions to enhance the user/item representations through the propagation process. Utilizing collaborative signals for better representation learning is not a completely new idea. For instance, SVD++ [49] incorporates the representations of interacted items to enrich the user representations. ItemRank [23] constructs an item-item graph from interactions and adopts a random-walk algorithm to rank items according to user preferences. Note that SVD++ can be seen as using one-hop neighbors (i.e., items) to improve user representations, while ItemRank utilizes two-hop neighbors to improve item representations. Compared with non-graph models, GNN is more flexible and convenient for modeling multi-hop connectivity from user-item interactions, and the collaborative filtering signals captured from high-hop neighbors have been demonstrated to be effective for recommendation.

For sequential recommendation, transforming the sequence data into a sequence graph allows more flexibility than the original transitions of item choices. Duplicate items in a user's sequential behavior are a common phenomenon, which can lead to cyclic structures in the sequence graph. Figure 1b shows an example of a sequence of clicked items and the corresponding sequence graph. Viewed as a sequence, the last item i4 is influenced by the previous four items consecutively. Viewed as a sequence graph, there exist two paths to i4 due to the cyclic structure between i2 and i3. Sequence models strictly obey the temporal order of the sequence, whereas GNN can capture the complex user preferences implicit in sequential behavior thanks to such cyclic structures.
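A small sketch of this sequence-to-graph construction is shown below, using the toy sequence from Figure 1b; the function name and the NumPy adjacency-matrix representation are illustrative choices rather than a specific model's implementation.

```python
# Turn a behavior sequence into a sequence graph: each consecutive pair of
# items becomes a directed edge, so repeated items (e.g., i2 appearing twice)
# naturally yield the cyclic structure discussed above.
import numpy as np

def build_sequence_graph(seq):
    items = sorted(set(seq))
    idx = {item: k for k, item in enumerate(items)}
    A_out = np.zeros((len(items), len(items)))
    for a, b in zip(seq, seq[1:]):
        A_out[idx[a], idx[b]] = 1.0            # edge a -> b
    A_in = A_out.T                             # incoming-edge adjacency
    return items, A_in, A_out

items, A_in, A_out = build_sequence_graph(["i1", "i2", "i3", "i2", "i4"])
```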

2.4 Categories of Graph Neural Network Based Recommendation
In this part, we propose a new taxonomy to classify the existing GNN-based models. We classify the existing works based on the types of information used. We further divide the existing models into general and sequential recommendation in terms of whether the order of items is considered. Figure 2 summarizes the classification scheme.

The rationale for this classification lies in two aspects: (1) different types of information have different characteristics of graph structure, which require corresponding GNN strategies; (2) the assumptions behind general and sequential recommendation differ, since the former considers users' static preferences while the latter intends to capture users' dynamic preferences.

Here, we give a brief introduction to the categories of general and sequential recommendation:

• General Recommendation. In this category, models assume that users' preferences are invariant over time. We further categorize them into three subcategories based on the types of information used. Without side information, existing models consider the user-item relationships as a user-item bipartite graph. With social relationship information, GNN techniques are leveraged to mine social influence to augment users' representations.


[Figure 2 depicts the taxonomy as a tree: GNN-based Recommendation branches into General Recommendation (with user-item interaction information, social network enhanced, and knowledge graph enhanced subcategories) and Sequential Recommendation (with sequence information, social network enhanced, and knowledge graph enhanced subcategories).]

Fig. 2. Categories of graph neural network based recommendation models.

With knowledge graph information, the GNN methods are adopted to capture item-to-item relatedness to enhance items' representations.

• Sequential Recommendation. For sequential recommendation, the core idea is to capture transition patterns in sequences for next-item(s) recommendation. Most of the existing works capture users' dynamic preferences based only on the sequences: they first construct a sequence graph and then leverage GNN methods to capture transition patterns.

3 GENERAL RECOMMENDATION
General recommendation aims to model user preferences by leveraging user-item interaction data, thus providing a list of recommendations reflecting the static long-term interests of each user. In addition to the user-item interactions, auxiliary information (if available) is often leveraged to enhance the user/item representations. Two typical types of auxiliary information are the social network and the knowledge graph. It is worth noting that different types of information have different characteristics of graph structure. Specifically, the user-item bipartite graph has two types of nodes and the neighbors of each node are homogeneous; the social network is a homogeneous graph; the knowledge graph has various types of entities and relations. Considering that different characteristics of graph structure require corresponding GNN strategies, we categorize the works by the type of information used.

In this section, we summarize the overall framework, introduce how existing works deal with the main issues under each subcategory, and discuss both their advantages and possible limitations. For convenience, we briefly summarize these works in Table 2.

3.1 User-item interaction information
Owing to the superiority of GNN in learning on graph data, there are emerging efforts in recommender systems utilizing the GNN architecture to model the recommendation task from the perspective of the graph. The basic idea is to use the items a user has interacted with to enhance the user representation, and to use the users who once interacted with an item to enrich the item representation. In this way, multi-layer GNN methods can simulate the information diffusion process and exploit high-order connectivity from user-item interactions more efficiently.

Given the user-item bipartite graph, the key challenge is how to propagate the information of interacted items/users to the user/item and learn the final user/item representations for prediction. To take full advantage of GNN methods on the bipartite graph, there are four main issues to deal with:


Table 2. Graph neural network based general recommendation.

Graph | Model | Venue | Year | GNN Framework

Information: user-item interactions
Bi¹ | GC-MC [104] | KDD | 2018 | variant of GCN (mean-pooling)
Bi | PinSage [145] | KDD | 2018 | variant of GraphSage (importance-pooling)
Bi | SpectralCF [159] | RecSys | 2018 | variant of GCN
Bi | STAR-GCN [151] | IJCAI | 2019 | variant of GCN (mean-pooling)
Bi | NGCF [119] | SIGIR | 2019 | variant of GraphSage (node affinity)
Bi | Bi-HGNN [55] | IJCAI | 2019 | GraphSage
Bi | LR-GCCF [10] | AAAI | 2020 | variant of GCN (w/o activation)
Bi | LightGCN [31] | arxiv | 2020 | variant of GCN (w/o activation & transformation)
Bi | IG-MC [152] | ICLR | 2020 | variant of GraphSage (sum updater)
Bi | DGCN-BinCF [109] | IJCAI | 2019 | variant of GCN (CrossNet)
Bi | HashGNN [101] | WWW | 2020 | variant of GCN (mean-pooling)
Bi | NIA-GCN [97] | SIGIR | 2020 | variant of GraphSage (neighbor interaction)
Bi | AGCN [134] | SIGIR | 2020 | variant of GCN (w/o activation)
Bi | MBGCN [42] | SIGIR | 2020 | hierarchical aggregation
Bi | MCCF [123] | AAAI | 2020 | GAT
Bi | DGCF [121] | SIGIR | 2020 | hierarchical aggregation
Bi | GraphSAIL [144] | CIKM | 2020 | GraphSage
Bi | DisenHAN [125] | CIKM | 2020 | hierarchical aggregation
Bi | Gemini [143] | KDD | 2020 | variant of GAT
Bi | Multi-GCCF [98] | ICDM | 2019 | GraphSage
Bi | DGCF [72] | arxiv | 2020 | variant of GCN (w/o activation & transformation)
Bi | TransGRec [133] | SIGIR | 2020 | variant of GraphSage (sum updater)

Information: user-item interactions & social network (SN)
SN | DiffNet [132] | SIGIR | 2019 | GraphSage
Bi + SN | GraphRec [17] | WWW | 2019 | variant of GAT
SN + I2I | DANSER [135] | WWW | 2019 | variant of GAT
Bi + SN | DiffNet++ [131] | TKDE | 2020 | variant of GAT
Motif | ESRF [147] | TKDE | 2020 | mean-pooling

Information: user-item interactions & knowledge graph (KG)
KG | KGCN [114] | WWW | 2019 | variant of GAT
KG | KGNN-LS [112] | KDD | 2019 | variant of GCN (user-specific adjacency matrix)
Bi + KG | KGAT [118] | KDD | 2019 | variant of GAT
U2U + I2I | IntentGC [158] | KDD | 2019 | variant of GraphSage (sum updater)
KG | AKGE [91] | arxiv | 2019 | variant of GAT
Bi + KG | MKGAT [99] | CIKM | 2020 | variant of GAT
Bi + KG | ACKRec [22] | SIGIR | 2020 | meta-path based
KG | ATBRG [18] | SIGIR | 2020 | variant of GAT
Bi + KG | TGCN [8] | CIKM | 2020 | variant of GAT (hierarchical)

¹ Bi represents bipartite graph.

• Graph Construction. The graph structure is essential for the scope and type of information to propagate. The original bipartite graph consists of a set of user/item nodes and the interactions between them. Should GNN be applied over the heterogeneous bipartite graph directly, or should homogeneous graphs be reconstructed based on two-hop neighbors? Considering computational efficiency, how can representative neighbors be sampled for graph propagation instead of operating on the full graph?


Algorithm 1 Framework of GNN for User-item Bipartite Graph.
Input: The set of users U; the set of items I; the interactions between users and items R : U × I
1: Construct the full graph G or the subgraph G_{u,i} by sampling;
2: for l = 1 to L do (take the user node as an example)
3:   n_u^(l) = Aggregator(h_u^(l), {h_i^(l) | i ∈ N_u})
4:   h_u^(l+1) = Updater(h_u^(l), n_u^(l))
5: end for
6: Output: The final user/item representations h_u^* = f^*(h_u^(0), ..., h_u^(L)), h_i^* = f^*(h_i^(0), ..., h_i^(L))

• Neighbor Aggregation. How should the information from neighbor nodes be aggregated? Specifically, should the importance of neighbors be differentiated, and should the affinity between the central node and its neighbors, or the interactions among neighbors, be modeled?

• Information Update. How to integrate the central node representation and the aggregated representation of its neighbors?

• Final Node Representation. Predicting a user's preference for items requires the overall user/item representations. Should the node representation in the last layer, or a combination of the node representations from all layers, be used as the final node representation?
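A hedged code sketch of the overall procedure in Algorithm 1 is given below. The binary interaction matrix R, the placeholder aggregator/updater functions, and the sum readout are all illustrative assumptions; the readout is just one of the choices discussed in Section 3.1.4.

```python
# A minimal sketch of Algorithm 1: L propagation layers over the bipartite
# graph, alternating aggregation and update for users and items, then a
# layer-combination readout f*.
import torch

def propagate(h_user, h_item, R, layers):
    # layers: list of (aggregator, updater) pairs; here the aggregator receives
    # the mean-pooled neighbor message (a simplification of Algorithm 1's
    # Aggregator, which also sees the central node).
    users, items = [h_user], [h_item]
    for aggregator, updater in layers:
        deg_u = R.sum(1, keepdim=True).clamp(min=1)
        deg_i = R.sum(0, keepdim=True).t().clamp(min=1)
        n_u = aggregator(R @ items[-1] / deg_u)        # users aggregate items
        n_i = aggregator(R.t() @ users[-1] / deg_i)    # items aggregate users
        users.append(updater(users[-1], n_u))
        items.append(updater(items[-1], n_i))
    # readout f*: sum over all layers (one of several possible choices)
    return torch.stack(users).sum(0), torch.stack(items).sum(0)

identity = lambda x: x
add = lambda h, n: h + n
h_u, h_i = propagate(torch.randn(4, 8), torch.randn(5, 8),
                     torch.randint(0, 2, (4, 5)).float(),
                     layers=[(identity, add)] * 2)
```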

3.1.1 Graph construction. Most works [10, 31, 55, 97, 101, 104, 109, 119, 134, 151, 159] apply GNN directly on the original user-item bipartite graph structure. There are two issues with the original graph: one is effectiveness, i.e., the original graph structure might not be sufficient for learning user/item representations; the other is efficiency, i.e., it is impractical to propagate information on a graph with millions or even billions of nodes.

To address the first issue, existing works enrich the original graph structure by adding edges or virtual nodes. In addition to the user-item graph, Multi-GCCF [98] and DGCF [72] add edges between two-hop neighbors on the original graph to obtain user-user and item-item graphs. In this way, the proximity information among users and items can be explicitly incorporated into user-item interactions. Considering that previous works ignore users' intents for adopting items, DGCF [121] introduces virtual intent nodes, which decompose the original graph into a corresponding subgraph for each intent. With independence constraints across different intents, disentangled representations under different intents can be learned by iteratively propagating information across the intent-aware subgraphs. The final representation is the integration of these disentangled representations, which represents the node from different aspects and has better expressive power.

In terms of the second issue, sampling strategies have been proposed to make GNN efficient and scalable to large-scale graph based recommendation tasks. PinSage [145] designs a random-walk based sampling method to obtain a fixed-size neighborhood with the highest visit counts. In this way, nodes that are not directly adjacent to the central node may also become its neighbors. For the sake of inductive recommendation, IG-MC [152] uses the target user/item and their one-hop neighbors as nodes to construct the subgraph. The enclosing-subgraph design reduces the dependence on the original graph structure, which enhances its generalization when transferring the model to another dataset. Sampling is a trade-off between the original graph information and computational efficiency: PinSage includes more randomness, while IG-MC sacrifices more graph information. For transfer, the sampling method of IG-MC is preferable; otherwise, the strategy of PinSage might be better. The performance of the model depends on the sampling strategy, and more efficient sampling strategies for neighborhood construction deserve further study.
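For intuition, a simplified sketch in the spirit of random-walk-based neighbor sampling is shown below; the walk counts, walk length, and adjacency list are illustrative, and this is not PinSage's actual implementation.

```python
# Run short random walks from a node and keep the T most-visited nodes as its
# neighborhood; the normalized visit counts can later serve as importance
# weights during aggregation.
import random
from collections import Counter

def sample_neighborhood(adj_list, node, num_walks=100, walk_len=3, top_t=10):
    visits = Counter()
    for _ in range(num_walks):
        cur = node
        for _ in range(walk_len):
            if not adj_list[cur]:
                break
            cur = random.choice(adj_list[cur])
            if cur != node:
                visits[cur] += 1
    total = sum(visits.values()) or 1
    return [(n, c / total) for n, c in visits.most_common(top_t)]

adj_list = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
neighbors = sample_neighborhood(adj_list, node=0)
```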


3.1.2 Neighbor aggregation. The aggregation step is of vital importance for information propagation over the graph structure, as it decides how much of the neighbors' information should be propagated. Mean-pooling is one of the most straightforward aggregation operations [98, 101, 104, 151, 152], treating all neighbors equally,

n_u^{(l)} = \frac{1}{|\mathcal{N}_u|} \sum_{i \in \mathcal{N}_u} W^{(l)} h_i^{(l)}. \quad (3)

Mean-pooling is easy to implement but might be inappropriate when the importance of neighbors differs significantly. Following the traditional GCN, some works employ "degree normalization" [10, 31, 134], which assigns weights to nodes based on the graph structure,

n_u^{(l)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u| |\mathcal{N}_i|}} W^{(l)} h_i^{(l)}. \quad (4)

Owing to its random-walk sampling strategy, PinSage [145] adopts the normalized visit counts as the importance of neighbors when aggregating their vector representations. However, these aggregation functions determine the importance of neighbors according to the graph structure but ignore the relationships between the connected nodes.

Considering that interacted items are not equally representative of user preferences, researchers employ the attention mechanism to differentiate the importance of neighbors [123]. Motivated by the intuition that the embeddings of items in line with the user's interests should be passed more to the user (and analogously for items), NGCF [119] employs the element-wise product to augment the item features the user cares about, or the user's preferences for features the item has. Taking the user node as an example, the neighborhood representation is calculated as follows:

n_u^{(l)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u| |\mathcal{N}_i|}} \Big(W_1^{(l)} h_i^{(l)} + W_2^{(l)} \big(h_i^{(l)} \odot h_u^{(l)}\big)\Big). \quad (5)

NIA-GCN [97] argues that existing aggregation functions fail to preserve the relational information within the neighborhood, and thus proposes a pairwise neighborhood aggregation approach to explicitly capture the interactions among neighbors. Concretely, it applies element-wise multiplication between every two neighbors to model the user-user/item-item relationships.

Despite the efficacy of these methods, they pay insufficient attention to the case where there are multiple types of interactions, such as browsing and clicking. To cope with the multi-type relations between users and items, researchers design hierarchical aggregation strategies and observe effectiveness gains. For example, MBGCN [42] first aggregates the interacted items belonging to each behavior type separately, and then integrates the aggregated representations across behaviors.
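The following sketch contrasts three of the aggregation rules above, Eqs. (3)-(5), for a single user node; the embeddings, weight matrices, and neighbor degrees are random placeholders chosen only for illustration.

```python
# Mean-pooling (Eq. 3), symmetric degree normalization (Eq. 4), and the
# NGCF-style affinity term (Eq. 5) for one user u and its interacted items.
import torch

dim = 16
h_u = torch.randn(dim)
h_items = torch.randn(5, dim)                   # embeddings of the items in N_u
deg_items = torch.tensor([3., 7., 2., 5., 4.])  # |N_i| for each neighbor item
W, W1, W2 = (torch.randn(dim, dim) for _ in range(3))

n_mean = (h_items @ W).mean(0)                                    # Eq. (3)
coef = 1.0 / torch.sqrt(len(h_items) * deg_items)                 # 1/sqrt(|N_u||N_i|)
n_norm = (coef.unsqueeze(1) * (h_items @ W)).sum(0)               # Eq. (4)
n_ngcf = (coef.unsqueeze(1) *
          (h_items @ W1 + (h_items * h_u) @ W2)).sum(0)           # Eq. (5)
```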

3.1.3 Information update. Given the aggregated neighbor representation and the central node representation, how to update the representation of the central node is essential for iterative information propagation. Some works [31, 104, 121, 151] use the aggregated representation of the neighbors as the new central node representation, i.e., they discard the original information of the user or item node completely, which might overlook the intrinsic user preference or the intrinsic item property. Others take both the central node itself and its neighborhood message into consideration when updating node representations. Some studies combine these two representations linearly with a sum-pooling or mean-pooling operation [97, 119, 134, 152]. Inspired by GraphSage [28], some works [55, 98, 145] adopt a concatenation function with nonlinear transformation to integrate these two representations as follows:

h_u^{(l+1)} = \sigma\big(W^{(l)} \cdot (h_u^{(l)} \oplus n_u^{(l)}) + b^{(l)}\big), \quad (6)


where σ denotes the activation function, e.g., ReLU, LeakyReLU, or sigmoid. Compared to a linear combination, the concatenation operation with feature transformation allows more complex feature interactions. Note that some works [10, 31] observe that the nonlinear activation contributes little to the overall performance, and they simplify the update operation by removing the non-linearities, thereby retaining or even improving performance while increasing computational efficiency. For example, to further simplify the GNN structure, LightGCN [31] also removes the feature transformation, and the experimental results illustrate its efficacy.
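A brief sketch of two update rules discussed above, the concatenation update of Eq. (6) and a LightGCN-style simplification, is given below; the tensors are illustrative placeholders.

```python
# Eq. (6): nonlinear transform of the concatenated node and neighbor vectors,
# versus a simplified update that keeps only the aggregated neighbor message
# (no nonlinearity, no feature transformation), in the spirit of LightGCN.
import torch

dim = 16
h_u, n_u = torch.randn(dim), torch.randn(dim)
W, b = torch.randn(dim, 2 * dim), torch.randn(dim)

h_concat = torch.relu(W @ torch.cat([h_u, n_u]) + b)   # Eq. (6)
h_light = n_u                                          # simplified update
```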

3.1.4 Final node representation. Applying the aggregation and update operations layer by layer generates node representations at each depth of the GNN. The overall representations of users and items are required for the final prediction task. Some works [55, 101, 104, 125, 145, 151] use the node vector in the last layer of the GNN as the final representation, i.e., h_u^* = h_u^{(L)}. However, the representations obtained at different layers emphasize the messages passed over different connections [119]. Specifically, the representations in lower layers reflect the individual features more, while those in higher layers reflect the neighbor features more. To take advantage of the connections expressed by the outputs of different layers, recent studies employ different methods for representation integration.

• Concatenation is one of the most widely adopted strategies [10, 42, 119, 152, 159], i.e., h_u^* = h_u^{(0)} \oplus h_u^{(1)} \oplus \cdots \oplus h_u^{(L)}.

• Some works employ the mean-pooling operation [72], i.e., h_u^* = \frac{1}{L+1} \sum_{l=0}^{L} h_u^{(l)}.

• Some integrate the representations of all layers with sum-pooling [121], i.e., h_u^* = \sum_{l=0}^{L} h_u^{(l)}.

• Both mean-pooling and sum-pooling treat the representations from different layers equally. Some works instead integrate the representations of all layers with layer-wise weighted pooling [31, 130], i.e., h_u^* = \frac{1}{L+1} \sum_{l=0}^{L} \alpha^{(l)} h_u^{(l)}.

Note that both mean-pooling and sum-pooling can be seen as special cases of layer-wise weighted pooling. Information propagation is realized by stacking multiple GNN layers, which helps to capture long-range dependencies and enhance the node representations with sufficient neighborhood information. However, stacking too many layers will also lead to the over-smoothing issue. As a result, some recent works [66, 95] have been proposed to adaptively balance this trade-off.
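The layer-combination choices above can be summarized in a few lines of code; the per-layer representations and the layer weights below are random placeholders.

```python
# Combining the per-layer user representations h^(0), ..., h^(L) into h*_u.
import torch

layers = [torch.randn(16) for _ in range(4)]             # h^(0) ... h^(3)
h_concat = torch.cat(layers)                             # concatenation
h_mean = torch.stack(layers).mean(0)                     # mean-pooling
h_sum = torch.stack(layers).sum(0)                       # sum-pooling
alpha = torch.softmax(torch.randn(4), dim=0)             # layer weights (illustrative)
h_weighted = (alpha.unsqueeze(1) * torch.stack(layers)).sum(0)   # weighted pooling
```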

3.2 Social network enhanced
With the emergence of online social networks, social recommender systems have been proposed to utilize each user's local neighbors' preferences to enhance user modeling [25, 40, 74, 75, 132]. All these works assume that users with social relationships should have similar representations, based on the social influence theory that connected people influence each other. Some of them directly use such relationships as a regularizer to constrain the final user representations [40, 75, 76, 103], while others leverage such relationships as input to enhance the original user embeddings [25, 74].

From the perspective of graph learning, the early works mentioned above simply model social influence by considering the first-order neighbors of each user, i.e., they overlook the recursive diffusion of influence in the social network. However, in practice, a user might be influenced by her friends' friends. Overlooking this high-order influence diffusion might lead to suboptimal recommendation performance [132]. Thanks to its ability to simulate how users are influenced by the recursive social diffusion process, GNN has become a popular choice for modeling social information in recommendation.

To incorporate relationships among users into interaction behaviors by leveraging GNN, there are two main issues to deal with:


[Figure 3 contrasts the two strategies: (a) separate GNN blocks run on the user-item bipartite graph and on the social network, producing the user representations h_u^I and h_u^S, which are then fused (together with the item representation h_i^*) to predict y_ui; (b) a single GNN block runs on the unified graph that combines user-item interactions and the social network to produce h_u^* and h_i^*.]

Fig. 3. Two strategies for social enhanced general recommendation.

• Influence of Friends. Do friends have equal influence? If not, how to distinguish the influence of different friends?

• Preference Integration. Users are involved in two types of relationships, i.e., social relationships with their friends and interactions with items. How to integrate the user representations from the social influence perspective and the interaction behavior?

3.2.1 Influence of friends. Generally, the social graph only contains information about whether users are friends or not, but the strength of social ties is usually unknown. To propagate the information of friends, it is essential to decide the influence of each friend. DiffNet [132] treats the influence of all friends equally by leveraging a mean-pooling operation. However, the assumption of equal influence is not in accordance with the actual situation, and the influence of a user should not be determined simply by the number of her friends. Indeed, users are more likely to be influenced by friends with strong social ties or similar preferences. Therefore, most works [17, 131, 135] differentiate the influence of neighbors by measuring the relationship between linked friends with an attention mechanism. Compared to the mean-pooling operation, the attention mechanism boosts the overall performance, which further validates the assumption that different friends have different influence power.

Moreover, a recent work named ESRF [147] argues that social relations are not always reliable. The unreliability of social information lies in two aspects: on the one hand, users with explicit social connections might have no influence power; on the other hand, the obtained social relationships might be incomplete. Considering that indiscriminately incorporating unreliable social relationships into recommendation may lead to poor performance, ESRF leverages an autoencoder mechanism to modify the observed social relationships by filtering irrelevant relationships and investigating new neighbors.

3.2.2 Preference Integration. Users in social enhanced recommender systems are involved in two types of networks: one is the user-item bipartite graph and the other is the social graph. In order to enhance the user preference representation by leveraging social information, there are two strategies for combining the information from these two networks:


• one is to learn user representations from the two networks separately [17, 132, 135] and then integrate them into the final preference vector, as illustrated in Figure 3a;

• the other is to combine the two networks into one unified network [131] and apply GNN to propagate information, as illustrated in Figure 3b.

The advantage of the first strategy lies in two aspects: on the one hand, the depth of the diffusion process can be set differently for the two networks since they are treated separately; on the other hand, any advanced method for the user-item bipartite graph can be directly applied, and, for the social network, which is a homogeneous graph, GNN techniques are especially suitable for simulating the influence process since they were originally proposed for homogeneous graphs. Here are two representative methods. Following the SVD++ [49] framework, DiffNet [132] combines the user representations from the two spaces with a sum-pooling operation, where the user preference in the item space is obtained by applying mean-pooling over the historical items, and the user representation in the social network is learned by leveraging GNN. Note that taking the average of the interacted item embeddings as the user preference in the item space is equivalent, from the GNN perspective, to aggregating one-hop neighbors for the user vector. DiffNet [132] simply combines the user representations from the two graphs with linear addition. To fully integrate these two latent factors, GraphRec [17] applies a multi-layer MLP over the concatenated vector to enhance the feature interactions by leveraging non-linear operations.

order social influence diffusion in the social network and interest diffusion in the user-item bipartitegraph can be simulated in a unified model, and these two kinds of information simultaneouslyreflect users’ preferences. At each layer, DiffNet++ [131] designs a multi-level attention network toupdate user nodes. Specifically, it firstly aggregates the information of neighbors in the bipartitegraph (i.e., interacted items) and social network (i.e., friends) by utilizing the GAT mechanismrespectively. Considering that different users may have different preferences in balancing thesetwo relationships, it further leverages another attention network to fuse the two hidden statesof neighbors. Up till now, there is no evidence to show which strategy always achieves betterperformance.

3.3 Knowledge graph enhanced
The social network, which reflects relationships between users, is utilized to enhance user representations, while the knowledge graph, which expresses relationships between items through their attributes, is leveraged to enhance item representations. Incorporating a knowledge graph into recommendation brings two benefits [111]: (1) the rich semantic relatedness among items in a knowledge graph can help explore their connections and improve the item representations; (2) the knowledge graph connects a user's historically interacted items and the recommended items, which enhances the interpretability of the results.

Despite the above benefits, utilizing a knowledge graph in recommendation is rather challenging due to its complex graph structure, i.e., multiple types of entities and multiple types of relations. Previous works preprocess the knowledge graph with knowledge graph embedding (KGE) methods to learn the embeddings of entities and relations, such as [14, 113, 150, 155]. The limitation of commonly used KGE methods is that they focus on modeling rigorous semantic relatedness with a translation constraint, which is more suitable for graph-related tasks such as link prediction than for recommendation [112]. Some studies design meta-paths to aggregate neighbor information [92, 100, 122, 148]. These rely heavily on manually defined meta-paths, which require domain knowledge and are rather labor-intensive for complicated knowledge graphs [114, 118].


Given the user-item interaction information as well as the knowledge graph, knowledge graph enhanced recommendation seeks to take full advantage of the rich information in the knowledge graph, which helps estimate users' preferences for items by explicitly capturing the relatedness between items. For effective knowledge graph enhanced recommendation, there are two main issues to deal with:

• Graph Construction. Considering the complexity of the knowledge graph and the necessity of fusing the information of the knowledge graph and the user-item graph, some studies construct a graph first. How can simplicity and informativeness be balanced? How can users be effectively incorporated into the knowledge graph?

• Relation-aware Aggregation. One characteristic of the knowledge graph is that it has multiple types of relations between entities. How should a relation-aware aggregation function be designed to aggregate information from linked entities?

3.3.1 Graph Construction. Some works directly apply GNN over the original knowledge graph [114], while some simplify the graph structure or construct a subgraph at the first stage based on the information of the knowledge graph and the user-item bipartite graph. The motivation comes from two aspects: one is that the knowledge graph contains multiple types of entities and relationships and is usually of large scale, which increases the challenge of applying GNN to graph representation learning; the other is that it is essential to incorporate the user role into the knowledge graph in order to fully utilize the knowledge graph for recommendation, and simply taking user nodes as another type of entity in the knowledge graph may introduce unrelated information and is not effective enough.

One representative method for graph simplification is IntentGC [158], which simplifies the graph structure by translating first-order proximity in the multi-entity knowledge graph into second-order proximity, i.e., keeping only the item-to-item relationships in which the two items are one auxiliary node apart. Specifically, if items $i_1$ and $i_2$ are both connected to an auxiliary node, the type of the auxiliary node is taken as the relationship between these two items. The advantage of this transformation is that it turns the multi-entity graph into a homogeneous graph from the perspective of node types. However, this strategy does not fit graphs where most item nodes are multi-hop (more than two-hop) neighbors, since it greatly simplifies the graph structure at the cost of losing the linking information between two item nodes.
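A minimal sketch of this kind of transformation is shown below, under the assumption that the auxiliary entities and their item neighbors are already available as a dictionary; `collapse_auxiliary_nodes` is a hypothetical helper, not IntentGC's released code.

```python
from collections import defaultdict
from itertools import combinations

def collapse_auxiliary_nodes(aux_edges):
    """Collapse every auxiliary entity that links two items into a typed
    item-to-item edge, so first-order item-auxiliary proximity becomes
    second-order item-item proximity.

    aux_edges maps an auxiliary node to (its type, the set of items linked to it).
    Returns a dict: relation type -> set of item-item edges."""
    item_graph = defaultdict(set)
    for aux_type, items in aux_edges.values():
        for i1, i2 in combinations(sorted(items), 2):
            item_graph[aux_type].add((i1, i2))   # relation named after the auxiliary type
    return item_graph

# Toy example: two items sharing a brand, three items sharing a category.
aux_edges = {
    "brand_acme": ("brand", {"item_1", "item_2"}),
    "cat_shoes": ("category", {"item_1", "item_2", "item_3"}),
}
print(collapse_auxiliary_nodes(aux_edges))
```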

In order to focus on the entities and relations relevant to the user-item interactions, some studies first automatically extract high-order subgraphs that link the target user and the target item through the user's historically interacted items and the related semantics in the knowledge graph [18, 91]. Based on the assumption that a shorter path between two nodes reflects a more reliable connection, AKGE [91] constructs the subgraph by the following steps: pretrain the embeddings of entities in the knowledge graph with TransR [63]; calculate the pairwise Euclidean distance between two linked entities; keep the $K$ paths with the shortest distance between the target user and item nodes. The potential limitation is that the subgraph structure depends on the pretrained entity embeddings and the definition of the distance measure. ATBRG [18] exhaustively searches the multi-layer entity neighbors of the target item and the items from the user's historical behaviors, and restores the paths connecting the user behaviors and the target item through multiple overlapping entities. In order to emphasize information-intensive entities, ATBRG further prunes the entities with a single link, which also helps control the scale of the graph. Although these methods can obtain subgraphs that are more relevant to the target user-item pair, it is quite time-consuming to either pretrain the entity embeddings or exhaustively search and prune paths. An effective and efficient subgraph construction strategy is worthy of further investigation.
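The following sketch illustrates the distance-based subgraph idea with networkx, assuming every edge already carries a precomputed 'dist' attribute (e.g., the Euclidean distance between pretrained TransR embeddings); it mimics the AKGE-style construction only at a high level.

```python
import networkx as nx
from itertools import islice

def distance_based_subgraph(G, user_node, item_node, K=3):
    """Keep the K simple paths with the smallest total 'dist' between the target
    user and item, and return the subgraph induced by the nodes on those paths."""
    paths = islice(
        nx.shortest_simple_paths(G, user_node, item_node, weight="dist"), K
    )
    kept_nodes = {n for path in paths for n in path}
    return G.subgraph(kept_nodes).copy()

# Toy graph: user -- item_1 -- attribute -- target_item, with precomputed distances.
G = nx.Graph()
G.add_edge("user", "item_1", dist=0.4)
G.add_edge("item_1", "director_x", dist=0.2)
G.add_edge("director_x", "target_item", dist=0.3)
G.add_edge("user", "target_item", dist=1.5)
sub = distance_based_subgraph(G, "user", "target_item", K=2)
print(sub.edges(data=True))
```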


3.3.2 Relation-aware Aggregation. To fully capture the semantic information in the knowledge graph, both the linked entities (i.e., $e_i$, $e_j$) and the relation between them (i.e., $r_{e_i,e_j}$) should be taken into consideration during the propagation process. Besides, from the perspective of recommender systems, the role of users might also have an influence. Owing to the advantage of GAT in adaptively assigning weights based on the connected nodes, most of the existing works apply variants of the traditional GAT over the knowledge graph, i.e., the central node is updated by the weighted average of the linked entities, and the weights are assigned according to a score function, denoted as $a(e_i, e_j, r_{e_i,e_j}, u)$. The key challenge is to design a reasonable and effective score function.

For the works [18, 99, 118] that regard user nodes as one type of entity, users' preferences are expected to spill over to the entities in the knowledge graph during the propagation process: the item nodes are updated with the information of interacted users and related attributes, and then the other entities come to contain users' preferences through iterative diffusion. Therefore, these works do not explicitly model users' interests in relations but differentiate the influence of entities by the connected nodes and their relations. For instance, inspired by the translation relationship in knowledge graphs, KGAT [118] assigns the weight according to the distance between the linked entities in the relation space,

$$a(e_i, e_j, r_{e_i,e_j}, u) = (\mathbf{W}_r \mathbf{e}_j)^{\top} \tanh\!\left(\mathbf{W}_r \mathbf{e}_i + \mathbf{e}_{r_{e_i,e_j}}\right), \qquad (7)$$

where $\mathbf{W}_r$ is the transformation matrix of the relation, which maps the entities into the relation space. In this way, closer entities pass more information to the central node. These methods are more appropriate for constructed subgraphs containing user nodes, since it is difficult for users' interests to extend to all the related entities by stacking a limited number of GNN layers.
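A minimal sketch of such a relation-aware attentive aggregation with a KGAT-style score (Eq. 7) is given below; the shared relation-space projection and all tensor shapes are simplifying assumptions rather than KGAT's exact implementation.

```python
import torch
import torch.nn.functional as F

def kgat_attention_scores(e_head, e_tails, e_rels, W_r):
    """Eq. (7)-style score: project head and tail entities into the relation space
    and measure their closeness; returns softmax-normalized neighbor weights."""
    head_in_rel = torch.matmul(W_r, e_head)                    # (d_r,)
    tails_in_rel = torch.matmul(e_tails, W_r.t())              # (n_neighbors, d_r)
    scores = (tails_in_rel * torch.tanh(head_in_rel + e_rels)).sum(dim=-1)
    return F.softmax(scores, dim=0)                            # attention over neighbors

d_e, d_r, n = 32, 16, 4
W_r = torch.randn(d_r, d_e)            # shared relation-space projection (simplification)
e_head = torch.randn(d_e)              # central entity being updated
e_tails = torch.randn(n, d_e)          # linked entities
e_rels = torch.randn(n, d_r)           # embeddings of the connecting relations
alpha = kgat_attention_scores(e_head, e_tails, e_rels, W_r)
h_new = torch.matmul(alpha, e_tails)   # weighted average of linked entities
```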

For the works [112, 114] that do not combine the two sources of graphs, users' interests in relations are explicitly characterized by assigning weights according to the connecting relation and the specific user. For example, the score function adopted by KGCN [114] is the dot product of the user embedding and the relation embedding, i.e.,

$$a(e_i, e_j, r_{e_i,e_j}, u) = \mathbf{u}^{\top} \mathbf{r}_{e_i,e_j}, \qquad (8)$$

In this way, the entities whose relations are more consistent with the user's interests spread more information to the central node.
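For comparison, the KGCN-style score in Eq. (8) reduces to a user-relation dot product; the snippet below is a minimal sketch of that scoring step, with illustrative shapes that are our assumptions.

```python
import torch
import torch.nn.functional as F

def kgcn_attention_scores(u, e_rels):
    """Eq. (8)-style score: one weight per neighbor, given by the dot product
    between the user embedding and the connecting relation's embedding."""
    return F.softmax(torch.matmul(e_rels, u), dim=0)

u = torch.randn(16)          # user embedding
e_rels = torch.randn(4, 16)  # embeddings of the relations to 4 neighboring entities
alpha = kgcn_attention_scores(u, e_rels)
```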

4 SEQUENTIAL RECOMMENDATION
Sequential recommendation predicts users' next preferences based on their most recent activities; it seeks to model sequential patterns among successive items and generate accurate recommendations for users [84]. Inspired by the advantages of GNN, it is becoming popular to utilize GNN to capture the transition patterns in users' sequential behaviors by transforming them into sequence graphs. Most of the existing works focus on inferring temporal preferences only from the sequence. Side information can be utilized to enhance the sequential information as well, although there is little work in this direction. For convenience, we briefly summarize the works to be discussed in Table 3.

4.1 Sequence information
From the perspective of adjacency between items, sequences of items can be modeled as graph-structured data. Based on the sequence graph, GNN can capture transitions of items through iterative propagation and learn representative item embeddings. Figure 4 illustrates the overall framework of GNN in sequential recommendation. To fully utilize GNN in sequential recommendation, there are three main issues to deal with:


Table 3. Graph neural network based sequential recommendation. The note "in & out adjacency matrices" means that the sequence graphs are regarded as directed graphs.

Information | Graph | Model | Venue | Year | GNN Framework
Sequence (Seq) | Seq | SR-GNN [136] | AAAI | 2019 | variant of GGNN (in & out adjacency matrices)
Sequence (Seq) | Seq | GC-SAN [141] | IJCAI | 2019 | variant of GGNN (in & out adjacency matrices)
Sequence (Seq) | Seq | NISER [26] | CIKM | 2019 | variant of GGNN (in & out adjacency matrices)
Sequence (Seq) | Seq | TAGNN [146] | SIGIR | 2020 | variant of GGNN (in & out adjacency matrices)
Sequence (Seq) | Seq | FGNN [82] | CIKM | 2019 | variant of GAT
Sequence (Seq) | Seq | A-PGNN [137] | TKDE | 2019 | variant of GGNN (in & out adjacency matrices)
Sequence (Seq) | Seq | MA-GNN [73] | AAAI | 2020 | GraphSage
Sequence (Seq) | Seq | MGNN-Spred [116] | WWW | 2020 | variant of GraphSage (sum updater)
Sequence (Seq) | Seq | HetGNN [116] | WWW | 2020 | variant of GraphSage (sum updater)
Sequence (Seq) | Seq | GAG [83] | SIGIR | 2020 | variant of GGNN (in & out adjacency matrices)
Sequence (Seq) | Seq | SGNN-HN [80] | CIKM | 2020 | variant of GGNN (in & out adjacency matrices)
Sequence (Seq) | Seq | LESSR [11] | KDD | 2020 | -
Sequence (Seq) | Seq | GCE-GNN [126] | SIGIR | 2020 | variant of GAT
Sequence (Seq) | Seq | ISSR [65] | AAAI | 2020 | GraphSage
Sequence (Seq) | Seq | DGTN [160] | arXiv | 2020 | mean-pooling
Seq & SN | SN | DGRec [94] | WSDM | 2019 | variant of GAT
Seq & KG | Seq | Wang and Cai [107] | MDPI | 2020 | variant of GGNN (in & out adjacency matrices)

Fig. 4. The overall framework of GNN in sequential recommendation: the user behavior sequence (i1, i2, i3, i2, i4) is transformed into a sequence graph, a GNN block produces the item representations h_{i_s,1}, ..., h_{i_s,5}, and a sequence model integrates them into h*_s for the next-item prediction task.

• Graph Construction. To apply GNN in sequential recommendation, the sequence data should be transformed into a sequence graph. Is it sufficient to construct a subgraph for each sequence independently? Would it be better to add edges among several consecutive items rather than only between two consecutive items?

• Information Propagation. To capture the transition patterns, which kind of propagation mechanism is more appropriate? Is it necessary to distinguish the sequential order of the linked items?

• Sequential Preference. To get the user's temporal preference, the representations in a sequence should be integrated. Should one simply apply attentive pooling or leverage an RNN structure to emphasize consecutive time patterns?

4.1.1 Graph construction. Unlike the user-item interactions, which essentially have a bipartite graph structure, the sequential behaviors in traditional sequential recommendation are naturally expressed in the order of time, i.e., as sequences, instead of sequence graphs.


Reconstructing a graph based on the original bipartite graph is optional and mainly driven by scalability or heterogeneity issues, whereas constructing a sequence graph based on users' sequential behaviors is a necessity for applying GNN in sequential recommendation.

Most works [26, 82, 83, 136, 141] construct a directed graph for each sequence by treating each item in the sequence as a node and adding edges between two consecutively clicked items. In most scenarios, the length of a user sequence is short, e.g., the average length on the preprocessed Yoochoose 1/4 dataset² is 5.71 [136]. A sequence graph constructed from a single, short sequence consists of a small number of nodes and connections, and some nodes might even have only one neighbor; it contains too little knowledge to reflect users' dynamic preferences and cannot take full advantage of GNN in graph learning. To tackle this challenge, recent works propose several strategies to enrich the original sequence graph structure.
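A minimal sketch of this common construction is given below; `build_sequence_graph` is a hypothetical helper that turns a session into nodes and weighted directed edges, using the example sequence shown in Fig. 4.

```python
from collections import defaultdict

def build_sequence_graph(sequence):
    """Each distinct item in a session becomes a node, and a directed edge is added
    between every pair of consecutively clicked items; repeated transitions are
    accumulated as edge weights."""
    nodes = list(dict.fromkeys(sequence))        # unique items, in order of first occurrence
    edge_weight = defaultdict(int)
    for src, dst in zip(sequence, sequence[1:]):
        edge_weight[(src, dst)] += 1
    return nodes, dict(edge_weight)

# Session i1 -> i2 -> i3 -> i2 -> i4.
nodes, edges = build_sequence_graph(["i1", "i2", "i3", "i2", "i4"])
print(nodes)   # ['i1', 'i2', 'i3', 'i4']
print(edges)   # {('i1','i2'): 1, ('i2','i3'): 1, ('i3','i2'): 1, ('i2','i4'): 1}
```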

The most straightforward way is to utilize additional sequences. HetGNN [116] takes other behavior sequences to enrich the target behavior. A-PGNN [137] deals with the setting where users are known, and thus incorporates the user's historical sequences with the current sequence to enrich the item-item connections. Based on the assumption that similar sequences might reflect similar transition patterns, DGTN [160] integrates the current sequence and its neighbor (similar) sessions into a single graph. GCE-GNN [126] exploits the item transitions in all sessions to assist the transition patterns in the current sequence, thereby leveraging both the local and the global context. All these methods introduce more information into the original sequence graph and improve the performance compared to a single sequence graph.

Another mainstream approach is to adjust the graph structure of the current sequence. For example, assuming the current node has a direct influence on more than one consecutive item, MA-GNN [73] extracts three subsequent items and adds edges between them. Considering that only adding edges between consecutive items might neglect the relationships between distant items, SGNN-HN [80] introduces a virtual "star" node as the center of the sequence, which is linked with all the items in the current sequence. The representation of the "star" node reflects the overall characteristics of the whole sequence. Hence, each item can gain some knowledge of the items it is not directly connected to through the "star" node. Chen and Wong [11] point out that existing graph construction methods ignore the sequential order of neighbors, which brings about an ineffective long-term dependency capturing problem. Therefore, they propose LESSR, which constructs two graphs from one sequence: one distinguishes the order of neighbors, and the other allows short-cut paths from an item to all the items after it.

4.1.2 Information propagation. Given the sequence graph, it is essential to design an efficient propagation mechanism to capture the transition patterns among items. Some studies [26, 80, 136, 141] adjust the GGNN framework for propagation on the directed graph. Specifically, they employ mean-pooling to aggregate the information of the previous items and the next items respectively, combine the two aggregated representations, and utilize a GRU component to integrate the information of the neighbors and the central node. The propagation functions are given as follows:

$$\mathbf{n}^{in,(l)}_{i_{s,t}} = \frac{1}{|\mathcal{N}^{in}_{i_{s,t}}|} \sum_{j \in \mathcal{N}^{in}_{i_{s,t}}} \mathbf{h}^{(l)}_{j}, \qquad \mathbf{n}^{out,(l)}_{i_{s,t}} = \frac{1}{|\mathcal{N}^{out}_{i_{s,t}}|} \sum_{j \in \mathcal{N}^{out}_{i_{s,t}}} \mathbf{h}^{(l)}_{j},$$
$$\mathbf{n}^{(l)}_{i_{s,t}} = \mathbf{n}^{in,(l)}_{i_{s,t}} \oplus \mathbf{n}^{out,(l)}_{i_{s,t}}, \qquad \mathbf{h}^{(l+1)}_{i_{s,t}} = \mathrm{GRU}\!\left(\mathbf{h}^{(l)}_{i_{s,t}}, \mathbf{n}^{(l)}_{i_{s,t}}\right), \qquad (9)$$

where $\mathcal{N}^{in}_{i_{s,t}}$ and $\mathcal{N}^{out}_{i_{s,t}}$ denote the neighborhood sets of previous items and next items, respectively, and $\mathrm{GRU}(\cdot)$ represents the GRU component. Different from the pooling operation, the gate mechanism in the GRU decides which information is preserved and which is discarded.
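The sketch below illustrates one such propagation step in PyTorch, following Eq. (9); the `DirectedSessionGNNLayer` module, the row-normalized adjacency matrices, and the toy session are illustrative assumptions rather than SR-GNN's released code.

```python
import torch
import torch.nn as nn

class DirectedSessionGNNLayer(nn.Module):
    """One GGNN-style propagation step on a directed session graph: mean-aggregate
    incoming and outgoing neighbors separately, concatenate the two aggregates,
    and update each node with a GRU cell."""

    def __init__(self, dim: int):
        super().__init__()
        self.gru = nn.GRUCell(input_size=2 * dim, hidden_size=dim)

    def forward(self, h, A_in, A_out):
        n_in = A_in @ h                          # mean over previous-item neighbors
        n_out = A_out @ h                        # mean over next-item neighbors
        n = torch.cat([n_in, n_out], dim=-1)     # the ⊕ combination in Eq. (9)
        return self.gru(n, h)                    # GRU(h^(l), n^(l)) -> h^(l+1)

# Session graph for i1 -> i2 -> i3 -> i2 -> i4 (4 nodes), row-normalized adjacency.
num_nodes, dim = 4, 32
h = torch.randn(num_nodes, dim)
A_out = torch.tensor([[0, 1, 0, 0],
                      [0, 0, .5, .5],
                      [0, 1, 0, 0],
                      [0, 0, 0, 0]], dtype=torch.float)
A_in = torch.tensor([[0, 0, 0, 0],
                     [.5, 0, .5, 0],
                     [0, 1, 0, 0],
                     [0, 1, 0, 0]], dtype=torch.float)
h_next = DirectedSessionGNNLayer(dim)(h, A_in, A_out)
```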

² The dataset is available at http://2015.recsyschallenge.com/challege.html. Note that this work preprocesses the dataset by filtering out sequences of length 1 and items appearing fewer than 5 times.


Unlike GGNN, which treats the neighbors equally, some works [82, 126] utilize the attention mechanism to differentiate the importance of neighbors. All the above methods adopt permutation-invariant aggregation functions during message passing and ignore the order of items within the neighborhood, which may lead to a loss of information [11]. To address this issue, LESSR [11] preserves the order of items in the graph construction and leverages a GRU component to aggregate the neighbors sequentially, as in the following equation:

$$\mathbf{n}^{(l)}_{i_{s,t},k} = \mathrm{GRU}^{(l)}\!\left(\mathbf{n}^{(l)}_{i_{s,t},k-1}, \mathbf{h}^{(l)}_{i_{s,t},k}\right), \qquad (10)$$

where $\mathbf{h}^{(l)}_{i_{s,t},k}$ represents the $k$-th item in the neighborhood of $i_{s,t}$ ordered by time, and $\mathbf{n}^{(l)}_{i_{s,t},k}$ denotes the neighborhood representation after aggregating $k$ items. For user-aware sequential recommendation, A-PGNN [137] and GAG [83] augment the representations of the items in the neighborhood with the user representation.

4.1.3 Sequential preference. Due to the limited number of propagation iterations, GNN cannot effectively capture long-range dependencies among items. Therefore, the representation of the last item (or of any single item) in the sequence is not sufficient to reflect the user's sequential preference. Besides, most graph construction methods that transform sequences into graphs lose part of the sequential information [11]. In order to obtain an effective sequence representation, existing works propose several strategies to integrate the item representations in the sequence.

Considering that the items in a sequence have different levels of priority, the attention mechanism is widely adopted for integration. Some works [80, 83, 136, 160] calculate the attentive weights between the last item and all the items in the sequence, aggregate the item representations as the global preference, and combine it with the local preference (i.e., the last item representation) as the overall preference. In this way, the overall preference relies heavily on the relevance of the last item to the user's preference. Inspired by the superiority of the multi-layer self-attention strategy in sequence modeling, GC-SAN [141] stacks multiple self-attention layers on top of the item representations generated by GNN to capture long-range dependencies.
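A minimal sketch of this widely used attentive readout is shown below; the scoring and merging layers of `AttentiveReadout` are illustrative assumptions, not any specific paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveReadout(nn.Module):
    """Attention between the last item and every item yields the global preference;
    it is merged with the last item (local preference) into the overall preference."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.score = nn.Linear(dim, 1, bias=False)
        self.merge = nn.Linear(2 * dim, dim, bias=False)

    def forward(self, item_reprs: torch.Tensor) -> torch.Tensor:
        last = item_reprs[-1]                                         # local preference
        logits = self.score(torch.sigmoid(self.q(last) + self.k(item_reprs)))
        alpha = F.softmax(logits, dim=0)                              # (seq_len, 1)
        global_pref = (alpha * item_reprs).sum(dim=0)                 # global preference
        return self.merge(torch.cat([global_pref, last], dim=-1))     # overall preference

seq_reprs = torch.randn(5, 32)   # GNN outputs for the 5 clicks in a session
s = AttentiveReadout(32)(seq_reprs)
```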

In addition to leveraging the attention mechanism for sequence integration, some works explicitly incorporate sequential information into the integration process. For instance, NISER [26] and GCE-GNN [126] add positional embeddings, which reflect the relative order of the items, to obtain position-aware item representations. To balance consecutive time patterns and flexible transition patterns, FGNN [82] employs a GRU with an attention mechanism to iteratively update the user preference with the item representations in the sequence.

4.2 Social network enhanced
Analogous to social network enhanced general recommendation, an intuitive idea is that users' dynamic preferences inferred from their sequential behaviors can be enhanced by their friends' preferences. However, to the best of our knowledge, little attention has been paid to utilizing social relationship information in sequential recommendation. A possible reason is that in sequential recommendation the representations of users are mainly learned from their sequential behaviors (i.e., sequences of items), especially when their IDs are unknown; the dynamic preferences inferred from sequences are much more important than static preferences based on user IDs.

We will briefly introduce one representative work, which adopts GNN to process social information in sequential recommendation. DGRec [94] first extracts users' dynamic interests from their most recent sequential behaviors with an LSTM. Besides, DGRec introduces a set of latent embeddings for users to reflect their static preferences. The user representations in the social network are initialized by the combination of static and dynamic preferences.


Considering that the social influence may vary with the relationships between friends, DGRec employs GAT to differentiate the influence of friends during the diffusion process. The results show that incorporating social information into sequential recommendation can further boost performance.

4.3 Knowledge graph enhanced
Similar to the discussion in Section 3.3, sequential recommendation can also benefit from the rich information contained in the knowledge graph, in that the representations of items can be enhanced by their semantic connections, especially when the sequence data is insufficient [37–39, 108]. Similar to the strategies proposed for knowledge graph enhanced general recommendation, the sequence information and the knowledge graph information can be either processed separately or unified into a whole graph.

As far as we know, there is only one work applying GNN to knowledge graph enhanced sequential recommendation. Wang and Cai [107] adopt the GNN framework proposed by SR-GNN [136] to capture the transition patterns, and incorporate knowledge graph information with a key-value memory network. How to take full advantage of the two types of information still deserves further study.

5 OTHER RECOMMENDATION TASKS
In addition to general and sequential recommendation, there are other recommendation sub-tasks, such as POI recommendation and group recommendation. Some emerging works have begun to utilize GNN to improve the recommendation performance in these sub-fields. This section briefly introduces the application of GNN in other recommendation tasks.

Click-through rate (CTR) prediction is an essential task for recommender systems in large-scale industrial applications; it predicts the click probability based on multi-type features. One of the key challenges of CTR prediction is to model feature interactions. Inspired by the information propagation process of GNN, a recent work, Fi-GNN [61], employs GNN to capture the high-order interactions among features. Specifically, it constructs a feature graph, where each node corresponds to a feature field and different fields are connected with each other through edges. Hence, the task of modeling feature interactions is converted into propagating node information across the graph.

Points-of-interest (POI) recommendation plays a key role in location-based services. Owing to the spatial and temporal characteristics, POI recommendation should model the geographical influence among POIs and the transition patterns in users' sequential behaviors. In the field of POI recommendation, there are several kinds of graph data, such as the user-POI bipartite graph, the sequence graph based on check-ins, and the geographical graph, i.e., the POIs within a certain distance are connected and the edge weights depend on the distance between POIs [6, 62]. Chang et al. [6] believe that the more often users consecutively visit two POIs, the greater the geographical influence between these two POIs. Hence, the check-ins not only reflect users' dynamic preferences but also indicate the geographical influence among POIs. Correspondingly, they design a model named GPR, which captures the user preferences and geographical influences from the user-POI graph and the sequence graph. In addition, to explicitly incorporate the information of the geographical distribution among POIs, the edge weights in the sequence graph depend on the distance between POIs.
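The geographical graph described above can be sketched as follows; the 2 km threshold, the haversine distance, and the 1/(1+d) edge weighting are illustrative assumptions rather than the exact choices of GPR or STP-UDGAT.

```python
import math

def build_geographical_graph(poi_coords, max_km=2.0):
    """Connect POIs within a distance threshold; edge weights decay with distance."""
    def haversine_km(a, b):
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(h))

    edges = {}
    pois = list(poi_coords)
    for i, p in enumerate(pois):
        for q in pois[i + 1:]:
            d = haversine_km(poi_coords[p], poi_coords[q])
            if d <= max_km:
                edges[(p, q)] = 1.0 / (1.0 + d)   # closer POIs get larger weights
    return edges

coords = {"cafe": (40.7128, -74.0060), "museum": (40.7138, -74.0020), "airport": (40.6413, -73.7781)}
print(build_geographical_graph(coords))
```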

Group recommendation aims to suggest items to a group of users instead of an individual one [34], based on their historical behaviors. There exist three types of relationships: user-item, where each user interacts with several items; user-group, where a group consists of several users; and group-item, where a group of users all choose the same item. The "group" can be regarded as a bridge connecting the users and the items in group recommendation, and it can be either treated as a part of the graph or not. Here are two representative works corresponding to these two strategies, respectively.


Table 4. Graph neural network based models in other recommendation tasks.

Task | Model | Venue | Year
CTR | Fi-GNN [61] | CIKM | 2019
POI | GPR [6] | CIKM | 2020
POI | STP-UDGAT [62] | CIKM | 2020
Cross Domain | PPGN [157] | CIKM | 2019
Cross Domain | BiTGCF [67] | CIKM | 2020
Group | GAME [34] | SIGIR | 2020
Group | GLS-GRL [117] | SIGIR | 2020
Bundle | BGCN [7] | SIGIR | 2020
Bundle | HFGN [59] | SIGIR | 2020

GAME [34] introduces a "group node" into the graph and applies GAT to assign appropriate weights to each interacted neighbor. With the propagation diffusion, the group representation can be iteratively updated with interacted items and users. However, this approach cannot be directly applied to tasks where groups change dynamically and new groups are constantly formed. Instead of introducing a "group" entity, GLS-GRL [117] constructs a corresponding graph for each group specifically, which contains the user nodes, the item nodes, and their interactions. The group representation is generated by integrating the representations of the users involved in the group.

Bundle recommendation aims to recommend a set of items as a whole to a user. In group recommendation, a "group" is made up of users; in bundle recommendation, a "group" means a set of items. Analogously, the key challenge is to obtain the bundle representation. There are three types of relationships: user-item, where each user interacts with several items; user-bundle, where users choose bundles; and bundle-item, where a bundle consists of several items. BGCN [7] unifies the three relationships into one graph and designs item-level and bundle-level propagation from the users' perspective. HFGN [59] considers the bundle as the bridge through which users interact with items. Correspondingly, it constructs a hierarchical structure upon the user-bundle interactions and the bundle-item mappings, and further captures the item-item interactions within a bundle.

6 FUTURE RESEARCH DIRECTIONS AND OPEN ISSUES
Whilst GNN has achieved great success in recommender systems, this section outlines several promising research directions.

6.1 GNN for Heterogeneous Graphs in Recommendation
Heterogeneity is one of the main characteristics of graph-structured data in recommender systems, i.e., many graphs in recommendation contain various types of nodes and links. For instance, the interaction graph consists of user nodes and item nodes, and the knowledge graph has multi-type entities and relations. Due to some unique characteristics of heterogeneous graphs (e.g., the fusion of more information and rich semantics), directly applying methods designed for homogeneous graphs may lead to suboptimal representations. Heterogeneous graph learning has long been a sub-field of graph learning [93], and recent efforts have been devoted to designing GNN methods for heterogeneous graphs. Some works [21, 120] employ meta-paths to guide the propagation process. However, the meta-paths in these works need to be designed manually, while automatically generating them would be preferable.


Considering neighbor aggregation, one feasible solution is a hierarchical aggregation strategy [149], i.e., first aggregating the grouped neighbors within each type, and then integrating the aggregated representations across different types. This strategy is more appropriate for graphs with a limited number of relations, and exploring a more flexible and powerful strategy for complex graphs is still an open question.

Some studies on GNN-based recommendation have also paid attention to the heterogeneity issue. For example, Multi-GCCF [98] and NIA-GCN [97] use different transformation matrices corresponding to the node types. MBGCN [42] deals with multi-behavior recommendation by hierarchical aggregation. Despite these emerging attempts, few works consider the heterogeneity issue for more complex graphs, e.g., knowledge graphs. In addition, the strategies that account for heterogeneity usually introduce additional computational challenges, and how to improve the computational efficiency requires further study.

6.2 Diverse and Uncertain Representation
In addition to heterogeneity in data types (e.g., node types like user and item, and edge types like different behavior types), users in the graph usually also have diverse and uncertain interests [12, 53]. Representing each user as a single vector (a point in the low-dimensional vector space), as in previous works, makes it hard to capture such characteristics of users' interests. Thus, how to represent users' multiple and uncertain interests is a direction worth exploring.

A natural choice is to extend such a single vector to multiple vectors with various methods [70, 71, 127], e.g., disentangled representation learning [77, 78] or capsule networks [56]. Some works on GNN-based recommendation have also begun to represent users with multiple vectors. For instance, DGCF [121] explicitly adds orthogonality constraints on the multi-aspect representations and iteratively updates the adjacency relationships between the linked nodes for each aspect. The research on multi-vector representations for recommendation, especially for GNN-based recommendation models, is still at a preliminary stage, and many issues need to be studied in the future, e.g., how to disentangle the embedding pertinent to users' intents, how to adaptively set a different number of interests for each user, and how to design an efficient and effective propagation scheme for multi-vector representations.

Another feasible solution is to represent each user as a density instead of a vector. Representing data as a density (usually a multi-dimensional Gaussian distribution) provides many advantages, e.g., better encoding the uncertainty of a representation and its relationships, and expressing asymmetries more naturally than the dot product, cosine similarity, or Euclidean distance. Specifically, Gaussian embedding has been widely used to model data uncertainty in various domains, e.g., word embedding [106], document embedding [24, 79], and network/graph embedding [5, 30, 124]. For recommendation, Dos Santos et al. [15] and Jiang et al. [41] also deploy Gaussian embeddings to capture users' uncertain preferences for improving user representations and recommendation performance. Density-based representation, e.g., Gaussian embedding, is an interesting direction that is worth exploring but has not been well studied in GNN-based recommendation models.

6.3 Scalability of GNN in Recommendation
In industrial recommendation scenarios, where the datasets include billions of nodes and edges and each node comes with millions of features, it is challenging to directly apply a traditional GNN due to the large memory usage and long training time. To deal with such large-scale graphs, there are two mainstream approaches: one is to reduce the size of the graph by sampling so that existing GNNs become applicable; the other is to design a scalable and efficient GNN by changing the model architecture. Sampling is a natural and widely adopted strategy for training on large graphs. For example, GraphSage [28] randomly samples a fixed number of neighbors, and PinSage [145] employs a random walk strategy for sampling. Besides, some works [18, 91] reconstruct a small-scale subgraph from the original graph for each user-item pair.


However, sampling inevitably loses part of the information, and few studies focus on how to design an effective sampling strategy that balances effectiveness and scalability.

Another mainstream approach to this problem is to decouple the nonlinearities from the collapsing of weight matrices between consecutive layers [20, 31, 129]. Since the neighbor-averaged features need to be precomputed only once, these models are more scalable and avoid the communication cost during model training. However, they are limited by their choice of aggregators and updaters compared to traditional GNNs, which have higher flexibility in learning [9]. Therefore, more future work is needed to handle large-scale graphs.
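A minimal sketch of this decoupled design, in the spirit of SGC/SIGN-style precomputation (our assumption, not any specific system's code), is shown below: neighbor-averaged features are computed once and cached, so training no longer needs online message passing.

```python
import numpy as np

def precompute_hop_features(A_hat, X, num_hops=2):
    """Compute A_hat^k X for k = 0..num_hops once offline and concatenate them,
    so that training reduces to fitting a simple model on cached features."""
    feats, cur = [X], X
    for _ in range(num_hops):
        cur = A_hat @ cur                       # one more hop of normalized neighbor averaging
        feats.append(cur)
    return np.concatenate(feats, axis=1)        # concatenate the 0..K hop views

# Toy row-normalized adjacency and node features.
A_hat = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.5, 0.5]])
X = np.random.randn(3, 8)
cached = precompute_hop_features(A_hat, X, num_hops=2)   # shape (3, 24), reusable every epoch
```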

6.4 Dynamic Graphs in Recommendation
In real-world recommender systems, not only the objects, such as users and items, but also the relationships between them change over time. To keep recommendations up to date, the system should be iteratively updated with newly arriving information. From the perspective of graphs, the constantly updated information brings about dynamic graphs instead of static ones. Static graphs are stable and thus feasible to model, while dynamic graphs introduce changing structures. An interesting prospective research problem is how to design GNN frameworks that can respond to the dynamic graphs encountered in practice. Existing studies in recommendation pay little attention to dynamic graphs. As far as we know, GraphSAIL [144] is the first attempt to address incremental learning of GNNs for recommender systems; it deals with the changing interactions, i.e., the edges between nodes. To balance updating and preservation, it constrains the embedding similarity between the central node and its neighborhood in successively learned models and keeps the incrementally learned embedding close to its previous version. Dynamic graphs in recommendation are a largely under-explored area that deserves further study.

6.5 Graph Adversarial Learning in Recommendation
Recent studies show that GNN can be easily fooled by small perturbations of the input [142], i.e., the performance of GNN will be greatly reduced if the graph structure contains noise. In real-world recommendation scenarios, it is a common phenomenon that the relationships between nodes are not always reliable. For instance, users may accidentally click on items, and part of the social relationships cannot be captured. In addition, attackers may also inject fake data into recommender systems. Due to the vulnerability of GNN to noisy data, there are emerging efforts on graph adversarial learning in the field of GNN [32, 142, 154, 163]. However, few existing works on GNN-based recommendation pay attention to adversarial learning, which should be an interesting and helpful direction toward more robust recommender systems.

6.6 Receptive Field of GNN in Recommendation
The receptive field of a node refers to the set of nodes including the node itself and its neighbors reachable within $K$ hops [85], where $K$ is the number of propagation iterations. By stacking multiple GNN layers, the receptive field of high-degree nodes grows too large and may introduce noise, which can lead to the over-smoothing problem [58] and a consequent drop in performance. Low-degree nodes, in contrast, need a deep GNN architecture to enlarge their receptive field and gather sufficient neighborhood information.

For the graph data in recommendation, the degree of nodes exhibits a long-tail distribution, i.e., active users have many interactions with items while cold users have few, and similarly for popular items and cold items. Therefore, applying the same propagation depth to all the nodes may be suboptimal. Only a few emerging works adaptively decide the propagation depth for each node in order to obtain a reasonable receptive field [46, 66, 69].


As a result, how to adaptively select a suitable receptive field for each user or item in GNN-based recommendation remains an issue worth researching.

7 CONCLUSION
Owing to the superiority of GNN in learning on graph data and its efficacy in capturing collaborative signals and sequential patterns, utilizing GNN techniques in recommender systems has gained increasing interest in academia and industry. In this survey, we provide a comprehensive review of the most recent works on GNN-based recommender systems. We propose a classification scheme for organizing existing works. For each category, we briefly clarify the main issues and detail the strategies adopted by the representative models. We also discuss the advantages and limitations of the existing strategies. Furthermore, we suggest several promising directions for future research. We hope this survey can provide readers with a general understanding of the recent progress in this field and shed some light on future developments.

REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. on Knowl. and Data Eng. 17, 6 (June 2005), 734–749. https://doi.org/10.1109/TKDE.2005.99
[2] Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, and Koray Kavukcuoglu. 2016. Interaction Networks for Learning about Objects, Relations and Physics. In Proceedings of the 30th International Conference on Neural Information Processing Systems (Barcelona, Spain) (NIPS'16). Curran Associates Inc., Red Hook, NY, USA, 4509–4517.
[3] Robert M. Bell and Yehuda Koren. 2007. Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining (ICDM '07). IEEE Computer Society, USA, 43–52. https://doi.org/10.1109/ICDM.2007.90
[4] James Bennett and Stan Lanning. 2007. The Netflix Prize. In Proceedings of the KDD Cup Workshop 2007. ACM, 3–6.
[5] Aleksandar Bojchevski and Stephan Günnemann. 2018. Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking. In International Conference on Learning Representations. https://openreview.net/forum?id=r1ZdKJ-0W
[6] Buru Chang, Gwanghoon Jang, Seoyoon Kim, and Jaewoo Kang. 2020. Learning Graph-Based Geographical Latent Representation for Point-of-Interest Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM '20). Association for Computing Machinery, New York, NY, USA, 135–144. https://doi.org/10.1145/3340531.3411905
[7] Jianxin Chang, Chen Gao, Xiangnan He, Depeng Jin, and Yong Li. 2020. Bundle Recommendation with Graph Convolutional Networks. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 1673–1676. https://doi.org/10.1145/3397271.3401198
[8] Bo Chen, Wei Guo, Ruiming Tang, Xin Xin, Yue Ding, Xiuqiang He, and Dong Wang. 2020. TGCN: Tag Graph Convolutional Network for Tag-Aware Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM '20). Association for Computing Machinery, New York, NY, USA, 155–164. https://doi.org/10.1145/3340531.3411927
[9] Lei Chen, Zhengdao Chen, and Joan Bruna. 2020. On Graph Neural Networks versus Graph-Augmented MLPs. arXiv preprint arXiv:2010.15116 (2020).
[10] Lei Chen, Le Wu, Richang Hong, Kun Zhang, and Meng Wang. 2020. Revisiting Graph Based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 27–34. https://aaai.org/ojs/index.php/AAAI/article/view/5330
[11] Tianwen Chen and Raymond Chi-Wing Wong. 2020. Handling Information Loss of Graph Neural Networks for Session-Based Recommendation. Association for Computing Machinery, New York, NY, USA, 1172–1180. https://doi.org/10.1145/3394486.3403170
[12] Wanyu Chen, Pengjie Ren, Fei Cai, Fei Sun, and Maarten de Rijke. 2020. Improving End-to-End Sequential Recommendations with Intent-Aware Diversification. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM '20). Association for Computing Machinery, New York, NY, USA, 175–184. https://doi.org/10.1145/3340531.3411897
[13] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (Boston, Massachusetts, USA) (RecSys '16). Association for Computing Machinery, New York, NY, USA, 191–198. https://doi.org/10.1145/2959100.2959190

[14] Amine Dadoun, Raphaël Troncy, Olivier Ratier, and Riccardo Petitti. 2019. Location Embeddings for Next Trip Recommendation. In Companion Proceedings of The 2019 World Wide Web Conference (San Francisco, USA) (WWW '19). Association for Computing Machinery, New York, NY, USA, 896–903. https://doi.org/10.1145/3308560.3316535
[15] Ludovic Dos Santos, Benjamin Piwowarski, and Patrick Gallinari. 2017. Gaussian Embeddings for Collaborative Filtering. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (Shinjuku, Tokyo, Japan) (SIGIR '17). Association for Computing Machinery, New York, NY, USA, 1065–1068. https://doi.org/10.1145/3077136.3080722
[16] Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2018. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW '18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1775–1784. https://doi.org/10.1145/3178876.3186183
[17] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph Neural Networks for Social Recommendation. In The World Wide Web Conference (San Francisco, CA, USA) (WWW '19). Association for Computing Machinery, New York, NY, USA, 417–426. https://doi.org/10.1145/3308558.3313488
[18] Yufei Feng, Binbin Hu, Fuyu Lv, Qingwen Liu, Zhiqiang Zhang, and Wenwu Ou. 2020. ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 2231–2240. https://doi.org/10.1145/3397271.3401428
[19] Alex Fout, Jonathon Byrd, Basir Shariat, and Asa Ben-Hur. 2017. Protein Interface Prediction Using Graph Convolutional Networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 6533–6542.
[20] Fabrizio Frasca, Emanuele Rossi, Davide Eynard, Benjamin Chamberlain, Michael Bronstein, and Federico Monti. 2020. SIGN: Scalable Inception Graph Neural Networks. In ICML 2020 Workshop on Graph Representation Learning and Beyond.
[21] Xinyu Fu, Jiani Zhang, Ziqiao Meng, and Irwin King. 2020. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In Proceedings of The Web Conference 2020 (Taipei, Taiwan) (WWW '20). Association for Computing Machinery, New York, NY, USA, 2331–2341. https://doi.org/10.1145/3366423.3380297
[22] Jibing Gong, Shen Wang, Jinlong Wang, Wenzheng Feng, Hao Peng, Jie Tang, and Philip S. Yu. 2020. Attentional Graph Convolutional Networks for Knowledge Concept Recommendation in MOOCs in a Heterogeneous View. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 79–88. https://doi.org/10.1145/3397271.3401057
[23] Marco Gori, Augusto Pucci, V Roma, and I Siena. 2007. ItemRank: A Random-Walk Based Scoring Algorithm for Recommender Engines. In IJCAI, Vol. 7. 2766–2771.
[24] Antoine Gourru, Julien Velcin, and Julien Jacques. 2020. Gaussian Embedding of Linked Documents from a Pretrained Semantic Space. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 3912–3918. https://doi.org/10.24963/ijcai.2020/541
[25] Guibing Guo, Jie Zhang, and Neil Yorke-Smith. 2015. TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (Austin, Texas) (AAAI'15). AAAI Press, 123–129.
[26] Priyanka Gupta, Diksha Garg, Pankaj Malhotra, Lovekesh Vig, and Gautam M. Shroff. 2019. NISER: Normalized Item and Session Representations with Graph Neural Networks. CoRR abs/1909.04276 (2019). arXiv:1909.04276 http://arxiv.org/abs/1909.04276
[27] Takuo Hamaguchi, Hidekazu Oiwa, Masashi Shimbo, and Yuji Matsumoto. 2017. Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (Melbourne, Australia) (IJCAI'17). AAAI Press, 1802–1808.
[28] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035.

[29] Ruining He and Julian J. McAuley. 2016. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. In IEEE 16th International Conference on Data Mining, ICDM 2016, December 12-15, 2016, Barcelona, Spain, Francesco Bonchi, Josep Domingo-Ferrer, Ricardo Baeza-Yates, Zhi-Hua Zhou, and Xindong Wu (Eds.). IEEE Computer Society, 191–200. https://doi.org/10.1109/ICDM.2016.0030
[30] Shizhu He, Kang Liu, Guoliang Ji, and Jun Zhao. 2015. Learning to Represent Knowledge Graphs with Gaussian Embedding. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (Melbourne, Australia) (CIKM '15). Association for Computing Machinery, New York, NY, USA, 623–632. https://doi.org/10.1145/2806416.2806502

[31] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, YongDong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 639–648. https://doi.org/10.1145/3397271.3401063
[32] Xiangnan He, Zhankui He, Xiaoyu Du, and Tat-Seng Chua. 2018. Adversarial Personalized Ranking for Recommendation. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (Ann Arbor, MI, USA) (SIGIR '18). Association for Computing Machinery, New York, NY, USA, 355–364. https://doi.org/10.1145/3209978.3209981
[33] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web (Perth, Australia) (WWW '17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 173–182. https://doi.org/10.1145/3038912.3052569
[34] Zhixiang He, Chi-Yin Chow, and Jia-Dong Zhang. 2020. GAME: Learning Graphical and Attentive Multi-View Embeddings for Occasional Group Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 649–658. https://doi.org/10.1145/3397271.3401064
[35] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. 1999. An Algorithmic Framework for Performing Collaborative Filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Berkeley, California, USA) (SIGIR '99). Association for Computing Machinery, New York, NY, USA, 230–237. https://doi.org/10.1145/312624.312682
[36] Balázs Hidasi and Alexandros Karatzoglou. 2018. Recurrent Neural Networks with Top-k Gains for Session-Based Recommendations. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM '18). Association for Computing Machinery, New York, NY, USA, 843–852. https://doi.org/10.1145/3269206.3271761
[37] Jin Huang, Zhaochun Ren, Wayne Xin Zhao, Gaole He, Ji-Rong Wen, and Daxiang Dong. 2019. Taxonomy-Aware Multi-Hop Reasoning Networks for Sequential Recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC, Australia) (WSDM '19). Association for Computing Machinery, New York, NY, USA, 573–581. https://doi.org/10.1145/3289600.3290972
[38] Jin Huang, Wayne Xin Zhao, Hongjian Dou, Ji-Rong Wen, and Edward Y. Chang. 2018. Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks. In The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (Ann Arbor, MI, USA) (SIGIR '18). Association for Computing Machinery, New York, NY, USA, 505–514. https://doi.org/10.1145/3209978.3210017
[39] Xiaowen Huang, Quan Fang, Shengsheng Qian, Jitao Sang, Yan Li, and Changsheng Xu. 2019. Explainable Interaction-Driven User Modeling over Knowledge Graph for Sequential Recommendation. In Proceedings of the 27th ACM International Conference on Multimedia (Nice, France) (MM '19). Association for Computing Machinery, New York, NY, USA, 548–556. https://doi.org/10.1145/3343031.3350893
[40] Mohsen Jamali and Martin Ester. 2010. A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks. In Proceedings of the Fourth ACM Conference on Recommender Systems (Barcelona, Spain) (RecSys '10). Association for Computing Machinery, New York, NY, USA, 135–142. https://doi.org/10.1145/1864708.1864736
[41] Junyang Jiang, Deqing Yang, Yanghua Xiao, and Chenlu Shen. 2019. Convolutional Gaussian Embeddings for Personalized Recommendation with Uncertainty. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. International Joint Conferences on Artificial Intelligence Organization, 2642–2648. https://doi.org/10.24963/ijcai.2019/367
[42] Bowen Jin, Chen Gao, Xiangnan He, Depeng Jin, and Yong Li. 2020. Multi-Behavior Recommendation with Graph Convolutional Networks. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 659–668. https://doi.org/10.1145/3397271.3401072
[43] Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: Factored Item Similarity Models for Top-N Recommender Systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Chicago, Illinois, USA) (KDD '13). Association for Computing Machinery, New York, NY, USA, 659–667. https://doi.org/10.1145/2487575.2487589


[44] Wang-Cheng Kang, Chen Fang, Zhaowen Wang, and Julian J. McAuley. 2017. Visually-Aware Fashion Recommendation and Design with Generative Image Models. In 2017 IEEE International Conference on Data Mining (ICDM). 207–216.
[45] Wang-Cheng Kang and Julian J. McAuley. 2018. Self-Attentive Sequential Recommendation. In IEEE International Conference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018. IEEE Computer Society, 197–206. https://doi.org/10.1109/ICDM.2018.00035
[46] Anees Kazi, Shayan Shekarforoush, S Arvind Krishna, Hendrik Burwinkel, Gerome Vivar, Karsten Kortüm, Seyed-Ahmad Ahmadi, Shadi Albarqouni, and Nassir Navab. 2019. InceptionGCN: Receptive Field Aware Graph Convolutional Network for Disease Prediction. In International Conference on Information Processing in Medical Imaging. Springer, 73–85.
[47] Donghyun Kim, Chanyoung Park, Jinoh Oh, Sungyoung Lee, and Hwanjo Yu. 2016. Convolutional Matrix Factorization for Document Context-Aware Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems (Boston, Massachusetts, USA) (RecSys '16). Association for Computing Machinery, New York, NY, USA, 233–240. https://doi.org/10.1145/2959100.2959165
[48] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations. 2873–2879.
[49] Yehuda Koren. 2008. Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Las Vegas, Nevada, USA) (KDD '08). Association for Computing Machinery, New York, NY, USA, 426–434. https://doi.org/10.1145/1401890.1401944
[50] Yehuda Koren and Robert Bell. 2011. Advances in Collaborative Filtering. Springer US, Boston, MA, 145–186. https://doi.org/10.1007/978-0-387-85820-3_5
[51] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (Aug. 2009), 30–37. https://doi.org/10.1109/MC.2009.263
[52] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (Aug. 2009), 30–37. https://doi.org/10.1109/MC.2009.263
[53] Matevž Kunaver and Tomaž Požrl. 2017. Diversity in Recommender Systems – A Survey. Knowledge-Based Systems 123 (2017), 154–162. https://doi.org/10.1016/j.knosys.2017.02.009
[54] Daniel D. Lee and H. Sebastian Seung. 2000. Algorithms for Non-Negative Matrix Factorization. In Proceedings of the 13th International Conference on Neural Information Processing Systems (Denver, CO) (NIPS'00). MIT Press, Cambridge, MA, USA, 535–541.
[55] Chong Li, Kunyang Jia, Dan Shen, CJ Shi, and Hongxia Yang. 2019. Hierarchical Representation Learning for Bipartite Graphs. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 2873–2879.
[56] Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Huan Zhao, Pipei Huang, Guoliang Kang, Qiwei Chen, Wei Li, and Dik Lun Lee. 2019. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM '19). Association for Computing Machinery, New York, NY, USA, 2615–2623. https://doi.org/10.1145/3357384.3357814
[57] Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-Based Recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (Singapore, Singapore) (CIKM '17). Association for Computing Machinery, New York, NY, USA, 1419–1428. https://doi.org/10.1145/3132847.3132926
[58] Qimai Li, Zhichao Han, and Xiao-Ming Wu. 2018. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. arXiv preprint arXiv:1801.07606 (2018).
[59] Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, and Tat-Seng Chua. 2020. Hierarchical Fashion Graph Network for Personalized Outfit Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 159–168. https://doi.org/10.1145/3397271.3401080
[60] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated Graph Sequence Neural Networks. arXiv preprint arXiv:1511.05493 (2015).
[61] Zekun Li, Zeyu Cui, Shu Wu, Xiaoyu Zhang, and Liang Wang. 2019. Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM '19). Association for Computing Machinery, New York, NY, USA, 539–548. https://doi.org/10.1145/3357384.3357951
[62] Nicholas Lim, Bryan Hooi, See-Kiong Ng, Xueou Wang, Yong Liang Goh, Renrong Weng, and Jagannadan Varadarajan. 2020. STP-UDGAT: Spatial-Temporal-Preference User Dimensional Graph Attention Network for Next POI Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM '20). Association for Computing Machinery, New York, NY, USA, 845–854. https://doi.org/10.1145/3340531.3411876


[63] Hailun Lin, Yong Liu, Weiping Wang, Yinliang Yue, and Zheng Lin. 2017. Learning entity and relation embeddingsfor knowledge resolution. Procedia Computer Science 108 (2017), 345–354.

[64] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.Com Recommendations: Item-to-Item CollaborativeFiltering. IEEE Internet Computing 7, 1 (Jan. 2003), 76–80. https://doi.org/10.1109/MIC.2003.1167344

[65] Feng Liu, Qing Liu, Wei Guo, Huifeng Guo, Weiwen Liu, Ruiming Tang, Xutao Li, Yunming Ye, and Xiuqiang He. 2020.Inter-sequence Enhanced Framework for Personalized Sequential Recommendation. arXiv preprint arXiv:2004.12118(2020).

[66] Meng Liu, Hongyang Gao, and Shuiwang Ji. 2020. Towards deeper graph neural networks. In Proceedings of the 26thACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 338–348.

[67] Meng Liu, Jianjun Li, Guohui Li, and Peng Pan. 2020. Cross Domain Recommendation via Bi-Directional TransferGraph Collaborative Filtering Networks. In Proceedings of the 29th ACM International Conference on Information &Knowledge Management (Virtual Event, Ireland) (CIKM ’20). Association for Computing Machinery, New York, NY,USA, 885–894. https://doi.org/10.1145/3340531.3412012

[68] Qiao Liu, Yifu Zeng, Refuoe Mokhosi, and Haibin Zhang. 2018. STAMP: Short-Term Attention/Memory Priority Model for Session-Based Recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (London, United Kingdom) (KDD ’18). Association for Computing Machinery, New York, NY, USA, 1831–1839. https://doi.org/10.1145/3219819.3219950

[69] Ziqi Liu, Chaochao Chen, Longfei Li, Jun Zhou, Xiaolong Li, Le Song, and Yuan Qi. 2019. Geniepath: Graph neural networks with adaptive receptive paths. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 4424–4431.

[70] Zhaoyang Liu, Haokun Chen, Fei Sun, Xu Xie, Jinyang Gao, Bolin Ding, and Yanyan Shen. 2020. Intent Preference Decoupling for User Representation on Online Recommender System. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 2575–2582. Main track.

[71] Zheng Liu, Jianxun Lian, Junhan Yang, Defu Lian, and Xing Xie. 2020. Octopus: Comprehensive and Elastic User Representation for the Generation of Recommendation Candidates. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 289–298. https://doi.org/10.1145/3397271.3401088

[72] Zhiwei Liu, Lin Meng, Jiawei Zhang, and Philip S Yu. 2020. Deoscillated Graph Collaborative Filtering. arXiv preprint arXiv:2011.02100 (2020).

[73] Chen Ma, Liheng Ma, Yingxue Zhang, Jianing Sun, Xue Liu, and Mark Coates. 2020. Memory Augmented Graph Neural Networks for Sequential Recommendation. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 5045–5052. https://aaai.org/ojs/index.php/AAAI/article/view/5945

[74] Hao Ma, Irwin King, and Michael R. Lyu. 2009. Learning to Recommend with Social Trust Ensemble. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (Boston, MA, USA) (SIGIR ’09). Association for Computing Machinery, New York, NY, USA, 203–210. https://doi.org/10.1145/1571941.1571978

[75] Hao Ma, Haixuan Yang, Michael R. Lyu, and Irwin King. 2008. SoRec: Social Recommendation Using Probabilistic Matrix Factorization. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (Napa Valley, California, USA) (CIKM ’08). Association for Computing Machinery, New York, NY, USA, 931–940. https://doi.org/10.1145/1458082.1458205

[76] Hao Ma, Dengyong Zhou, Chao Liu, Michael R. Lyu, and Irwin King. 2011. Recommender Systems with Social Regularization. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (Hong Kong, China) (WSDM ’11). Association for Computing Machinery, New York, NY, USA, 287–296. https://doi.org/10.1145/1935826.1935877

[77] Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, and Wenwu Zhu. 2019. Learning Disentangled Representations for Recommendation. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2019/file/a2186aa7c086b46ad4e8bf81e2a3a19b-Paper.pdf

[78] Jianxin Ma, Chang Zhou, Hongxia Yang, Peng Cui, Xin Wang, and Wenwu Zhu. 2020. Disentangled Self-Supervision in Sequential Recommenders. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Virtual Event, CA, USA) (KDD ’20). Association for Computing Machinery, New York, NY, USA, 483–491. https://doi.org/10.1145/3394486.3403091

[79] Giannis Nikolentzos, Polykarpos Meladianos, François Rousseau, Yannis Stavrakas, and Michalis Vazirgiannis. 2017. Multivariate Gaussian Document Representation from Word Embeddings for Text Categorization. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, Valencia, Spain, 450–455. https://www.aclweb.org/anthology/E17-2072

[80] Zhiqiang Pan, Fei Cai, Wanyu Chen, Honghui Chen, and Maarten de Rijke. 2020. Star Graph Neural Networks for Session-Based Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM ’20). Association for Computing Machinery, New York, NY, USA, 1195–1204. https://doi.org/10.1145/3340531.3412014

[81] Andreas Pfadler, Huan Zhao, Jizhe Wang, Lifeng Wang, Pipei Huang, and Dik Lun Lee. 2020. Billion-scale Recommendation with Heterogeneous Side Information at Taobao. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). 1667–1676. https://doi.org/10.1109/ICDE48307.2020.00148

[82] Ruihong Qiu, Jingjing Li, Zi Huang, and Hongzhi Yin. 2019. Rethinking the Item Order in Session-Based Recommendation with Graph Neural Networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM ’19). Association for Computing Machinery, New York, NY, USA, 579–588. https://doi.org/10.1145/3357384.3358010

[83] Ruihong Qiu, Hongzhi Yin, Zi Huang, and Tong Chen. 2020. GAG: Global Attributed Graph Neural Network for Streaming Session-Based Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 669–678. https://doi.org/10.1145/3397271.3401109

[84] Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-Aware Recommender Systems. ACM Comput. Surv. 51, 4, Article 66 (July 2018), 36 pages. https://doi.org/10.1145/3190616

[85] Pei Quan, Yong Shi, Minglong Lei, Jiaxu Leng, Tianlin Zhang, and Lingfeng Niu. 2019. A Brief Review of Receptive Fields in Graph Convolutional Networks. In IEEE/WIC/ACM International Conference on Web Intelligence - Companion Volume. 106–110.

[86] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing Personalized Markov Chains for Next-Basket Recommendation. In Proceedings of the 19th International Conference on World Wide Web (Raleigh, North Carolina, USA) (WWW ’10). Association for Computing Machinery, New York, NY, USA, 811–820. https://doi.org/10.1145/1772690.1772773

[87] Ruslan Salakhutdinov and Andriy Mnih. 2007. Probabilistic Matrix Factorization. In Proceedings of the 20th International Conference on Neural Information Processing Systems (Vancouver, British Columbia, Canada) (NIPS’07). Curran Associates Inc., Red Hook, NY, USA, 1257–1264.

[88] Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, and Peter Battaglia. 2018. Graph Networks as Learnable Physics Engines for Inference and Control. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80. PMLR, Stockholmsmässan, Stockholm Sweden, 4470–4479. http://proceedings.mlr.press/v80/sanchez-gonzalez18a.html

[89] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (Hong Kong, Hong Kong) (WWW ’01). Association for Computing Machinery, New York, NY, USA, 285–295. https://doi.org/10.1145/371920.372071

[90] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders Meet Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web (Florence, Italy) (WWW ’15 Companion). Association for Computing Machinery, New York, NY, USA, 111–112. https://doi.org/10.1145/2740908.2742726

[91] Xiao Sha, Zhu Sun, and Jie Zhang. 2019. Attentive Knowledge Graph Embedding for Personalized Recommendation. arXiv preprint arXiv:1910.08288 (2019).

[92] Chuan Shi, Binbin Hu, Wayne Xin Zhao, and S Yu Philip. 2018. Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering 31, 2 (2018), 357–370.

[93] Chuan Shi and S Yu Philip. 2017. Heterogeneous information network analysis and applications. Springer.

[94] Weiping Song, Zhiping Xiao, Yifan Wang, Laurent Charlin, Ming Zhang, and Jian Tang. 2019. Session-based social recommendation via dynamic graph attention networks. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 555–563.

[95] Indro Spinelli, Simone Scardapane, and Aurelio Uncini. 2020. Adaptive propagation graph convolutional network. IEEE Transactions on Neural Networks and Learning Systems (2020).

[96] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM ’19). Association for Computing Machinery, New York, NY, USA, 1441–1450. https://doi.org/10.1145/3357384.3357895

[97] Jianing Sun, Yingxue Zhang, Wei Guo, Huifeng Guo, Ruiming Tang, Xiuqiang He, Chen Ma, and Mark Coates. 2020. Neighbor Interaction Aware Graph Convolution Networks for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 1289–1298. https://doi.org/10.1145/3397271.3401123

[98] Jianing Sun, Yingxue Zhang, Chen Ma, Mark Coates, Huifeng Guo, Ruiming Tang, and Xiuqiang He. 2019. Multi-graph Convolution Collaborative Filtering. In 2019 IEEE International Conference on Data Mining (ICDM). 1306–1311. https://doi.org/10.1109/ICDM.2019.00165

[99] Rui Sun, Xuezhi Cao, Yan Zhao, Junchen Wan, Kun Zhou, Fuzheng Zhang, Zhongyuan Wang, and Kai Zheng. 2020. Multi-Modal Knowledge Graphs for Recommender Systems. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM ’20). Association for Computing Machinery, New York, NY, USA, 1405–1414. https://doi.org/10.1145/3340531.3411947

[100] Zhu Sun, Jie Yang, Jie Zhang, Alessandro Bozzon, Long-Kai Huang, and Chi Xu. 2018. Recurrent Knowledge Graph Embedding for Effective Recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems (Vancouver, British Columbia, Canada) (RecSys ’18). Association for Computing Machinery, New York, NY, USA, 297–305. https://doi.org/10.1145/3240323.3240361

[101] Qiaoyu Tan, Ninghao Liu, Xing Zhao, Hongxia Yang, Jingren Zhou, and Xia Hu. 2020. Learning to Hash with Graph Neural Networks for Recommender Systems. In Proceedings of The Web Conference 2020 (Taipei, Taiwan) (WWW ’20). Association for Computing Machinery, New York, NY, USA, 1988–1998. https://doi.org/10.1145/3366423.3380266

[102] Yong Kiam Tan, Xinxing Xu, and Yong Liu. 2016. Improved Recurrent Neural Networks for Session-Based Recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (Boston, MA, USA) (DLRS 2016). Association for Computing Machinery, New York, NY, USA, 17–22. https://doi.org/10.1145/2988450.2988452

[103] Jiliang Tang, Xia Hu, Huiji Gao, and Huan Liu. 2013. Exploiting Local and Global Social Context for Recommendation. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (Beijing, China) (IJCAI ’13). AAAI Press, 2712–2718.

[104] Rianne van den Berg, Thomas N Kipf, and Max Welling. 2018. Graph Convolutional Matrix Completion. (2018).

[105] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).

[106] Luke Vilnis and Andrew McCallum. 2015. Word Representations via Gaussian Embedding. In 3rd International Conference on Learning Representations, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6623

[107] Baocheng Wang and Wentao Cai. 2020. Knowledge-enhanced graph neural networks for sequential recommendation. Information 11, 8 (2020), 388.

[108] Chenyang Wang, Min Zhang, Weizhi Ma, Yiqun Liu, and Shaoping Ma. 2020. Make It a Chorus: Knowledge- and Time-Aware Item Modeling for Sequential Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 109–118. https://doi.org/10.1145/3397271.3401131

[109] Haoyu Wang, Defu Lian, and Yong Ge. 2019. Binarized collaborative filtering with distilling graph convolutional networks. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 4802–4808.

[110] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative Deep Learning for Recommender Systems. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Sydney, NSW, Australia) (KDD ’15). Association for Computing Machinery, New York, NY, USA, 1235–1244. https://doi.org/10.1145/2783258.2783273

[111] Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2018. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 417–426. https://doi.org/10.1145/3269206.3271739

[112] Hongwei Wang, Fuzheng Zhang, Mengdi Zhang, Jure Leskovec, Miao Zhao, Wenjie Li, and Zhongyuan Wang. 2019. Knowledge-Aware Graph Neural Networks with Label Smoothness Regularization for Recommender Systems. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD ’19). Association for Computing Machinery, New York, NY, USA, 968–977. https://doi.org/10.1145/3292500.3330836

[113] Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2019. Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 2000–2010. https://doi.org/10.1145/3308558.3313411

[114] Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, and Minyi Guo. 2019. Knowledge Graph Convolutional Networks for Recommender Systems. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 3307–3313. https://doi.org/10.1145/3308558.3313417

[115] Suhang Wang, Yilin Wang, Jiliang Tang, Kai Shu, Suhas Ranganath, and Huan Liu. 2017. What Your Images Reveal: Exploiting Visual Contents for Point-of-Interest Recommendation. In Proceedings of the 26th International Conference on World Wide Web (Perth, Australia) (WWW ’17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 391–400. https://doi.org/10.1145/3038912.3052638

[116] Wen Wang, Wei Zhang, Shukai Liu, Qi Liu, Bo Zhang, Leyu Lin, and Hongyuan Zha. 2020. Beyond Clicks: Modeling Multi-Relational Item Graph for Session-Based Target Behavior Prediction. In Proceedings of The Web Conference 2020 (Taipei, Taiwan) (WWW ’20). Association for Computing Machinery, New York, NY, USA, 3056–3062. https://doi.org/10.1145/3366423.3380077

[117] Wen Wang, Wei Zhang, Jun Rao, Zhijie Qiu, Bo Zhang, Leyu Lin, and Hongyuan Zha. 2020. Group-Aware Long- and Short-Term Graph Representation Learning for Sequential Group Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 1449–1458. https://doi.org/10.1145/3397271.3401136

[118] Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge Graph Attention Network for Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD ’19). Association for Computing Machinery, New York, NY, USA, 950–958. https://doi.org/10.1145/3292500.3330989

[119] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (Paris, France) (SIGIR’19). Association for Computing Machinery, New York, NY, USA, 165–174. https://doi.org/10.1145/3331184.3331267

[120] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. 2019. Heterogeneous Graph Attention Network. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 2022–2032. https://doi.org/10.1145/3308558.3313562

[121] Xiang Wang, Hongye Jin, An Zhang, Xiangnan He, Tong Xu, and Tat-Seng Chua. 2020. Disentangled Graph Collaborative Filtering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 1001–1010. https://doi.org/10.1145/3397271.3401137

[122] Xiang Wang, Dingxian Wang, Canran Xu, Xiangnan He, Yixin Cao, and Tat-Seng Chua. 2019. Explainable reasoning over knowledge graphs for recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5329–5336.

[123] Xiao Wang, Ruijia Wang, Chuan Shi, Guojie Song, and Qingyong Li. 2019. Multi-Component Graph Convolutional Collaborative Filtering. arXiv preprint arXiv:1911.10699 (2019).

[124] Yun Wang, Lun Du, Guojie Song, Xiaojun Ma, Lichen Jin, Wei Lin, and Fei Sun. 2019. Tag2Gauss: Learning Tag Representations via Gaussian Distribution in Tagged Networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. International Joint Conferences on Artificial Intelligence Organization, 3799–3805. https://doi.org/10.24963/ijcai.2019/527

[125] Yifan Wang, Suyao Tang, Yuntong Lei, Weiping Song, Sheng Wang, and Ming Zhang. 2020. DisenHAN: Disentangled Heterogeneous Graph Attention Network for Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM ’20). Association for Computing Machinery, New York, NY, USA, 1605–1614. https://doi.org/10.1145/3340531.3411996

[126] Ziyang Wang, Wei Wei, Gao Cong, Xiao-Li Li, Xian-Ling Mao, and Minghui Qiu. 2020. Global Context Enhanced Graph Neural Networks for Session-Based Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 169–178. https://doi.org/10.1145/3397271.3401142

[127] Jason Weston, Ron J. Weiss, and Hector Yee. 2013. Nonlinear Latent Factorization by Embedding Multiple User Interests. In Proceedings of the 7th ACM Conference on Recommender Systems (Hong Kong, China) (RecSys ’13). Association for Computing Machinery, New York, NY, USA, 65–68. https://doi.org/10.1145/2507157.2507209

[128] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, 38–45. https://doi.org/10.18653/v1/2020.emnlp-demos.6

[129] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. 2019. Simplifying graph convolutional networks. In International conference on machine learning. PMLR, 6861–6871.

[130] Jiancan Wu, Xiangnan He, Xiang Wang, Qifan Wang, Weijian Chen, Jianxun Lian, Xing Xie, and Yongdong Zhang. 2020. Graph Convolution Machine for Context-aware Recommender System. arXiv preprint arXiv:2001.11402 (2020).

[131] Le Wu, Junwei Li, Peijie Sun, Richang Hong, Yong Ge, and Meng Wang. 2020. DiffNet++: A Neural Influence and Interest Diffusion Network for Social Recommendation. CoRR abs/2002.00844 (2020). arXiv:2002.00844 https://arxiv.org/abs/2002.00844

[132] Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, and Meng Wang. 2019. A Neural Influence Diffusion Model for Social Recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (Paris, France) (SIGIR’19). Association for Computing Machinery, New York, NY, USA, 235–244. https://doi.org/10.1145/3331184.3331214

[133] Le Wu, Yonghui Yang, Lei Chen, Defu Lian, Richang Hong, and Meng Wang. 2020. Learning to Transfer Graph Embeddings for Inductive Graph Based Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 1211–1220. https://doi.org/10.1145/3397271.3401145

[134] Le Wu, Yonghui Yang, Kun Zhang, Richang Hong, Yanjie Fu, and Meng Wang. 2020. Joint Item Recommendation and Attribute Inference: An Adaptive Graph Convolutional Network Approach. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 679–688. https://doi.org/10.1145/3397271.3401144

[135] Qitian Wu, Hengrui Zhang, Xiaofeng Gao, Peng He, Paul Weng, Han Gao, and Guihai Chen. 2019. Dual Graph Attention Networks for Deep Latent Representation of Multifaceted Social Effects in Recommender Systems. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 2091–2102. https://doi.org/10.1145/3308558.3313442

[136] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 346–353.

[137] Shu Wu, Mengqi Zhang, Xin Jiang, Xu Ke, and Liang Wang. 2019. Personalizing Graph Neural Networks with Attention Mechanism for Session-based Recommendation. arXiv preprint arXiv:1910.08887 (2019).

[138] Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (San Francisco, California, USA) (WSDM ’16). Association for Computing Machinery, New York, NY, USA, 153–162. https://doi.org/10.1145/2835776.2835837

[139] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems (2020).

[140] Xu Xie, Fei Sun, Xiaoyong Yang, Zhao Yang, Jinyang Gao, Wenwu Ou, and Bin Cui. 2021. Explore User Neighborhood for Real-time E-commerce Recommendation. In Proceedings of the 37th IEEE International Conference on Data Engineering.

[141] Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Victor S Sheng, Jiajie Xu, Fuzhen Zhuang, Junhua Fang, and Xiaofang Zhou. 2019. Graph contextualized self-attention network for session-based recommendation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 3940–3946.

[142] Han Xu, Yaxin Li, Wei Jin, and Jiliang Tang. 2020. Adversarial Attacks and Defenses: Frontiers, Advances and Practice. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Virtual Event, CA, USA) (KDD ’20). Association for Computing Machinery, New York, NY, USA, 3541–3542. https://doi.org/10.1145/3394486.3406467

[143] Jixing Xu, Zhenlong Zhu, Jianxin Zhao, Xuanye Liu, Minghui Shan, and Jiecheng Guo. 2020. Gemini: A Novel and Universal Heterogeneous Graph Information Fusing Framework for Online Recommendations. Association for Computing Machinery, New York, NY, USA, 3356–3365. https://doi.org/10.1145/3394486.3403388

[144] Yishi Xu, Yingxue Zhang, Wei Guo, Huifeng Guo, Ruiming Tang, and Mark Coates. 2020. GraphSAIL: Graph Structure Aware Incremental Learning for Recommender Systems. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM ’20). Association for Computing Machinery, New York, NY, USA, 2861–2868. https://doi.org/10.1145/3340531.3412754

[145] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (London, United Kingdom) (KDD ’18). Association for Computing Machinery, New York, NY, USA, 974–983. https://doi.org/10.1145/3219819.3219890

[146] Feng Yu, Yanqiao Zhu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2020. TAGNN: Target Attentive Graph Neural Networks for Session-Based Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR ’20). Association for Computing Machinery, New York, NY, USA, 1921–1924. https://doi.org/10.1145/3397271.3401319

[147] Junliang Yu, Hongzhi Yin, Jundong Li, Min Gao, Zi Huang, and Lizhen Cui. 2020. Enhance Social Recommendation with Adversarial Graph Convolutional Networks. arXiv preprint arXiv:2004.02340 (2020).

[148] Xiao Yu, Xiang Ren, Yizhou Sun, Bradley Sturt, Urvashi Khandelwal, Quanquan Gu, Brandon Norick, and Jiawei Han. 2013. Recommendation in Heterogeneous Information Networks with Implicit User Feedback. In Proceedings of the 7th ACM Conference on Recommender Systems (Hong Kong, China) (RecSys ’13). Association for Computing Machinery, New York, NY, USA, 347–350. https://doi.org/10.1145/2507157.2507230

[149] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V. Chawla. 2019. Heterogeneous Graph Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD ’19). Association for Computing Machinery, New York, NY, USA, 793–803. https://doi.org/10.1145/3292500.3330961

[150] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative Knowledge Base Embedding for Recommender Systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, NY, USA, 353–362. https://doi.org/10.1145/2939672.2939673

[151] Jiani Zhang, Xingjian Shi, Shenglin Zhao, and Irwin King. 2019. STAR-GCN: stacked and reconstructed graph convolutional networks for recommender systems. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 4264–4270.

[152] Muhan Zhang and Yixin Chen. 2020. Inductive Matrix Completion Based on Graph Neural Networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=ByxxgCEYDS

[153] Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2019. Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Comput. Surv. 52, 1, Article 5 (Feb. 2019), 38 pages. https://doi.org/10.1145/3285029

[154] Shijie Zhang, Hongzhi Yin, Tong Chen, Quoc Viet Nguyen Hung, Zi Huang, and Lizhen Cui. 2020. GCN-Based User Representation Learning for Unifying Robust Recommendation and Fraudster Detection. arXiv preprint arXiv:2005.10150 (2020).

[155] Yongfeng Zhang, Qingyao Ai, Xu Chen, and Pengfei Wang. 2018. Learning over knowledge-base embeddings for recommendation. arXiv preprint arXiv:1803.06540 (2018).

[156] Yuan Zhang, Fei Sun, Xiaoyong Yang, Chen Xu, Wenwu Ou, and Yan Zhang. 2020. Graph-Based Regularization on Embedding Layers for Recommendation. ACM Trans. Inf. Syst. 39, 1, Article 2 (Sept. 2020), 27 pages. https://doi.org/10.1145/3414067

[157] Cheng Zhao, Chenliang Li, and Cong Fu. 2019. Cross-Domain Recommendation via Preference Propagation GraphNet. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM ’19). Association for Computing Machinery, New York, NY, USA, 2165–2168. https://doi.org/10.1145/3357384.3358166

[158] Jun Zhao, Zhou Zhou, Ziyu Guan, Wei Zhao, Wei Ning, Guang Qiu, and Xiaofei He. 2019. IntentGC: A Scalable Graph Convolution Framework Fusing Heterogeneous Information for Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD ’19). Association for Computing Machinery, New York, NY, USA, 2347–2357. https://doi.org/10.1145/3292500.3330686

[159] Lei Zheng, Chun-Ta Lu, Fei Jiang, Jiawei Zhang, and Philip S. Yu. 2018. Spectral Collaborative Filtering. In Proceedings of the 12th ACM Conference on Recommender Systems (Vancouver, British Columbia, Canada) (RecSys ’18). Association for Computing Machinery, New York, NY, USA, 311–319. https://doi.org/10.1145/3240323.3240343

[160] Yujia Zheng, Siyi Liu, Zekun Li, and Shu Wu. 2020. DGTN: Dual-channel Graph Transition Network for Session-based Recommendation. arXiv preprint arXiv:2009.10002 (2020).

[161] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. 2018. Graph neural networks: A review of methods and applications. arXiv preprint arXiv:1812.08434 (2018).

[162] Qi Zhou, Yizhi Ren, Tianyu Xia, Lifeng Yuan, and Linqiang Chen. 2019. Data Poisoning Attacks on Graph Convolutional Matrix Completion. In International Conference on Algorithms and Architectures for Parallel Processing. Springer, 427–439.

[163] Dingyuan Zhu, Ziwei Zhang, Peng Cui, and Wenwu Zhu. 2019. Robust Graph Convolutional Networks Against Adversarial Attacks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD ’19). Association for Computing Machinery, New York, NY, USA, 1399–1407. https://doi.org/10.1145/3292500.3330851
