CSE 6240: Web Search and Text Mining. Spring 2020
Graph Neural Networks
Prof. Srijan Kumar, http://cc.gatech.edu/~srijan
Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks
• GraphSAGE
Goal: Node Embeddings
Goal: $\text{similarity}(u, v) \approx \mathbf{z}_v^\top \mathbf{z}_u$ (the similarity function needs to be defined!)

[Figure: map nodes from the input network into a d-dimensional embedding space.]
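To make the goal concrete, here is a tiny NumPy sketch (the embedding values are made up for illustration): the dot product of two embedding vectors acts as the decoder that should approximate node similarity in the original graph.

```python
import numpy as np

# Hypothetical learned embeddings for nodes u and v in d = 4 dimensions.
z_u = np.array([0.1, 0.9, 0.3, 0.5])
z_v = np.array([0.2, 0.8, 0.4, 0.4])

# The decoder: similarity(u, v) is approximated by the dot product z_v^T z_u.
approx_similarity = z_v @ z_u
print(approx_similarity)  # should be high if u and v are "similar" in the graph
```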
Deep Graph Encoders
• Encoder: map a node to a low-dimensional vector: $\text{enc}(v) = z_v$
• Deep encoder methods are based on graph neural networks: $\text{enc}(v)$ = multiple layers of non-linear transformations of the graph structure
• The graph encoder idea is inspired by CNNs on images
[Figure: a single CNN layer with a 3x3 filter on an image grid, and the analogous operation on a graph. Animation by Vincent Dumoulin; slide credit: Thomas Kipf, "End-to-end learning on graphs with GCNs".]
Idea from Convolutional Networks
• In a CNN, a pixel's representation is created by transforming the representations of its neighboring pixels
– In a GNN, a node's representation is created by transforming the representations of its neighboring nodes
• But graphs are irregular, unlike images
– So we generalize convolutions beyond simple lattices, and leverage node features/attributes
• Solution: deep graph encoders
Deep Graph Encoders
• Once an encoder is defined, multiple layers of encoders can be stacked
• Output: node embeddings; the same machinery can also embed larger network structures, subgraphs, and whole graphs
Graph Encoder: A Naïve Approach
• Join the adjacency matrix and the node features, and feed them into a deep neural network
• Issues with this idea:
– 𝑂(|𝑉|) parameters
– Not applicable to graphs of different sizes
– Not invariant to node ordering
[Figure: a 5-node graph (A-E); its adjacency matrix is concatenated with the node features and fed into a fully connected neural network. Slide credit: Thomas Kipf, "End-to-end learning on graphs with GCNs".]

The concatenated input [A, X] (adjacency columns A-E, then two feature columns):

A: 0 1 1 1 0 | 1 0
B: 1 0 0 1 1 | 0 0
C: 1 0 0 1 0 | 0 1
D: 1 1 1 0 1 | 1 1
E: 0 1 0 1 0 | 1 0

Steps on the slide: take the adjacency matrix and the feature matrix, concatenate them, feed them into a deep (fully connected) neural net. Done? Problems: a huge number of parameters, and no inductive learning is possible.
Graph Encoders: Two Instantiations
1. Graph convolution networks (GCN): one of the first frameworks to learn node embeddings in an end-to-end manner
– Different from random walk methods, which are not end-to-end
2. GraphSAGE: generalizes GCNs to various neighborhood aggregations
Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks (GCN)
• GraphSAGE
Main paper: “Semi-Supervised Classification with Graph Convolutional Networks”, Kipf and Welling, ICLR 2017
Content
• Local network neighborhoods:
– Describe aggregation strategies
– Define computation graphs
• Stacking multiple layers:
– Describe the model, parameters, and training
– How to fit the model?
– A simple example of unsupervised and supervised training
Setup
• Assume we have a graph 𝐺:
– 𝑉 is the vertex set
– 𝑨 is the adjacency matrix (assume binary)
– $X \in \mathbb{R}^{m \times |V|}$ is a matrix of node features
• Node features, e.g.:
– Social networks: user profile, user image
– Biological networks: gene expression profiles
– If there are no features, use:
» Indicator vectors (one-hot encoding of a node)
» Vector of constant 1: [1, 1, …, 1]
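As a small illustration (the toy graph below is an assumption, not from the slides), the setup objects might look like this in NumPy, including both fallbacks for the no-feature case:

```python
import numpy as np

# Toy graph with |V| = 4 nodes and a binary adjacency matrix A.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

num_nodes = A.shape[0]

# If no node features exist, fall back to one-hot indicator vectors
# (each column of X identifies one node), or one constant-1 feature per node.
X_onehot = np.eye(num_nodes)        # m = |V| one-hot features
X_const = np.ones((1, num_nodes))   # m = 1 constant feature
```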
Graph Convolutional Networks
• Idea: generate node embeddings based on local network neighborhoods
– A node's neighborhood defines its computation graph
• Learn how to aggregate information from the neighborhood to produce node embeddings
– Transform information from the neighbors and combine it:
• Transform "messages" $h_i$ from the neighbors ($W_i h_i$) and add them up ($\sum_i W_i h_i$)
Idea: Aggregate Neighbors
• Intuition: Generate node embeddings based on local network neighborhoods
• Nodes aggregate information from their neighbors using neural networks
[Figure: the input graph and the computation graph for target node A; each of A's neighbors (B, C, D) in turn aggregates information from its own neighbors via small neural networks.]
Idea: Aggregate Neighbors
• Intuition: The network neighborhood defines a computation graph
• Every node defines a computation graph based on its neighborhood
Deep Model: Many Layers
• The model can be of arbitrary depth:
– Nodes have embeddings at each layer
– The layer-0 embedding of node 𝒖 is its input feature vector 𝒙𝒖
– The layer-K embedding gets information from nodes that are at most K hops away
[Figure: two-layer computation graph for target node A. Layer-0 holds the input features $x_A, x_B, x_C, x_E, x_F$; layer-1 aggregates them; layer-2 outputs the embedding of A.]
Neighborhood Aggregation
• The key distinctions among approaches lie in how they aggregate information across the layers
[Figure: computation graph for target node A with each aggregation operator drawn as an empty box. What is in the box?]
Neighborhood Aggregation
• Basic approach: Average information from the neighbors and apply a neural network
[Figure: computation graph for target node A where each box (1) averages the messages from the neighbors and (2) applies a neural network.]
The Math: Deep Encoder
• Basic approach: Average neighbor messages and apply a neural network
– Note: Apply L2 normalization to each node embedding at every layer
$$h_v^0 = x_v \qquad \text{(initial layer-0 embeddings equal the node features)}$$

$$h_v^k = \sigma\Big(W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k\, h_v^{k-1}\Big), \quad \forall k \in \{1, \ldots, K\}$$

$$z_v = h_v^K \qquad \text{(embedding after } K \text{ layers of neighborhood aggregation)}$$

Here $\sigma$ is a non-linearity (e.g., ReLU), the sum term is the average of the neighbors' previous-layer embeddings, and $h_v^{k-1}$ is the previous-layer embedding of $v$ itself.
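A minimal NumPy sketch of this update rule, using a made-up toy graph and random weights (and a row-per-node convention, so the weight matrices multiply on the right):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def gcn_embeddings(A, X, Ws, Bs):
    """Compute z_v = h_v^K for all nodes.
    A: (n, n) binary adjacency; X: (n, m) node features (one row per node);
    Ws, Bs: lists of K weight matrices W_k and B_k."""
    H = X.astype(float)                      # h_v^0 = x_v
    for W, B in zip(Ws, Bs):                 # k = 1, ..., K
        deg = A.sum(axis=1, keepdims=True)   # |N(v)| for each node
        neigh_avg = (A @ H) / deg            # sum_{u in N(v)} h_u^{k-1} / |N(v)|
        H = relu(neigh_avg @ W + H @ B)      # sigma(W_k * avg + B_k * h_v^{k-1})
    return H                                 # z_v = h_v^K

# Toy example: 4 nodes, 3 input features, two layers (hidden size 5, output 2).
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
X = rng.normal(size=(4, 3))
Ws = [rng.normal(size=(3, 5)), rng.normal(size=(5, 2))]
Bs = [rng.normal(size=(3, 5)), rng.normal(size=(5, 2))]
Z = gcn_embeddings(A, X, Ws, Bs)   # (4, 2) node embeddings
```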
GCN: Matrix Form
• $H^{(l)}$ is the matrix of node representations in the $l$-th layer
• $W_0^{(l)}$ and $W_1^{(l)}$ are the matrices to be learned for each layer
• $A$ = adjacency matrix, $D$ = diagonal degree matrix
• GCN rewritten in matrix form:

$$H^{(l+1)} = \sigma\big(H^{(l)} W_0^{(l)} + \tilde{A} H^{(l)} W_1^{(l)}\big), \qquad \tilde{A} = D^{-1/2} A D^{-1/2}$$
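A sketch of one layer in this matrix form (shapes and data are assumptions; this mirrors the equation above, not a particular library's API):

```python
import numpy as np

def gcn_layer_matrix(A, H, W0, W1):
    """One GCN layer in matrix form:
    H_next = sigma(H @ W0 + A_norm @ H @ W1),
    where A_norm = D^{-1/2} A D^{-1/2} is the degree-normalized adjacency.
    A: (n, n) adjacency; H: (n, d_in); W0, W1: (d_in, d_out)."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt
    return np.maximum(0, H @ W0 + A_norm @ H @ W1)   # ReLU non-linearity
```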
Training the Model
• How do we train the model?
– We need to define a loss function on the embeddings
Model Parameters
• We can feed these embeddings into any loss function and run stochastic gradient descent to train the weight parameters
– Once we have the weight matrices, we can calculate the node embeddings
Trainable weight matrices (i.e., what we learn): $W_k$ and $B_k$

$$h_v^0 = x_v$$

$$h_v^k = \sigma\Big(W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k\, h_v^{k-1}\Big), \quad \forall k \in \{1, \ldots, K\}$$

$$z_v = h_v^K$$
Unsupervised Training
• Training can be unsupervised or supervised
• Unsupervised training:
– Uses only the graph structure: "similar" nodes should have similar embeddings
– A common unsupervised loss is edge existence: embeddings of connected nodes should score higher than those of unconnected nodes (see the sketch below)
• The unsupervised loss can be anything from the previous section, e.g., a loss based on:
– Node proximity in the graph
– Random walks
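A hedged sketch of one possible edge-existence loss (binary cross-entropy on embedding dot products, with sampled non-edges as negatives; the exact formulation varies across papers):

```python
import numpy as np

def edge_existence_loss(Z, pos_edges, neg_edges):
    """Binary cross-entropy on edge existence.
    Z: (n, d) node embeddings; pos_edges/neg_edges: lists of (u, v) pairs."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    loss = 0.0
    for u, v in pos_edges:                   # connected nodes: push score up
        loss -= np.log(sigmoid(Z[u] @ Z[v]))
    for u, v in neg_edges:                   # sampled non-edges: push score down
        loss -= np.log(1.0 - sigmoid(Z[u] @ Z[v]))
    return loss / (len(pos_edges) + len(neg_edges))
```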
Supervised Training
• Train the model directly for a supervised task (e.g., node classification: is a node normal or anomalous?)
• Two ways to define the loss:
– Total loss = supervised loss
– Total loss = supervised loss + unsupervised loss
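A sketch of the supervised piece, assuming embeddings Z produced by the encoder and a hypothetical linear classifier W_clf on top (softmax cross-entropy over node classes):

```python
import numpy as np

def supervised_node_loss(Z, labels, W_clf):
    """Softmax cross-entropy over node classes.
    Z: (n, d) embeddings; labels: (n,) integer class ids; W_clf: (d, c)."""
    logits = Z @ W_clf
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()
```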
Model Design: Overview
(1) Define a neighborhood aggregation function
(2) Define a loss function on the embeddings
Model Design: Overview
(3) Train on a set of nodes, i.e., a batch of compute graphs
Model Design: Overview
(4) Generate embeddings for nodes as needed
Even for nodes we never trained on!
GCN: Inductive Capability
• The same aggregation parameters are shared across all nodes:
– The number of model parameters is sublinear in |𝑉|, so we can generalize to unseen nodes
[Figure: compute graphs for nodes A and B in the same input graph; the parameters $W_k$ and $B_k$ are shared between them.]
Inductive Capability: New Nodes
[Figure: train with a snapshot of the graph; when a new node arrives, generate its embedding $z_u$ on the fly.]
• Many application settings constantly encounter previously unseen nodes, e.g., Reddit, YouTube, Google Scholar
• We need to generate new embeddings "on the fly"
Inductive Capability: New Graphs
• Inductive node embeddings generalize to entirely unseen graphs
• E.g., train on a protein interaction graph from model organism A and generate embeddings on newly collected data about organism B

[Figure: train on one graph, then generalize to a new graph and produce embeddings $z_u$ for its nodes.]
Summary So Far
• Recap: Generate node embeddings by aggregating neighborhood information
– We saw a basic variant of this idea
– Key distinctions lie in how different approaches aggregate information across the layers
• Next: the GraphSAGE graph neural network architecture
Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks
• GraphSAGE
• Main paper: Inductive Representation Learning on Large Graphs. William L. Hamilton, Rex Ying, Jure Leskovec. NeurIPS 2017.
GraphSAGE Idea
• In GCN, we aggregated the neighbors' messages as the (weighted) average of all neighbors. How can we generalize this?
[Figure: computation graph for target node A with each aggregation operator drawn as an empty box.]

[Hamilton et al., NeurIPS 2017]
GraphSAGE Idea
[Figure: computation graph for target node A; each box applies a generalized aggregation.]

$$h_v^k = \sigma\Big(\big[\,W_k \cdot \text{AGG}\big(\{h_u^{k-1}, \forall u \in N(v)\}\big),\; B_k\, h_v^{k-1}\,\big]\Big)$$

Here AGG can be any differentiable function that maps the set of vectors $\{h_u^{k-1}, \forall u \in N(v)\}$ to a single vector.
Neighborhood Aggregation
• Simple neighborhood aggregation:

$$h_v^k = \sigma\Big(W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k\, h_v^{k-1}\Big)$$

• GraphSAGE: concatenate the generalized neighbor aggregation with the self embedding:

$$h_v^k = \sigma\Big(\big[\,W_k \cdot \text{AGG}\big(\{h_u^{k-1}, \forall u \in N(v)\}\big),\; B_k\, h_v^{k-1}\,\big]\Big)$$
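A minimal sketch of one GraphSAGE layer with mean aggregation (toy shapes, illustrative only); note that the neighbor part and the self part are concatenated rather than summed, so the output dimension doubles:

```python
import numpy as np

def graphsage_layer(A, H, W, B):
    """One GraphSAGE layer with mean aggregation:
    h_v^k = ReLU([W-transformed mean of neighbors, B-transformed self]).
    A: (n, n) adjacency; H: (n, d_in); W, B: (d_in, d_out) each;
    output: (n, 2 * d_out) due to the concatenation."""
    deg = A.sum(axis=1, keepdims=True)
    neigh_mean = (A @ H) / deg                 # AGG = mean of neighbor vectors
    neigh_part = neigh_mean @ W
    self_part = H @ B
    return np.maximum(0, np.concatenate([neigh_part, self_part], axis=1))
```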
Neighbor Aggregation: Variants
• Mean: Take a weighted average of the neighbors:

$$\text{agg} = \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|}$$

• Pool: Transform the neighbor vectors and apply a symmetric vector function $\gamma$, e.g., element-wise mean or max:

$$\text{agg} = \gamma\big(\{Q\, h_u^{k-1}, \forall u \in N(v)\}\big)$$

• LSTM: Apply an LSTM to a random permutation $\pi$ of the neighbors:

$$\text{agg} = \text{LSTM}\big([h_u^{k-1}, \forall u \in \pi(N(v))]\big)$$
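Hedged sketches of the mean and pool aggregators (the LSTM variant needs a recurrent network and is omitted); here Q and the element-wise max are stand-ins for the learned transform and the symmetric function:

```python
import numpy as np

def agg_mean(neighbor_H):
    """Mean aggregator: average of the neighbors' previous-layer embeddings."""
    return neighbor_H.mean(axis=0)

def agg_pool(neighbor_H, Q):
    """Pool aggregator: transform each neighbor vector by Q (with a ReLU),
    then apply a symmetric function (element-wise max over neighbors)."""
    return np.maximum(0, neighbor_H @ Q).max(axis=0)

# Usage: neighbor_H stacks h_u^{k-1} for all u in N(v), one row per neighbor.
rng = np.random.default_rng(1)
neighbor_H = rng.normal(size=(3, 4))   # 3 neighbors, d = 4
Q = rng.normal(size=(4, 4))
print(agg_mean(neighbor_H), agg_pool(neighbor_H, Q))
```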
Experiments: Datasets
• Dynamic datasets:
– Citation network: predict the paper category
• Data from 2000-2005
• 302,424 nodes
• Train: data through 2004; test: 2005 data
– Reddit post network: predict the subreddit of a post
• Nodes = posts
• Edges connect posts on which the same user has commented
• 232,965 posts
• Train: 20 days of data; test: the next 10 days
Experiments: Results

[Results table from Hamilton et al., NeurIPS 2017: GraphSAGE aggregator variants compared against baseline embedding methods on the citation and Reddit datasets.]
Summary: GCN and GraphSAGE
• Key idea: Generate node embeddings based on local neighborhoods
– Nodes aggregate "messages" from their neighbors using neural networks
• Graph convolutional networks:
– Basic variant: average neighborhood information and stack neural network layers
• GraphSAGE:
– Generalized neighborhood aggregation
Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks
• GraphSAGE