Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Deep Learning on Graphswith Graph Convolutional Networks
Thomas Kipf, 6 April 2017
… …
…
Input
Hidden layer Hidden layer
ReLU
Output
ReLU
joint work with Max Welling (University of Amsterdam)
Deep Learning on Graph-Structured Data Thomas Kipf
The success story of deep learning
2
Speech data
Natural language processing (NLP)
…
Deep Learning on Graph-Structured Data Thomas Kipf
The success story of deep learning
2
Speech data
Natural language processing (NLP)
…
Deep neural nets that exploit:
- translation invariance (weight sharing) - hierarchical compositionality
Deep Learning on Graph-Structured Data Thomas Kipf
Recap: Deep learning on Euclidean data
3
Euclidean data: grids, sequences…
2D grid
Deep Learning on Graph-Structured Data Thomas Kipf
Recap: Deep learning on Euclidean data
3
Euclidean data: grids, sequences…
2D grid
1D grid
Deep Learning on Graph-Structured Data Thomas Kipf
Recap: Deep learning on Euclidean data
4
Convolutional neural networks (CNNs)
(Animation by Vincent Dumoulin) (Source: Wikipedia)
Deep Learning on Graph-Structured Data Thomas Kipf
Recap: Deep learning on Euclidean data
4
Convolutional neural networks (CNNs)
(Animation by Vincent Dumoulin) (Source: Wikipedia)
Deep Learning on Graph-Structured Data Thomas Kipf
Recap: Deep learning on Euclidean data
4
Convolutional neural networks (CNNs)
(Animation by Vincent Dumoulin) (Source: Wikipedia)
Recurrent neural networks (RNNs)
(Source: Christopher Olah’s blog)
Deep Learning on Graph-Structured Data Thomas Kipf
Traditional vs. “deep” learning
5
Traditional approach
Hand-designed feature extractor
Classifier “on top”
Output
Deep Learning on Graph-Structured Data Thomas Kipf
Traditional vs. “deep” learning
5
Traditional approach
Hand-designed feature extractor
Classifier “on top”
Output
End-to-end learning
Deep neural network Output
Deep Learning on Graph-Structured Data Thomas Kipf
CNNs: Message passing on a grid-graph
6
(Animation by Vincent Dumoulin)
Single CNN layer with 3x3 filter:
Deep Learning on Graph-Structured Data Thomas Kipf
CNNs: Message passing on a grid-graph
6
(Animation by Vincent Dumoulin)
Single CNN layer with 3x3 filter:
Deep Learning on Graph-Structured Data Thomas Kipf
CNNs: Message passing on a grid-graph
6
(Animation by Vincent Dumoulin)
Single CNN layer with 3x3 filter:
…
Deep Learning on Graph-Structured Data Thomas Kipf
CNNs: Message passing on a grid-graph
6
(Animation by Vincent Dumoulin)
Single CNN layer with 3x3 filter:
…
Update for a single pixel: • Transform messages individually
• Add everything up
Deep Learning on Graph-Structured Data Thomas Kipf
CNNs: Message passing on a grid-graph
6
(Animation by Vincent Dumoulin)
Single CNN layer with 3x3 filter:
…
Update for a single pixel: • Transform messages individually
• Add everything up
Full update:
Deep Learning on Graph-Structured Data Thomas Kipf
Graph-structured data
7
… …
…
Input
Hidden layer Hidden layer
ReLU
Output
ReLU
What if our data looks like this?
Deep Learning on Graph-Structured Data Thomas Kipf
or this:
Graph-structured data
7
… …
…
Input
Hidden layer Hidden layer
ReLU
Output
ReLU
What if our data looks like this?
Deep Learning on Graph-Structured Data Thomas Kipf
or this:
Graph-structured data
7
… …
…
Input
Hidden layer Hidden layer
ReLU
Output
ReLU
What if our data looks like this?
Real-world examples:• Social networks • World-wide-web • Protein-interaction networks
• Telecommunication networks • Knowledge graphs • …
Deep Learning on Graph-Structured Data Thomas Kipf
Graph-structured data
8
Graph: Adjacency matrix: A
A
C
B
D
E
A B C D EABCDE
0 1 1 1 01 0 0 1 11 0 0 1 01 1 1 0 10 1 0 1 0
Deep Learning on Graph-Structured Data Thomas Kipf
Graph-structured data
8
• Trainable in time • Applicable even if the input graph changes
Model wish list:
Graph: Adjacency matrix: A
A
C
B
D
E
A B C D EABCDE
0 1 1 1 01 0 0 1 11 0 0 1 01 1 1 0 10 1 0 1 0
Deep Learning on Graph-Structured Data Thomas Kipf
A B C D EABCDE
0 1 1 1 0 1 01 0 0 1 1 0 01 0 0 1 0 0 11 1 1 0 1 1 10 1 0 1 0 1 0
Feat
A naïve approach
9
• Take adjacency matrix and feature matrix
• Concatenate them
• Feed them into deep (fully connected) neural net
• Done?
?A
C
B
D
E
[A,X]
Deep Learning on Graph-Structured Data Thomas Kipf
A B C D EABCDE
0 1 1 1 0 1 01 0 0 1 1 0 01 0 0 1 0 0 11 1 1 0 1 1 10 1 0 1 0 1 0
Feat
A naïve approach
9
• Take adjacency matrix and feature matrix
• Concatenate them
• Feed them into deep (fully connected) neural net
• Done?
Problems:
• Huge number of parameters • Re-train if graph changes
?A
C
B
D
E
[A,X]
Deep Learning on Graph-Structured Data Thomas Kipf
A B C D EABCDE
0 1 1 1 0 1 01 0 0 1 1 0 01 0 0 1 0 0 11 1 1 0 1 1 10 1 0 1 0 1 0
Feat
A naïve approach
9
• Take adjacency matrix and feature matrix
• Concatenate them
• Feed them into deep (fully connected) neural net
• Done?
Problems:
• Huge number of parameters • Re-train if graph changes
?A
C
B
D
E
[A,X]
Deep Learning on Graph-Structured Data Thomas Kipf
A B C D EABCDE
0 1 1 1 0 1 01 0 0 1 1 0 01 0 0 1 0 0 11 1 1 0 1 1 10 1 0 1 0 1 0
Feat
A naïve approach
9
• Take adjacency matrix and feature matrix
• Concatenate them
• Feed them into deep (fully connected) neural net
• Done?
Problems:
• Huge number of parameters • Re-train if graph changes
?We need weight sharing!
➡ CNNs on graphs or “Graph Convolutional Networks” (GCNs)
A
C
B
D
E
[A,X]
Deep Learning on Graph-Structured Data Thomas Kipf
GCNs with 1st-order message passing
10
Consider this undirected graph:
(related idea was first proposed in Scarselli et al. 2009)
Deep Learning on Graph-Structured Data Thomas Kipf
GCNs with 1st-order message passing
10
Consider this undirected graph:
Calculate update for node in red:
(related idea was first proposed in Scarselli et al. 2009)
Deep Learning on Graph-Structured Data Thomas Kipf
GCNs with 1st-order message passing
10
Consider this undirected graph:
Calculate update for node in red:
(related idea was first proposed in Scarselli et al. 2009)
Deep Learning on Graph-Structured Data Thomas Kipf
GCNs with 1st-order message passing
10
Consider this undirected graph:
Calculate update for node in red:
(related idea was first proposed in Scarselli et al. 2009)
Update rule:
: neighbor indices
: norm. constant (per edge)
Note: We could also choose simpler or more general functions over the neighborhood
Deep Learning on Graph-Structured Data Thomas Kipf 11
GCN model architecture
… …
…
Input
Hidden layer Hidden layer
ReLU
Output
ReLU
Input: Feature matrix , preprocessed adjacency matrix
[Kipf & Welling, ICLR 2017]
Deep Learning on Graph-Structured Data Thomas Kipf 12
What does it do? An example.
f( ) =
Forward pass through untrained 3-layer GCN model
Parameters initialized randomly 2-dim output per node
[Karate Club Network]
Deep Learning on Graph-Structured Data Thomas Kipf
Relation to Weisfeiler-Lehman algorithm
13
A “classical” approach for node feature assignment
Deep Learning on Graph-Structured Data Thomas Kipf
Relation to Weisfeiler-Lehman algorithm
13
A “classical” approach for node feature assignment
Deep Learning on Graph-Structured Data Thomas Kipf
Relation to Weisfeiler-Lehman algorithm
13
A “classical” approach for node feature assignment
Useful as graph isomorphism check for most graphs(exception: highly regular graphs)
Deep Learning on Graph-Structured Data Thomas Kipf
Relation to Weisfeiler-Lehman algorithm
13
A “classical” approach for node feature assignment
Useful as graph isomorphism check for most graphs(exception: highly regular graphs)
Deep Learning on Graph-Structured Data Thomas Kipf
Relation to Weisfeiler-Lehman algorithm
13
A “classical” approach for node feature assignment
Useful as graph isomorphism check for most graphs(exception: highly regular graphs)
GCN:
Deep Learning on Graph-Structured Data Thomas Kipf
Semi-supervised classification on graphs
14
Setting: Some nodes are labeled (black circle) All other nodes are unlabeled
Task: Predict node label of unlabeled nodes
Deep Learning on Graph-Structured Data Thomas Kipf
Semi-supervised classification on graphs
14
Setting: Some nodes are labeled (black circle) All other nodes are unlabeled
Task: Predict node label of unlabeled nodes
Standard approach: graph-based regularization
with
assumes: connected nodes likely to share same label
[Zhu et al., 2003]
Deep Learning on Graph-Structured Data Thomas Kipf
Semi-supervised classification on graphs
15
Embedding-based approaches Two-step pipeline:
1) Get embedding for every node 2) Train classifier on node embedding
Examples: DeepWalk [Perozzi et al., 2014], node2vec [Grover & Leskovec, 2016]
Deep Learning on Graph-Structured Data Thomas Kipf
Semi-supervised classification on graphs
15
Embedding-based approaches Two-step pipeline:
1) Get embedding for every node 2) Train classifier on node embedding
Examples: DeepWalk [Perozzi et al., 2014], node2vec [Grover & Leskovec, 2016]
Problem: Embeddings are not optimized for classification!
Deep Learning on Graph-Structured Data Thomas Kipf
Semi-supervised classification on graphs
15
Embedding-based approaches Two-step pipeline:
1) Get embedding for every node 2) Train classifier on node embedding
Examples: DeepWalk [Perozzi et al., 2014], node2vec [Grover & Leskovec, 2016]
Problem: Embeddings are not optimized for classification!
Idea: Train graph-based classifier end-to-end using GCN
Evaluate loss on labeled nodes only:
set of labeled node indices
label matrix
GCN output (after softmax)
Deep Learning on Graph-Structured Data Thomas Kipf
Toy example (semi-supervised learning)
16
Video also available here: http://tkipf.github.io/graph-convolutional-networks
Deep Learning on Graph-Structured Data Thomas Kipf
Toy example (semi-supervised learning)
16
Video also available here: http://tkipf.github.io/graph-convolutional-networks
Deep Learning on Graph-Structured Data Thomas Kipf
Application: Classification on citation networks
17
Input: Citation networks (nodes are papers, edges are citation links, optionally bag-of-words features on nodes)
Target: Paper category (e.g. stat.ML, cs.LG, …)
(Figure from: Bronstein, Bruna, LeCun, Szlam, Vandergheynst, 2016)
Deep Learning on Graph-Structured Data Thomas Kipf
Experiments and results
18
Dataset statistics
(Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017)
Model: 2-layer GCN
Deep Learning on Graph-Structured Data Thomas Kipf
Experiments and results
18
Classification results (accuracy)
no input features
Dataset statistics
(Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017)
Model: 2-layer GCN
Deep Learning on Graph-Structured Data Thomas Kipf
Experiments and results
18
Classification results (accuracy)
no input features
Dataset statistics
(Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017)
Model: 2-layer GCN
Learned representations (t-SNE embedding of hidden layer activations)
Deep Learning on Graph-Structured Data Thomas Kipf
Other recent applications
19
[Monti et al., 2016]
Molecules
[Duvenaud et al., NIPS 2015]
Shapes
Knowledge Graphs
[Schlichtkrull et al., 2017]
Deep Learning on Graph-Structured Data Thomas Kipf
Link prediction with Graph Auto-Encoders
20
… …
…
Input
Hidden layer Hidden layer
ReLU
Output
ReLU
Encoder
Decoder
X A
Z
Â
q( Z | A, X )
p( A | Z )
GAE
Kipf & Welling, NIPS Bayesian Deep Learning Workshop, 2016
Deep Learning on Graph-Structured Data Thomas Kipf
Further reading
21
Blog post Graph Convolutional Networks: http://tkipf.github.io/graph-convolutional-networks
Code on Github: http://github.com/tkipf/gcn
• E-Mail: [email protected]
• Twitter: @thomaskipf
• Web: http://tkipf.github.io
Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017: https://arxiv.org/abs/1609.02907
Kipf & Welling, Variational Graph Auto-Encoders, NIPS BDL Workshop, 2016: https://arxiv.org/abs/1611.07308
You can get in touch with me via:
Project funded by SAP