Deep Learning on Graphs - Thomas Kipftkipf.github.io/misc/GCNSlides.pdf · Deep Learning on Graph-Structured Data Thomas Kipf Semi-supervised classification on graphs 15 Embedding-based

Deep Learning on Graphswith Graph Convolutional Networks

Thomas Kipf, 6 April 2017

… …

…

Input

Hidden layer Hidden layer

ReLU

Output

ReLU

joint work with Max Welling (University of Amsterdam)

Deep Learning on Graph-Structured Data Thomas Kipf

The success story of deep learning

2

Speech data

Natural language processing (NLP)

…


The success story of deep learning

2

Speech data

Natural language processing (NLP)

…

Deep neural nets that exploit:

- translation invariance (weight sharing) - hierarchical compositionality


Recap: Deep learning on Euclidean data

3

Euclidean data: grids, sequences…

2D grid



3

Euclidean data: grids, sequences…

2D grid

1D grid



4

Convolutional neural networks (CNNs)

(Animation by Vincent Dumoulin) (Source: Wikipedia)



4





4



Recurrent neural networks (RNNs)

(Source: Christopher Olah’s blog)


Traditional vs. “deep” learning

5

Traditional approach

Hand-designed feature extractor

Classifier “on top”

Output


Traditional vs. “deep” learning

5

Traditional approach

Hand-designed feature extractor

Classifier “on top”

Output

End-to-end learning

Deep neural network Output


CNNs: Message passing on a grid-graph

6

(Animation by Vincent Dumoulin)

Single CNN layer with 3x3 filter:



6





6



…



6



…

Update for a single pixel: • Transform messages individually

• Add everything up



6



…

Update for a single pixel: • Transform messages individually

• Add everything up

Full update:


Graph-structured data

7

… …

…

Input


ReLU

Output

ReLU

What if our data looks like this?


or this:


7

… …

…

Input


ReLU

Output

ReLU



or this:


7

… …

…

Input


ReLU

Output

ReLU


Real-world examples:• Social networks • World-wide-web • Protein-interaction networks

• Telecommunication networks • Knowledge graphs • …



8

Graph: Adjacency matrix: A

A

C

B

D

E

A B C D EABCDE

0 1 1 1 01 0 0 1 11 0 0 1 01 1 1 0 10 1 0 1 0



8

• Trainable in time • Applicable even if the input graph changes

Model wish list:

Graph: Adjacency matrix: A

A

C

B

D

E

A B C D EABCDE

0 1 1 1 01 0 0 1 11 0 0 1 01 1 1 0 10 1 0 1 0


A B C D EABCDE

0 1 1 1 0 1 01 0 0 1 1 0 01 0 0 1 0 0 11 1 1 0 1 1 10 1 0 1 0 1 0

Feat

A naïve approach

9

• Take adjacency matrix and feature matrix

• Concatenate them

• Feed them into deep (fully connected) neural net

• Done?

?A

C

B

D

E

[A,X]


A B C D EABCDE

0 1 1 1 0 1 01 0 0 1 1 0 01 0 0 1 0 0 11 1 1 0 1 1 10 1 0 1 0 1 0

Feat

A naïve approach

9




• Done?

Problems:

• Huge number of parameters • Re-train if graph changes

?A

C

B

D

E

[A,X]


A B C D EABCDE

0 1 1 1 0 1 01 0 0 1 1 0 01 0 0 1 0 0 11 1 1 0 1 1 10 1 0 1 0 1 0

Feat

A naïve approach

9




• Done?

Problems:


?A

C

B

D

E

[A,X]


A B C D EABCDE

0 1 1 1 0 1 01 0 0 1 1 0 01 0 0 1 0 0 11 1 1 0 1 1 10 1 0 1 0 1 0

Feat

A naïve approach

9




• Done?

Problems:


?We need weight sharing!

➡ CNNs on graphs or “Graph Convolutional Networks” (GCNs)

A

C

B

D

E

[A,X]


GCNs with 1st-order message passing

10

Consider this undirected graph:

(related idea was first proposed in Scarselli et al. 2009)



10


Calculate update for node in red:




10






10




Update rule:

: neighbor indices

: norm. constant (per edge)

Note: We could also choose simpler or more general functions over the neighborhood

Deep Learning on Graph-Structured Data Thomas Kipf 11

GCN model architecture

… …

…

Input


ReLU

Output

ReLU

Input: Feature matrix , preprocessed adjacency matrix

[Kipf & Welling, ICLR 2017]

Deep Learning on Graph-Structured Data Thomas Kipf 12

What does it do? An example.

f( ) =

Forward pass through untrained 3-layer GCN model

Parameters initialized randomly 2-dim output per node

[Karate Club Network]


Relation to Weisfeiler-Lehman algorithm

13

A “classical” approach for node feature assignment



13




13


Useful as graph isomorphism check for most graphs(exception: highly regular graphs)



13





13



GCN:


Semi-supervised classification on graphs

14

Setting: Some nodes are labeled (black circle) All other nodes are unlabeled

Task: Predict node label of unlabeled nodes



14

Setting: Some nodes are labeled (black circle) All other nodes are unlabeled

Task: Predict node label of unlabeled nodes

Standard approach: graph-based regularization

with

assumes: connected nodes likely to share same label

[Zhu et al., 2003]



15

Embedding-based approaches Two-step pipeline:

1) Get embedding for every node 2) Train classifier on node embedding

Examples: DeepWalk [Perozzi et al., 2014], node2vec [Grover & Leskovec, 2016]



15




Problem: Embeddings are not optimized for classification!



15




Problem: Embeddings are not optimized for classification!

Idea: Train graph-based classifier end-to-end using GCN

Evaluate loss on labeled nodes only:

set of labeled node indices

label matrix

GCN output (after softmax)


Toy example (semi-supervised learning)

16

Video also available here: http://tkipf.github.io/graph-convolutional-networks

http://tkipf.github.io/graph-convolutional-networks


Toy example (semi-supervised learning)

16

Video also available here: http://tkipf.github.io/graph-convolutional-networks



Application: Classification on citation networks

17

Input: Citation networks (nodes are papers, edges are citation links, optionally bag-of-words features on nodes)

Target: Paper category (e.g. stat.ML, cs.LG, …)

(Figure from: Bronstein, Bruna, LeCun, Szlam, Vandergheynst, 2016)


Experiments and results

18

Dataset statistics

(Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017)

Model: 2-layer GCN



18

Classification results (accuracy)

no input features

Dataset statistics


Model: 2-layer GCN



18

Classification results (accuracy)

no input features

Dataset statistics


Model: 2-layer GCN

Learned representations (t-SNE embedding of hidden layer activations)


Other recent applications

19

[Monti et al., 2016]

Molecules

[Duvenaud et al., NIPS 2015]

Shapes

Knowledge Graphs

[Schlichtkrull et al., 2017]


Link prediction with Graph Auto-Encoders

20

… …

…

Input


ReLU

Output

ReLU

Encoder

Decoder

X A

Z

Â

q( Z | A, X )

p( A | Z )

GAE

Kipf & Welling, NIPS Bayesian Deep Learning Workshop, 2016


Further reading

21

Blog post Graph Convolutional Networks: http://tkipf.github.io/graph-convolutional-networks

Code on Github: http://github.com/tkipf/gcn

• E-Mail: [email protected]

• Twitter: @thomaskipf

• Web: http://tkipf.github.io

Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017: https://arxiv.org/abs/1609.02907

Kipf & Welling, Variational Graph Auto-Encoders, NIPS BDL Workshop, 2016: https://arxiv.org/abs/1611.07308

You can get in touch with me via:

Project funded by SAP


http://github.com/tkipf/gcn

https://arxiv.org/abs/1609.02907

https://arxiv.org/abs/1611.07308

Documents

Deep Learning on Graphs - Thomas Kipftkipf.github.io/misc/GCNSlides.pdf · Deep Learning on Graph-Structured Data Thomas Kipf Semi-supervised classification on graphs 15 Embedding-based