Transcript

Deep Learning on Graphs with Graph Convolutional Networks

Thomas Kipf, 22 March 2017

[Figure: GCN schematic — input graph, two hidden layers with ReLU activations, output]

joint work with Max Welling (University of Amsterdam)

BDL Workshop @ NIPS 2016; conference paper @ ICLR 2017 (+ new paper on arXiv)

Deep Learning on Graph-Structured Data Thomas Kipf


The success story of deep learning

Speech data, natural language processing (NLP), …

Deep neural nets that exploit:
- translation invariance (weight sharing)
- hierarchical compositionality


Recap: Deep learning on Euclidean data

Euclidean data: grids, sequences, …
Examples: 2D grid (e.g. images), 1D grid (e.g. sequences)


Recap: Deep learning on Euclidean data

We know how to deal with this: convolutional neural networks (CNNs) (animation by Vincent Dumoulin; source: Wikipedia), or recurrent neural networks (RNNs) (source: Christopher Olah's blog)


Convolutional neural networks (on grids)

Single CNN layer with a 3x3 filter (animation by Vincent Dumoulin):

Update for a single pixel i:
• Transform neighbors individually: W_j^{(l)} h_j^{(l)}
• Add everything up: h_i^{(l+1)} = σ( Σ_{j ∈ N(i)} W_j^{(l)} h_j^{(l)} )

Full update: the same rule applied to every pixel in parallel.


Graph-structured data

What if our data looks like this (a graph), or this (another graph)?

Real-world examples:
• Social networks
• World-wide web
• Protein-interaction networks
• Telecommunication networks
• Knowledge graphs
• …

Graphs: Definitions

Graph: G = (V, E)
V: set of nodes v_i
E: set of edges (v_i, v_j)

We can define A (adjacency matrix):
A_ij = 1 if there is an edge between v_i and v_j, otherwise 0 (can also be weighted)
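In code, these definitions can be made concrete. The following sketch builds the binary adjacency matrix and the degree matrix for a small hypothetical graph (the edge list is illustrative, not from the talk):

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, undirected edges (0,1), (0,2), (1,2), (2,3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = 1.0  # undirected: set both directions
    A[j, i] = 1.0

D = np.diag(A.sum(axis=1))  # degree matrix
print(A.sum())  # each undirected edge is counted twice
```

For a weighted graph, the 1.0 entries would simply be replaced by edge weights.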

Model wish list:
• Set of trainable parameters
• Trainable in time O(|E|), i.e. linear in the number of edges
• Applicable even if the input graph changes


A naive approach

• Take the adjacency matrix A and the feature matrix X
• Concatenate them
• Feed them into a deep (fully connected) neural net
• Done?

Problems:
• Huge number of parameters
• Needs to be re-trained if the number of nodes changes
• Does not generalize across graphs

We need weight sharing!
=> CNNs on graphs, or "Graph Convolutional Networks" (GCNs)


CNNs on graphs with spatial filters

Consider this undirected graph, and calculate the update for the node in red.

Update rule:

h_i^{(l+1)} = σ( h_i^{(l)} W_0^{(l)} + Σ_{j ∈ N_i} (1/c_ij) h_j^{(l)} W_1^{(l)} )

N_i: neighbor indices of node i
c_ij: normalization constant (per edge)

How is this related to spectral CNNs on graphs? ➡ It is a localized 1st-order approximation of spectral filters [Kipf & Welling, ICLR 2017]

(a related idea was first proposed in Scarselli et al., 2009)
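The per-node update rule can be sketched as a plain loop. This is an illustrative implementation, assuming the common choice c_ij = sqrt(d_i d_j) for the per-edge normalization (the function and variable names are ours, not from the talk):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_node_update(H, A, W0, W1):
    """One spatial GCN update: self-term plus normalized, transformed neighbors.
    Normalization c_ij = sqrt(d_i * d_j) is one plausible choice (an assumption)."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    H_new = np.zeros((n, W0.shape[1]))
    for i in range(n):
        acc = H[i] @ W0  # self-connection with its own weight matrix
        for j in np.nonzero(A[i])[0]:  # neighbors of node i
            c_ij = np.sqrt(deg[i] * deg[j])
            acc += (H[j] @ W1) / c_ij
        H_new[i] = relu(acc)
    return H_new
```

For a two-node graph with identity features and identity weights, each node receives its own feature plus its neighbor's, so every entry of the output is 1.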


Vectorized form

H^{(l+1)} = σ( H^{(l)} W_0^{(l)} + D^{-1} A H^{(l)} W_1^{(l)} ), with H^{(0)} = X

Or treat the self-connection in the same way:

H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} ), with Ã = A + I_N and D̃_ii = Σ_j Ã_ij

Ã is typically sparse
➡ We can use sparse matrix multiplications!
➡ Efficient implementation in Theano or TensorFlow
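The vectorized form with self-connections maps directly onto sparse matrix operations. A minimal sketch with scipy.sparse (function names are ours; the normalization is the renormalized propagation rule from the ICLR 2017 paper):

```python
import numpy as np
import scipy.sparse as sp

def normalize_adj(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, kept sparse throughout."""
    A_tilde = A + sp.eye(A.shape[0])  # add self-connections
    d = np.asarray(A_tilde.sum(axis=1)).flatten()
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
    return (D_inv_sqrt @ A_tilde @ D_inv_sqrt).tocsr()

def gcn_layer(A_hat, H, W, activation=lambda x: np.maximum(x, 0.0)):
    # Sparse-dense product: cost is linear in the number of edges
    return activation(A_hat @ (H @ W))
```

A_hat is computed once as preprocessing and reused in every layer and every training step, which is what makes the overall model cheap to evaluate.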

Model architecture (with spatial filters) [Kipf & Welling, ICLR 2017]

[Figure: multi-layer GCN — input graph, hidden layers with ReLU activations, output]

Input: feature matrix X and preprocessed adjacency matrix (or more sophisticated filters / basis functions)

What does it do? An example.

Forward pass through an untrained 3-layer GCN model, with parameters initialized randomly and a 2-dimensional output per node [Karate Club Network].

Produces (useful?) random embeddings!
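This "random embeddings" experiment is easy to reproduce in miniature. The sketch below runs an untrained 3-layer GCN on a small random graph (standing in for the karate club network; graph, layer sizes, and tanh activation are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small random undirected graph
n = 10
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)          # symmetrize
np.fill_diagonal(A, 0.0)

A_tilde = A + np.eye(n)         # add self-connections
d = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(d, d))  # symmetric normalization

H = np.eye(n)                   # featureless input: identity features
for fan_in, fan_out in [(n, 4), (4, 4), (4, 2)]:  # 3 layers, 2-dim output
    W = rng.normal(scale=0.5, size=(fan_in, fan_out))  # random, untrained
    H = np.tanh(A_hat @ H @ W)

print(H.shape)  # one 2-dimensional "embedding" per node
```

Because connected nodes mix their features at every layer, even random weights tend to place neighboring nodes close together in the 2-d output.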

Connection to Weisfeiler-Lehman algorithm

A "classical" approach for node feature assignment. One WL-1 refinement step (per node i):

h_i ← hash( Σ_{j ∈ N_i} h_j )

Useful as a graph isomorphism check for most graphs (exception: highly regular graphs)

GCN, for comparison:

h_i^{(l+1)} = σ( Σ_{j ∈ N_i} (1/c_ij) h_j^{(l)} W^{(l)} )
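One WL refinement step is only a few lines of code. In this sketch (our own, using the color-refinement variant that hashes a node's color together with the multiset of its neighbors' colors):

```python
def wl_iteration(colors, adj):
    """One Weisfeiler-Lehman refinement step: a node's new color is a hash of
    its own color together with the sorted multiset of neighbor colors."""
    return {v: hash((colors[v], tuple(sorted(colors[u] for u in nbrs))))
            for v, nbrs in adj.items()}

# Path graph 0 - 1 - 2: after one step the two endpoints still share a color,
# while the middle node is distinguished by its two neighbors.
adj = {0: [1], 1: [0, 2], 2: [1]}
colors = {v: 0 for v in adj}
colors = wl_iteration(colors, adj)
print(colors[0] == colors[2], colors[0] == colors[1])  # True False
```

Iterating until the color partition stops changing gives the usual isomorphism test; structurally equivalent nodes keep identical colors forever, which is exactly the regime where the check fails on highly regular graphs.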


Semi-supervised classification on graphs

Setting: some nodes are labeled (black circle); all other nodes are unlabeled.
Task: predict the node labels of the unlabeled nodes.

Standard approach: graph-based regularization ("smoothness constraints") [Zhu et al., 2003]:

L = L_0 + λ L_reg, with L_reg = Σ_{i,j} A_ij ||f(X_i) − f(X_j)||²

(equivalently, a quadratic form f(X)^⊤ Δ f(X) in the unnormalized graph Laplacian Δ = D − A, up to a constant factor)

Assumes: connected nodes are likely to share the same label.
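The equivalence between the pairwise penalty and the Laplacian quadratic form is easy to check numerically on a tiny graph (values below are illustrative; the double sum counts each undirected edge twice, hence the factor 2):

```python
import numpy as np

# 3-node path graph and one scalar prediction f_i per node
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
f = np.array([0.0, 1.0, 3.0])

# Pairwise smoothness penalty: sum_ij A_ij (f_i - f_j)^2
pairwise = sum(A[i, j] * (f[i] - f[j]) ** 2
               for i in range(3) for j in range(3))

# Laplacian quadratic form: f^T (D - A) f
D = np.diag(A.sum(axis=1))
quadratic = f @ (D - A) @ f

print(pairwise, 2 * quadratic)  # equal: the double sum counts each edge twice
```

Either form penalizes predictions that change quickly across edges, which is precisely the "connected nodes share labels" assumption.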


Semi-supervised classification on graphs

Embedding-based approaches use a two-step pipeline:
1) Get an embedding for every node
2) Train a classifier on the node embeddings
Examples: DeepWalk [Perozzi et al., 2014], node2vec [Grover & Leskovec, 2016]

Problem: the embeddings are not optimized for classification!

Idea: train a graph-based classifier end-to-end using a GCN, evaluating the loss on labeled nodes only:

L = − Σ_{l ∈ Y_L} Σ_f Y_lf ln Z_lf

Y_L: set of labeled node indices
Y: label matrix
Z: GCN output (after softmax)
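Evaluating the loss on labeled nodes only amounts to masked cross-entropy. A minimal sketch (names and example values are illustrative):

```python
import numpy as np

def masked_cross_entropy(Z, Y, labeled):
    """Cross-entropy over labeled nodes only.
    Z: softmax outputs (n_nodes x n_classes), Y: one-hot labels,
    labeled: indices of the labeled nodes."""
    eps = 1e-12  # numerical safety for the log
    return -np.sum(Y[labeled] * np.log(Z[labeled] + eps))

Z = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])
Y = np.array([[1, 0],
              [0, 1],
              [1, 0]])
loss = masked_cross_entropy(Z, Y, [0, 1])  # node 2 is unlabeled and ignored
```

Note that unlabeled nodes still influence the loss indirectly: their features propagate into labeled nodes' predictions through the graph convolutions.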

Toy example (semi-supervised learning)

Video available here: http://tkipf.github.io/graph-convolutional-networks


Classification on citation networks

(Figure from: Bronstein, Bruna, LeCun, Szlam, Vandergheynst, 2016)

Input: citation networks (nodes are papers, edges are citation links, optionally bag-of-words features on nodes)
Target: paper category (e.g. stat.ML, cs.LG, …)


Experiments and results
(Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017)

Model: 2-layer GCN

[Table: dataset statistics]
[Table: classification results (accuracy); one baseline uses no input features]
[Figure: learned representations — t-SNE embedding of hidden layer activations]
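The 2-layer model used in these experiments composes two propagation steps with a row-wise softmax, Z = softmax(Â ReLU(Â X W0) W1), as in the ICLR 2017 paper. A dense sketch (function names are ours):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def two_layer_gcn(A_hat, X, W0, W1):
    """Z = softmax(A_hat @ relu(A_hat @ X @ W0) @ W1)."""
    H = np.maximum(A_hat @ X @ W0, 0.0)   # first layer + ReLU
    return softmax(A_hat @ H @ W1)        # second layer + row-wise softmax

# Shape check with a trivial "graph" (identity adjacency) and random weights
rng = np.random.default_rng(0)
A_hat = np.eye(4)
X = rng.normal(size=(4, 3))
Z = two_layer_gcn(A_hat, X, rng.normal(size=(3, 5)), rng.normal(size=(5, 2)))
```

Each row of Z is a probability distribution over classes for one node, which feeds directly into the masked cross-entropy loss from the previous slide.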

Other recent work

Methods span a spectrum from the spectral domain (global filters, hard to generalize) to the spatial domain (local filters, limited field of view):

• SyncSpecCNN (spectral transformer nets) [Yi et al., 2016]
• Chebyshev CNN (approximate spectral filters in the spatial domain) [Defferrard et al., 2016]
• Mixture model CNN (learned basis functions) [Monti et al., 2016]

Conclusions and open questions

Results from the past 1-2 years have shown:
• The deep learning paradigm can be extended to graphs
• State-of-the-art results in a number of domains
• Use end-to-end training instead of multi-stage approaches (graph kernels, DeepWalk, node2vec, etc.)

Open problems:
• Learn basis functions from scratch (first step: [Monti et al., 2016])
• Robustness of filters to changes of graph structure
• Learn representations of whole graphs (first step: [Duvenaud et al., 2015])
• Theoretical understanding of what is learned
• Common benchmark datasets
• …

Further reading

Blog post on Graph Convolutional Networks: http://tkipf.github.io/graph-convolutional-networks
Code on GitHub: http://github.com/tkipf/gcn

Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017: https://arxiv.org/abs/1609.02907
Kipf & Welling, Variational Graph Auto-Encoders, NIPS BDL Workshop, 2016: https://arxiv.org/abs/1611.07308

You can get in touch with me via:
• E-Mail: [email protected]
• Twitter: @thomaskipf
• Web: http://tkipf.github.io

Project funded by SAP

How deep is deep enough?

One option for training deeper models: add a residual connection, H^{(l+1)} = σ( Â H^{(l)} W^{(l)} ) + H^{(l)}, as explored in the appendix of the ICLR 2017 paper.