Generalizing Convolutional Neural Networks to Graph-structured Data

Ben Walker

Department of Mathematics, UNC-Chapel Hill

5/4/2018

Overview

• Relational structure in data and how to approach it

• Defferrard, Bresson, Vandergheynst 2016: Fast spectral filter method

• Kipf, Welling 2017: A first-order simplification for improved performance

• Discussion

Unstructured Data

Name     Alice  Bob
Age      14     65
Gender   F      M
Smokes?  N      Y

Gender   M      F
Smokes?  Y      N
Age      65     14
Name     Bob    Alice

• The order is irrelevant to processing - there is no prescribed relationship between the variables

• Use a fully-connected network to learn the relationships
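The point that variable order is irrelevant for a fully-connected layer can be checked numerically. The following is a minimal sketch, assuming a single dense layer with made-up numeric features and random weights (none of these values come from the slides): permuting the inputs and permuting the weight columns the same way gives the same output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical record with four numeric features (illustrative values only).
x = rng.normal(size=4)

# One dense layer with 3 output units: y = W x + b.
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
y = W @ x + b

# Reorder the input variables and reorder the weight columns the same way.
perm = np.array([2, 3, 1, 0])
x_perm = x[perm]
W_perm = W[:, perm]
y_perm = W_perm @ x_perm + b

# Both layers compute the same function of the same record.
same = np.allclose(y, y_perm)
print(same)
```

Because any column order can be absorbed into the weights, the network has no reason to prefer one ordering of the variables over another.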

Grid-structured Data

[Figure panels: A kitten; Google Vision Results; Same Kitten, Different Order]

• Reordered kitten picture is unintelligible

• Use a convolutional neural network, which shares weights across grid positions, to reduce parameters
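To make the parameter reduction concrete, here is a rough count under stated assumptions: a 28 x 28 single-channel image, one dense layer mapping all pixels to the same number of outputs, and one 3 x 3 convolution with a single input and output channel. The helper names `dense_params` and `conv2d_params` are hypothetical and used only for this illustration.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


def conv2d_params(kernel_h: int, kernel_w: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a 2D convolution (kernel shared across positions)."""
    return kernel_h * kernel_w * c_in * c_out + c_out


height, width = 28, 28           # assumed image size
n_pixels = height * width

dense = dense_params(n_pixels, n_pixels)
conv = conv2d_params(3, 3, 1, 1)

print(f"dense layer parameters: {dense}")
print(f"3x3 conv parameters:    {conv}")
```

The convolution count does not depend on the image size, which is only possible because the grid tells the layer which pixels are neighbors.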

Graph-structured Data

• There is some relationship between the data points, which is given on an input-specific basis rather than known a priori

• What can you use here?

Graph Convolutional Network (Kipf and Welling 2017)
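One common way to hand an input-specific relationship to a model is a weighted adjacency matrix built from an edge list. The sketch below assumes an undirected graph and uses a hypothetical helper `adjacency_from_edges` with made-up edges and features.

```python
import numpy as np


def adjacency_from_edges(n_nodes, edges):
    """Build a symmetric weight matrix W from (i, j, weight) triples."""
    W = np.zeros((n_nodes, n_nodes))
    for i, j, w in edges:
        W[i, j] = w
        W[j, i] = w  # undirected graph: mirror each edge
    return W


# Illustrative 4-node graph; the edges and weights are made up.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.5), (3, 0, 2.0)]
W = adjacency_from_edges(4, edges)

# Node features: one row per node, two features each (also made up).
X = np.arange(8, dtype=float).reshape(4, 2)

print(W)
```

Here `W` changes from input to input, while `X` plays the role that the pixel values play for an image.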

Defferrard et al 2016

• Spectral method allows for robust application to the “neighborhood” of a node.

• This “filtering” that maps x to y is the equivalent of the convolution step in a standard convolutional network, with K parameters to learn.

$$L = D - W, \qquad y = \sum_{k=0}^{K-1} \theta_k L^k x$$

$$y = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\, x, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I_n$$
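The Laplacian, its rescaled version, and the polynomial filter can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the authors' code: the graph is a small made-up path graph, the coefficients `theta` are arbitrary, and the function names are hypothetical.

```python
import numpy as np


def laplacian(W):
    """Combinatorial Laplacian L = D - W, with D the diagonal degree matrix."""
    D = np.diag(W.sum(axis=1))
    return D - W


def rescaled_laplacian(L):
    """L_tilde = (2 / lambda_max) L - I, so that its eigenvalues lie in [-1, 1]."""
    lam_max = np.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - np.eye(L.shape[0])


def polynomial_filter(L, x, theta):
    """y = sum_k theta[k] * L^k x, computed without forming L^k explicitly."""
    y = np.zeros_like(x, dtype=float)
    Lk_x = x.astype(float)          # L^0 x
    for theta_k in theta:
        y += theta_k * Lk_x
        Lk_x = L @ Lk_x             # advance to L^(k+1) x
    return y


# Path graph 0 - 1 - 2 - 3 (illustrative).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = laplacian(W)
L_tilde = rescaled_laplacian(L)

x = np.array([1.0, 0.0, 0.0, 0.0])   # signal concentrated on node 0
theta = np.array([0.5, -0.2, 0.1])   # K = 3 arbitrary coefficients
y = polynomial_filter(L, x, theta)
print(y)
```

With K = 3 the filter only uses powers up to L^2, so node 3, which is three hops from node 0, receives no contribution.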

• Localized: the kth term in the sum includes contributions from up to k hops from the node

• Recursive definition, allowing for efficient computation

• This filter is something we can apply machine learning techniques to

$$y = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\, x$$

$$T_{k+1}(\tilde{L})\, x = 2 \tilde{L}\, T_k(\tilde{L})\, x - T_{k-1}(\tilde{L})\, x$$
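The recurrence and the localization property can be sketched together. The code below assumes the standard Chebyshev starting values T_0(L̃)x = x and T_1(L̃)x = L̃x, a made-up 6-node path graph, and arbitrary coefficients; `chebyshev_filter` is a hypothetical name, not a function from the paper's code.

```python
import numpy as np


def chebyshev_filter(L_tilde, x, theta):
    """y = sum_k theta[k] * T_k(L_tilde) x via the three-term recurrence."""
    t_prev = x.astype(float)              # T_0(L~) x = x
    y = theta[0] * t_prev
    if len(theta) == 1:
        return y
    t_curr = L_tilde @ t_prev             # T_1(L~) x = L~ x
    y = y + theta[1] * t_curr
    for theta_k in theta[2:]:
        # T_{k+1} x = 2 L~ T_k x - T_{k-1} x: one matrix-vector product per term
        t_next = 2.0 * (L_tilde @ t_curr) - t_prev
        y = y + theta_k * t_next
        t_prev, t_curr = t_curr, t_next
    return y


# Path graph on 6 nodes (illustrative).
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

L = np.diag(W.sum(axis=1)) - W
lam_max = np.linalg.eigvalsh(L).max()
L_tilde = 2.0 * L / lam_max - np.eye(n)

theta = np.array([0.3, -0.5, 0.2])        # K = 3 arbitrary coefficients
x = np.zeros(n)
x[0] = 1.0                                # impulse at node 0
y = chebyshev_filter(L_tilde, x, theta)
print(y)
```

Each extra term costs one multiplication by L̃, which is cheap when the graph is sparse, and the impulse response stays at zero for nodes more than K - 1 = 2 hops from node 0.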

Validation

• Chebyshev filter Graph CNN tested on MNIST

• Graph created to represent grid structure

• Comparable performance to classical CNN

• Also validated on 20NEWS text categorization dataset.
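As an illustration of a graph created to represent grid structure, the sketch below builds the adjacency matrix of a 28 x 28 pixel grid. It assumes simple 4-neighbor connectivity with unit weights, which may differ from the construction used in the paper; `grid_adjacency` is a hypothetical helper.

```python
import numpy as np


def grid_adjacency(height, width):
    """Adjacency matrix of a height x width grid with 4-neighbor connectivity."""
    n = height * width
    W = np.zeros((n, n))
    for r in range(height):
        for c in range(width):
            i = r * width + c            # row-major pixel index
            if c + 1 < width:            # edge to the right-hand neighbor
                j = i + 1
                W[i, j] = W[j, i] = 1.0
            if r + 1 < height:           # edge to the neighbor below
                j = i + width
                W[i, j] = W[j, i] = 1.0
    return W


W = grid_adjacency(28, 28)               # MNIST-sized grid
n_edges = int(W.sum() / 2)               # each undirected edge is stored twice
print(W.shape, n_edges)
```

Feeding this `W` to a graph filter lets the same model be compared against a classical CNN on image data.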

Kipf, Welling 2017

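• Aim to improve the approach from Defferrard

• Linearize the previous filter equation

• Simplify and renormalize for improved numerical stability, and generalize to multiple feature maps to get a layer-wise equation

$$y = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$$

$$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta$$

$$X_{k+1} = \sigma(M X_k \Theta_k)$$

(with $\tilde{A} = A + I_n$ and $\tilde{D}$ its degree matrix, as in Kipf and Welling)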
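A minimal NumPy sketch of the layer equation follows. It assumes M is the renormalized adjacency from Kipf and Welling (self-loops added, then symmetric degree normalization), takes σ to be ReLU, and uses a made-up path graph with random features and weights. The function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    """M = D~^{-1/2} (A + I) D~^{-1/2}, with D~ the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def gcn_layer(M, X, Theta):
    """X_{k+1} = sigma(M X_k Theta_k), with sigma assumed to be ReLU."""
    return np.maximum(M @ X @ Theta, 0.0)


rng = np.random.default_rng(0)

# Path graph 0 - 1 - 2 - 3 (illustrative).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = normalized_adjacency(A)

X0 = rng.normal(size=(4, 3))       # 4 nodes, 3 input features
Theta0 = rng.normal(size=(3, 5))   # first-layer weights (random stand-ins)
Theta1 = rng.normal(size=(5, 2))   # second-layer weights (random stand-ins)

X1 = gcn_layer(M, X0, Theta0)
Z = M @ X1 @ Theta1                # output layer without activation
print(Z.shape)
```

Apart from the multiplication by M, each layer is an ordinary dense layer applied to every node's feature vector.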

Validation

• Validation datasets

• Citeseer, Cora, and Pubmed citation networks

• NELL knowledge graph

[Table: Comparison of classification accuracy (percent) of different methods (Kipf and Welling 2017)]

Discussion

• Graph-structured data is an interesting new frontier for machine-learning methods

• Kipf and Welling GCN is very similar to standard neural network formulations

• By nature of the linearization, each layer is localized at a distance of 1.

$$X_{k+1} = \sigma(M X_k \Theta_k)$$
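The distance-1 localization can be seen by propagating an impulse through repeated multiplications by M. The sketch below is a simplification under assumptions: M is taken to be the renormalized adjacency of a made-up 6-node path graph, and the weights and nonlinearity are dropped so that only the reach of the propagation is visible.

```python
import numpy as np

# Path graph on 6 nodes (illustrative).
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Assumed form of M: self-loops plus symmetric degree normalization.
A_tilde = A + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
M = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

# Impulse at node 0, pushed through three propagation steps.
h = np.zeros(n)
h[0] = 1.0
reach = []
for _ in range(3):
    h = M @ h
    # Farthest node index with a nonzero value after this step.
    reach.append(int(np.flatnonzero(h).max()))

print(reach)
```

Each step extends the reach by one hop, so information from farther away only arrives by stacking layers.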

References

Defferrard, Michaël, Xavier Bresson, and Pierre Vandergheynst. "Convolutional neural networks on graphs with fast localized spectral filtering." Advances in Neural Information Processing Systems. 2016.

Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).
