CSE 6240: Web Search and Text Mining, Spring 2020
Graph Neural Networks
Prof. Srijan Kumar
http://cc.gatech.edu/~srijan



Page 2:

Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks
• GraphSAGE

Page 3:

Goal: Node Embeddings

Goal: similarity(u, v) ≈ z_v^⊤ z_u, mapping nodes u and v of the input network into a d-dimensional embedding space.

The similarity function needs to be defined!

Page 4:

Deep Graph Encoders
• Encoder: Map a node to a low-dimensional vector: enc(v) = z_v
• Deep encoder methods are based on graph neural networks: enc(v) = multiple layers of non-linear transformations of the graph structure
• The graph-encoder idea is inspired by CNNs on images

[Figure: a single CNN layer with a 3x3 filter, shown on an image grid vs. on a graph; animation by Vincent Dumoulin. Slide credit: Thomas Kipf, “End-to-end learning on graphs with GCNs”.]

Page 5:

Idea from Convolutional Networks
• In a CNN, a pixel’s representation is created by transforming the representations of its neighboring pixels
– In a GNN, node representations are created by transforming the representations of neighboring nodes
• But graphs are irregular, unlike images
– So, generalize convolutions beyond simple lattices, and leverage node features/attributes
• Solution: deep graph encoders

Page 6:

Deep Graph Encoders

Output: node embeddings; the same machinery can also embed larger network structures, such as subgraphs and entire graphs

• Once an encoder is defined, multiple layers of encoders can be stacked

Page 7:

Graph Encoder: A Naïve Approach

• Join the adjacency matrix and the features
• Feed them into a deep neural network
• Issues with this idea:
– O(|V|) parameters
– Not applicable to graphs of different sizes
– Not invariant to node ordering

The embedded slide (Thomas Kipf, “End-to-end learning on graphs with GCNs”) illustrates the naïve approach on a 5-node graph: take the adjacency matrix and the feature matrix, concatenate them into [A, X], and feed the rows into a deep (fully connected) neural net. Done? For the example graph on nodes A–E:

      A B C D E | Feat
  A [ 0 1 1 1 0 | 1 0 ]
  B [ 1 0 0 1 1 | 0 0 ]
  C [ 1 0 0 1 0 | 0 1 ]
  D [ 1 1 1 0 1 | 1 1 ]
  E [ 0 1 0 1 0 | 1 0 ]

Problems: a huge number of parameters, and no inductive learning is possible.
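The ordering problem above is easy to see in code. The sketch below (a toy example, not from the slides) builds the naïve [A, X] input for a 3-node graph, then relabels the same graph with a different node ordering: the graph is unchanged, but the network input differs.

```python
import numpy as np

# Toy 3-node graph: node 0 connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
X = np.array([[1, 0],
              [0, 1],
              [1, 1]], dtype=float)

# The naive input: one row per node, adjacency row concatenated with features.
inp = np.concatenate([A, X], axis=1)            # shape (3, 5)

# Relabel the SAME graph with a different node ordering.
perm = [2, 0, 1]
A_perm = A[perm][:, perm]                       # permute rows and columns of A
X_perm = X[perm]
inp_perm = np.concatenate([A_perm, X_perm], axis=1)
# Same graph, different ordering -> different input to the fixed-size net.
```

A fixed-size fully connected net over these rows also cannot accept a graph with a different number of nodes, which is the other failure mode listed above.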

Page 8:

Graph Encoders: Two Instantiations
1. Graph convolution networks (GCN): one of the first frameworks to learn node embeddings in an end-to-end manner
– Different from random-walk methods, which are not end-to-end
2. GraphSAGE: generalizes GCNs to various neighborhood aggregations

Page 9:

Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks (GCN)
• GraphSAGE

Main paper: “Semi-Supervised Classification with Graph Convolutional Networks”, Kipf and Welling, ICLR 2017

Page 10:

Content
• Local network neighborhoods:
– Describe aggregation strategies
– Define computation graphs
• Stacking multiple layers:
– Describe the model, parameters, and training
– How to fit the model?
– Simple examples of unsupervised and supervised training

Page 11:

Setup
• Assume we have a graph G:
– V is the vertex set
– A is the adjacency matrix (assume binary)
– X ∈ ℝ^{m×|V|} is a matrix of node features
• Examples of node features:
– Social networks: user profile, user image
– Biological networks: gene expression profiles
– If there are no features, use:
» Indicator vectors (one-hot encoding of each node)
» A vector of constant 1s: [1, 1, …, 1]
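The two featureless fallbacks above can be sketched in a few lines (the function name and `mode` flag are illustrative, not from the slides):

```python
import numpy as np

def node_features(n_nodes, mode="one_hot"):
    """Fallback features for graphs whose nodes carry no attributes."""
    if mode == "one_hot":
        return np.eye(n_nodes)        # X[v] is the indicator vector of node v
    return np.ones((n_nodes, 1))      # every node gets the constant feature [1]
```

One-hot features make the feature matrix grow with |V| (and tie it to a fixed node set), while the constant-1 feature lets the model rely purely on graph structure.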

Page 12:

Graph Convolutional Networks
• Idea: Generate node embeddings based on local network neighborhoods
– A node’s neighborhood defines its computation graph
• Learn how to aggregate information from the neighborhood to learn node embeddings
– Transform information from the neighbors and combine it:
• Transform “messages” h_i from the neighbors: W_i h_i

Page 13:

Idea: Aggregate Neighbors
• Intuition: Generate node embeddings based on local network neighborhoods
• Nodes aggregate information from their neighbors using neural networks

[Figure: input graph on nodes A–F; target node A aggregates information from its neighbors B, C, D, which in turn aggregate from their own neighbors, with a neural network at each aggregation step.]

Page 14:

Idea: Aggregate Neighbors
• Intuition: The network neighborhood defines a computation graph

Every node defines a computation graph based on its neighborhood.

Page 15:

Deep Model: Many Layers
• The model can be of arbitrary depth:
– Nodes have embeddings at each layer
– The layer-0 embedding of node u is its input feature x_u
– The layer-K embedding gets information from nodes that are at most K hops away

[Figure: node A’s computation graph unrolled over layers; layer-0 holds the input features x_A, …, x_F, layer-1 aggregates them, and layer-2 produces A’s embedding.]

Page 16:

Neighborhood Aggregation
• The key distinctions between approaches lie in how they aggregate information across the layers

[Figure: node A’s computation graph, with each aggregation step shown as a box marked “?”. What is in the box?]

Page 17:

Neighborhood Aggregation
• Basic approach: Average information from the neighbors and apply a neural network

[Figure: node A’s computation graph; inside each box, (1) average the messages from the neighbors, then (2) apply a neural network.]

Page 18:

The Math: Deep Encoder
• Basic approach: Average neighbor messages and apply a neural network
– Note: Apply L2 normalization to each node embedding at every layer

With the slide’s annotations: σ is a non-linearity (e.g., ReLU), the initial layer-0 embeddings equal the node features, the sum term is the average of the neighbors’ previous-layer embeddings, and h_v^(k−1) is the previous-layer embedding of v itself:

h_v^(0) = x_v

h_v^(k) = σ( W_k · Σ_{u ∈ N(v)} h_u^(k−1) / |N(v)| + B_k h_v^(k−1) ),  ∀k ∈ {1, …, K}

z_v = h_v^(K)  (the embedding after K layers of neighborhood aggregation)
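The per-node update above can be sketched directly in NumPy. This is a minimal illustration on a toy 3-node graph; the identity weight matrices are stand-ins, not trained parameters.

```python
import numpy as np

def gcn_update(h_prev, neighbors, v, W, B):
    """One layer: h_v = ReLU(W @ mean of neighbor embeddings + B @ h_v_prev)."""
    nbr_mean = np.mean([h_prev[u] for u in neighbors[v]], axis=0)
    return np.maximum(W @ nbr_mean + B @ h_prev[v], 0.0)   # ReLU non-linearity

# Toy graph: node 0 connected to nodes 1 and 2; 2-d input features.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
h0 = {0: np.array([1.0, 0.0]),
      1: np.array([0.0, 1.0]),
      2: np.array([1.0, 1.0])}
W, B = np.eye(2), np.eye(2)        # stand-in weights for illustration
h1_A = gcn_update(h0, neighbors, 0, W, B)
```

With identity weights, node 0’s new embedding is simply the mean of its neighbors’ features plus its own feature, passed through ReLU.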

Page 19:

GCN: Matrix Form
• H^(l) is the representation in the l-th layer
• W_0^(l) and W_1^(l) are matrices to be learned for each layer
• A is the adjacency matrix, D is the diagonal degree matrix
• The GCN update rewritten in matrix form (the equation was a figure on the slide; this reconstruction matches the per-node update, with W_0^(l) playing the role of W_k on the neighbor average and W_1^(l) the role of B_k on the self term):

H^(l+1) = σ( D^(−1) A H^(l) W_0^(l) + H^(l) W_1^(l) )
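A minimal sketch of such a matrix-form layer, assuming row-vector embeddings and the reconstruction above (identity weights are stand-ins). The first row of the output should match the per-node update for node 0 on the same toy graph.

```python
import numpy as np

def gcn_layer_matrix(A, H, W0, W1):
    """One GCN layer: ReLU(D^-1 A H W0 + H W1), neighbor average plus self term."""
    D_inv = np.diag(1.0 / A.sum(axis=1))     # inverse of the diagonal degree matrix
    return np.maximum(D_inv @ A @ H @ W0 + H @ W1, 0.0)

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
H0 = np.array([[1., 0.],
               [0., 1.],
               [1., 1.]])
H1 = gcn_layer_matrix(A, H0, np.eye(2), np.eye(2))
```

The matrix form computes all nodes at once: D^(−1)A averages each node’s neighbor rows, which is exactly the Σ/|N(v)| term of the per-node formula.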

Page 20:


Training the Model
• How do we train the model?
– We need to define a loss function on the embeddings

Page 21:

Model Parameters

The trainable weight matrices W_k and B_k (i.e., what we learn) appear in the same update as before:

h_v^(0) = x_v

h_v^(k) = σ( W_k · Σ_{u ∈ N(v)} h_u^(k−1) / |N(v)| + B_k h_v^(k−1) ),  ∀k ∈ {1, …, K}

z_v = h_v^(K)

• We can feed these embeddings into any loss function and run stochastic gradient descent to train the weight parameters
– Once we have the weight matrices, we can compute the node embeddings

Page 22:

Unsupervised Training
• Training can be unsupervised or supervised
• Unsupervised training:
– Use only the graph structure: “similar” nodes have similar embeddings
– A common unsupervised loss function is edge existence
• The unsupervised loss function can be anything from the last section, e.g., a loss based on:
– Node proximity in the graph
– Random walks

Page 23:

Supervised Training
• Train the model for a supervised task (e.g., node classification: is a node normal or anomalous?)
• Two ways to form the loss:
– Total loss = supervised loss
– Total loss = supervised loss + unsupervised loss
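A sketch of the two options: a cross-entropy classification head on the embedding for the supervised part, and a weighting factor for the optional unsupervised term (the linear head `theta` and the weight `lam` are illustrative, not from the slides):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def supervised_loss(z_v, theta, label):
    """Cross-entropy of a linear classifier head applied to the embedding."""
    return -np.log(softmax(theta @ z_v)[label] + 1e-9)

def total_loss(sup, unsup=0.0, lam=0.0):
    """lam = 0 gives option 1 (supervised only); lam > 0 gives option 2."""
    return sup + lam * unsup

sup = supervised_loss(np.array([2.0, 0.0]), np.eye(2), label=0)
```

Either way, the loss is differentiable in the encoder weights, so the same SGD procedure applies.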

Page 24:

Model Design: Overview
(1) Define a neighborhood aggregation function
(2) Define a loss function on the embeddings

Page 25:

Model Design: Overview

(3) Train on a set of nodes, i.e., a batch of computation graphs

Page 26:

Model Design: Overview

(4) Generate embeddings for nodes as needed

Even for nodes we never trained on!

Page 27:

GCN: Inductive Capability
• The same aggregation parameters are shared by all nodes:
– The number of model parameters is sublinear in |V|, and we can generalize to unseen nodes

[Figure: the compute graphs for nodes A and B of the input graph use the same shared parameters W_k and B_k at every aggregation step.]

Page 28:

Inductive Capability: New Nodes
• Many application settings constantly encounter previously unseen nodes
– E.g., Reddit, YouTube, Google Scholar
• We need to generate new embeddings “on the fly”: train with a snapshot of the graph; when a new node arrives, generate its embedding z_u
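Because W_k and B_k are shared across nodes, embedding a node that arrived after training needs only its features and its neighbors’ embeddings. A one-layer sketch (the identity weights stand in for trained parameters):

```python
import numpy as np

def embed_new_node(x_new, neighbor_embs, W, B):
    """Apply an already-trained layer to a node unseen during training."""
    nbr_mean = np.mean(neighbor_embs, axis=0)
    return np.maximum(W @ nbr_mean + B @ x_new, 0.0)   # ReLU, as in training

W, B = np.eye(2), np.eye(2)                            # stand-ins for trained weights
z_new = embed_new_node(np.array([0.0, 1.0]),           # new node's own features
                       [np.array([1.0, 0.0]),          # embeddings of its neighbors
                        np.array([1.0, 2.0])], W, B)
```

No retraining is needed: the new node simply gets its own computation graph and reuses the shared weights, which is exactly what makes the model inductive.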

Page 29:

Inductive Capability: New Graphs
• Inductive node embedding generalizes to entirely unseen graphs: train on one graph, generalize to a new graph
• E.g., train on a protein interaction graph from model organism A and generate embeddings z_u on newly collected data about organism B

Page 30:

Summary So Far
• Recap: Generate node embeddings by aggregating neighborhood information
– We saw a basic variant of this idea
– Key distinctions are in how different approaches aggregate information across the layers
• Next: the GraphSAGE graph neural network architecture

Page 31:

Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks
• GraphSAGE

• Main paper: Inductive Representation Learning on Large Graphs. William L. Hamilton, Rex Ying, Jure Leskovec. NeurIPS 2017.

Page 32:

GraphSAGE Idea
• In GCN, we aggregated the neighbors’ messages as the (weighted) average over all neighbors. How can we generalize this?

[Figure: node A’s computation graph, with each aggregation box marked “?” — the aggregator to be generalized.]

[Hamilton et al., NIPS 2017]

Page 33:

GraphSAGE Idea

h_v^(k) = σ( [ W_k · AGG({h_u^(k−1), ∀u ∈ N(v)}), B_k h_v^(k−1) ] )

Here AGG can be any differentiable function that maps the set of neighbor vectors {h_u^(k−1) : u ∈ N(v)} to a single vector.

[Figure: node A’s computation graph on the input graph, as before.]

Page 34:

Neighborhood Aggregation
• Simple neighborhood aggregation:

h_v^(k) = σ( W_k · Σ_{u ∈ N(v)} h_u^(k−1) / |N(v)| + B_k h_v^(k−1) )

• GraphSAGE (generalized aggregation): concatenate the aggregated neighbor embedding and the self embedding:

h_v^(k) = σ( [ W_k · AGG({h_u^(k−1), ∀u ∈ N(v)}), B_k h_v^(k−1) ] )
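The GraphSAGE update can be sketched like the GCN one, with two changes: a pluggable aggregator and concatenation instead of summation. A toy illustration (identity weights are stand-ins; the default mean aggregator is one of several choices):

```python
import numpy as np

def graphsage_update(h_prev, nbrs, v, W, B,
                     agg=lambda hs: np.mean(hs, axis=0)):
    """h_v = ReLU(concat(W @ AGG(neighbor embeddings), B @ h_v_prev))."""
    nbr = agg([h_prev[u] for u in nbrs[v]])
    return np.maximum(np.concatenate([W @ nbr, B @ h_prev[v]]), 0.0)

h_prev = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
nbrs = {0: [1], 1: [0]}
h_new = graphsage_update(h_prev, nbrs, 0, np.eye(2), np.eye(2))
```

Note that concatenation doubles the dimensionality at each layer unless the weight matrices project back down, which is a design choice the slide’s formula leaves open.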

Page 35:

Neighbor Aggregation: Variants
• Mean: Take a weighted average of the neighbors:

AGG = Σ_{u ∈ N(v)} h_u^(k−1) / |N(v)|

• Pool: Transform the neighbor vectors and apply a symmetric vector function γ (element-wise mean/max):

AGG = γ({ Q h_u^(k−1), ∀u ∈ N(v) })

• LSTM: Apply an LSTM to a random permutation π of the neighbors:

AGG = LSTM([ h_u^(k−1), ∀u ∈ π(N(v)) ])
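The mean and pool aggregators above are a few lines each; for the LSTM variant, only the permutation step π(N(v)) is shown, since the recurrent cell itself would need a full implementation. Q is a hypothetical trainable matrix, as in the formula.

```python
import numpy as np

def agg_mean(hs):
    """Mean aggregator: average the neighbor vectors."""
    return np.mean(hs, axis=0)

def agg_pool(hs, Q):
    """Pool aggregator: transform each neighbor by Q, then element-wise max."""
    return np.max(np.stack([Q @ h for h in hs]), axis=0)

def lstm_input_order(hs, rng):
    """pi(N(v)): a random permutation of the neighbor set, to be fed to an LSTM."""
    return [hs[i] for i in rng.permutation(len(hs))]

hs = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
```

Mean and pool are order-invariant by construction; the LSTM aggregator is not, which is why the neighbors are randomly permuted before being fed in.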

Page 36:

Experiments: Datasets
• Dynamic datasets:
– Citation network: predict the paper category
• Data from 2000–2005
• 302,424 nodes
• Train on data up to 2004; test on 2005 data
– Reddit post network: predict the subreddit of a post
• Nodes = posts
• Edges between posts if common users comment on them
• 232,965 posts
• Train on 20 days of data; test on the next 10 days

Page 37:

Experiments: Results

Page 38:

Summary: GCN and GraphSAGE
• Key idea: Generate node embeddings based on local neighborhoods
– Nodes aggregate “messages” from their neighbors using neural networks
• Graph convolutional networks:
– Basic variant: average neighborhood information and stack neural network layers
• GraphSAGE:
– Generalized neighborhood aggregation

Page 39:

Today’s Lecture
• Introduction to deep graph embeddings
• Graph convolution networks
• GraphSAGE