30
Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department, Ant Financial Associate Professor, College of Computing Georgia Institute of Technology

Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

  • Upload
    others

  • View
    20

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Structure2Vec: Deep Learning for

Security Analytics over Graphs

Le Song

Principal Engineer, AI Department, Ant Financial

Associate Professor, College of Computing

Georgia Institute of Technology

Page 2: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

2

Alibaba Ecosystem

Page 3: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Complex networks are everywhere

3

User

Device

GPS

Email

Account

Phone

Card

Shop

Company

Company

Shop

Device

GPS

EMAIL

Card

WIFI Phone No.

Account

Page 4: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Fake account detection

Fake account can increase system level risk

Need to shut down fake accounts

4

Account – device network

Bad

guy

Good

guy

device

New accounts in a month

millions of nodes and millions of edges.

RecallP

recision

detection comparisonRule Structure2vec

Page 5: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Cash-out detection

Cash-out for credit loans create vicious cycles

Need to detect cash-out transactions

5

Shop Owner

Account

Alipay

Account 1

Normal Bank

Account

Alipay

Account 2

Use $1000

credit

Same

person

Transfer

$800 cash

Transfer

$800 cash

Recall

Pre

cisi

on

Structure2vec

node2vec

+ xgboost

Network of millions of transactions a day

detection comparison

Page 6: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Fundamental challenge

6

(city=Atlanta) AND (age=40)

(neighbor=A) XOR (neighbor=B)

(bought=car) OR (time<3 years)

Explosive combinations!

Manual feature designNetworks

Timing

Various apps

Learn

Representation?

Page 7: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Representation Learning

via Structure2Vec

Page 8: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

𝑋6

𝑋1

𝑋2 𝑋3

𝑋4

𝑋5

Structure2Vec embedding (mean field)

𝜇2(0)

𝜇1(0)

𝜇6(0)

𝜇3(0)

𝜇5(0)

𝜇4(0)

Obtain embedding via

iterative update algorithm:

1. Initialize 𝜇𝑖(0)

= 0, ∀ 𝑖

2. Iterate 𝑇 times

𝜇𝑖(𝑡)

← 𝜎 𝑊1𝑋𝑖 + 𝑊2

𝑗∈𝒩 𝑖

𝜇𝑗(𝑡−1)

, ∀𝑖

𝜇1(1)

8

Parameterized

as neural network

[Dai, et al. 2016]

Page 9: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Structure2Vec embedding (mean field)

Obtain embedding via

iterative update algorithm:

1. Initialize 𝜇𝑖(0)

= 0, ∀ 𝑖

2. Iterate 𝑇 times

𝜇𝑖(𝑡)

← 𝜎 𝑊1𝑋𝑖 + 𝑊2

𝑗∈𝒩 𝑖

𝜇𝑗(𝑡−1)

, ∀𝑖

9

Parameterized

as neural network

𝑓 𝑓 , 𝑓

Supervised

Learning

Generative

Models

Reinforcement

Learning

𝜇𝑖(𝑇) 𝜇𝑖

(𝑇) 𝜇𝑘𝑇

𝑘∈𝑆𝜇𝑗(𝑇)

𝑋6

𝑋1

𝑋2 𝑋3

𝑋4

𝑋5

𝜇2(1)

𝜇1(1)

𝜇6(1)

𝜇3(1)

𝜇5(1)

𝜇4(1)

Page 10: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Fake account detection

Fake account can increase system level risk

Need to shut down fake accounts

10

Account – device network

Bad

guy

Good

guy

device

New accounts in a month

roughly millions of nodes and millions of edges.

RecallP

recision

detection comparisonRule Structure2vec

Page 11: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Fake account pattern

11

Fake accountNormal account

Device

Aggregation

Activity

Aggregation

Page 12: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Cross-platform code similarity detection

Key problem is to learn code graph presentation across platforms

12[Xu et al. 2017]

Page 13: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Twin-networks to extract features

Codes from the same function should be similar across platforms

13

Structure2vec

Page 14: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

New tools for representation learning over graphs

(city=Atlanta) AND (age=40)

(browser=IE) XOR (system=Linux)

(bought=car) OR (usage<3 years)

Explosive combinations!

Graphical Models

𝑋 ⊥ 𝑌 | 𝑍

Life is great

Deep learning

𝑦 = 𝑓(𝑥)

Help

Manual algorithm design

14

Page 15: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

𝑋6

𝑋1

𝑋2 𝑋3

𝑋4

𝑋5

Embed belief propagation

Approximate embedding of

𝑝 𝐻𝑖 𝑥𝑗 ↦ 𝜇𝑖

via fixed point update

1. Initialize 𝜇𝑖𝑗 , ∀ 𝑖, 𝑗

2. Iterate 𝑇 times

3. Aggregate 𝜇𝑖 = 𝑊3 ℓ∈𝒩 𝑖 𝜇ℓ𝑖(𝑇)

, ∀ 𝑖

𝜇𝑖𝑗(𝑡)

← 𝜎 𝑊1𝑋𝑖 + 𝑊2

ℓ∈𝒩 𝑖 \j

𝜇ℓ𝑖(𝑡−1)

, ∀(𝑖, 𝑗)

𝑓 𝑓 , 𝑓

Supervised

Learning

Generative

Models

Reinforcement

Learning 15

Parameterized

as neural network

Page 16: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Dynamic Networks

Page 17: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

who will do

what and when?

Dynamic processes over networks

ChristineAliceDavid Jacob

BookTowelShoe

𝑅user

item

matrix factorization

𝑈

𝑉

17

Page 18: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Unroll: time-varying dependency structuretime

𝑡0

𝑡2

𝑡1

𝑡3

Represent

𝑋1 𝑋2 𝑋3

𝑋4

𝑋5

𝐻8 𝐻9

𝐻6 𝐻7

𝐻4

𝐻1

𝐻5

𝐻2 𝐻3

𝑋6

LVM

𝐺 = (𝒱, ℇ)

user/item

raw features

Interaction

time/context

18[Dai, et al. 2016]

Page 19: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Embed filtering/forward message passingtime

𝑡0

𝑡2

𝑡1

𝑡3

Represent

𝑋1 𝑋2 𝑋3

𝑋4

𝑋5

𝐻8 𝐻9

𝐻6 𝐻7

𝐻4

𝐻1

𝐻5

𝐻2 𝐻3

𝑋6

LVM

𝐺 = (𝒱, ℇ)

user/item

raw features

Interaction

time/context

19

Page 20: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Cash-out detection

Cash-out for credit loans create vicious cycles

Need to detect cash-out transactions

20

Shop Owner

Account

Alipay

Account 1

Normal Bank

Account

Alipay

Account 2

Use $1000

credit

Same

person

Transfer

$800 cash

Transfer

$800 cash

Recall

Pre

cisi

on

Structure2vec

node2vec

+ xgboost

Network of millions of transactions a day

detection comparison

Page 21: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

GDELT database

Events in news media

subject – relation – object

and time

Total archives span >215 years,

trillion of events

Time-varying dependency structure

Temporal knowledge graph:

What will happen next?

21[Trivedi, et al. 2017]

Page 22: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Enemy’s

friend

is an

enemy

CAIRO CROATIA

MANCHESTER

PROTESTOR

NEW ZEALAND

SOMALIA

TEHRAN

SINGAPORE

<ASSAULT : 06-Jul-2015>

CO

NS

ULT

CO

OP

ER

ATE

<0

6-Ju

n-2

01

5>

<2

9-Ju

n-2

01

5>

(predicted event)

<2

8-M

ay-2

01

5>

FIGH

TFIG

HT

<0

5-M

ay-2

01

5>

22

Page 23: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

COLOMBIA OTTAWA

NEW DELHI

BELGIUM

LIBYA

VENEZUELA

UAE

GAUTEMALA

<MATERIAL COOP. : 02-Jul-2015>

CO

NS

ULT

PR

OV

IDE

AID

<2

6-Ju

n-2

01

5>

<0

3-Feb

-20

15

>

(predicted event)

<1

6-M

ay-2

01

5>

FIGH

TTR

AD

E C

OO

P.

<2

8-M

ar-2

01

5>

Friends’

friend

is a friend,

common

enemy

strengthen

the tie

23

Page 24: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Reinforcement Learning

Page 25: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Combinatorial optimization over graphs

NP-hard

problems

Application Optimization problem

Fraudster: virus spreading

Analysts: community discovery

Platforms: resource scheduling

Minimum vertex/set cover

Maximum cut

Traveling salesman

25

Page 26: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

0

Greedy algorithm for minimum vertex cover

2 - approximation for minimum vertex cover

Repeat till all edges covered:

• Select uncovered edge with largest total degree

Manually

designed rule.

Can we learn

from data?

1

1

00

0

0 0

0 0

0

NP-hard problems

26

Page 27: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Greedy algorithm as Markov decision process

min𝑥𝑖∈ 0,1

𝑖∈𝓥

𝑥𝑖

𝑠. 𝑡. 𝑥𝑖 + 𝑥𝑗 ≥ 1, ∀ 𝑖, 𝑗 ∈ 𝓔

Repeat:

1. Compute total degree of each

uncovered edge

2. Select both ends of uncovered

edge with largest total degree

Until all edges are covered

Minimum vertex cover: smallest number of nodes to cover all edges

Reward: 𝑟𝑡 = −1

State 𝑆: current selected nodes

Action value function: 𝑄(𝑆, 𝑖)

Greedy policy:

𝑖∗ = 𝑎𝑟𝑔𝑚𝑎𝑥𝑖 𝑄(𝑆, 𝑖)

Update state 𝑆

27

Page 28: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Embedding for state-action value function

pick best

node

Greedy action

𝑖∗ = 𝑎𝑟𝑔𝑚𝑎𝑥𝑖 𝑄(𝑆, 𝑖)

State-action value function 𝑄 𝑆, 𝑖

= 𝜃1𝜎(𝜃2 𝑗∈𝑉 𝜇𝑗 + 𝜃3 𝜇𝑖)

aggregated

embedding

individual

embedding

𝑓 ,

Reinforcement

Learning

1. Problem graph

2.

Model &

Structure2Vec

3. Q-function4. Train

28[Dai et al. 2017]

Page 29: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

What algorithm is learned?

Learned algorithm balances between

• degree of the picked node and

• fragmentation of the graph

Structure2Vec Node greedy Edge greedy

29

Page 30: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,

Summary

30

Networks

Timing

Various apps

Structure2Vec

Representation

CNN and RNN are for

images and sequences

Supervised

Learning

Generative

Models

Reinforcement

Learning

We are hiring!