Graph mining for log data
David Andrzejewski (@davidandrzej)
Data Sciences, Sumo Logic
Strata – Hardcore Data Science Track
February 18, 2015


This talk: Graph Mining + Log Data

YES
• logs
• graph mining
• application examples

NO
• tools
• scaling

Graphs!

• Nodes
• Edges
  – undirected
  – directed
• Components
• Paths/reachability
• Subgraphs
• Degree
• Labels

[Figure: example graph; node annotations on the Degree slide: 1, 3, 2, 2]
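The vocabulary above maps directly onto a few lines of code. The following is a minimal sketch, assuming a small hypothetical undirected edge list (the node names are made up for illustration); it computes degree, reachability, and connected components with a plain breadth-first search.

```python
from collections import defaultdict, deque

# Hypothetical undirected edge list; node names are made up for illustration.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "d"), ("e", "f")]

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def reachable(adj, start):
    """Breadth-first search: every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

def components(adj):
    """Connected components as a list of node sets."""
    remaining = set(adj)
    result = []
    while remaining:
        comp = reachable(adj, next(iter(remaining)))
        result.append(comp)
        remaining -= comp
    return result

adj = build_adjacency(edges)
# Degree of a node = number of neighbors.
degree = {node: len(nbrs) for node, nbrs in adj.items()}
```

For a directed graph the same sketch would keep separate in- and out-neighbor sets, which gives in-degree and out-degree.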

Graph    Nodes      Edges
Social   People     Friendship
Web      Pages      Links
System   Services   API Calls

[Figure: web graph of pages labeled Documents and Politics]
[Figure: system graph of services labeled API, Auth, User, Org]

Anatomy of a log message: Five W’s

• When? Timestamp with time zone
• Where? Host, module, code location
• Who? Authentication context
• What? Log level and key-value pairs
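The when / where / who / what fields can be pulled out of a raw line in a few lines of Python. This is a minimal sketch, assuming a hypothetical line layout (an ISO 8601 timestamp followed by key=value pairs); the key names `host`, `module`, and `user` are assumptions for illustration, not a format taken from the talk.

```python
from datetime import datetime

# Hypothetical log line; the layout (ISO timestamp, then key=value pairs)
# is an assumption made for this sketch.
LINE = ("2015-02-18T10:15:30-08:00 host=web-3 module=checkout "
        "user=11D2 level=ERROR accountID=1234 requestID=082A")

WHERE_KEYS = {"host", "module"}  # assumed "where" keys
WHO_KEYS = {"user"}              # assumed "who" keys

def parse_line(line):
    """Split one log line into when / where / who / what."""
    timestamp, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    return {
        "when": datetime.fromisoformat(timestamp),  # keeps the UTC offset
        "where": {k: v for k, v in fields.items() if k in WHERE_KEYS},
        "who": {k: v for k, v in fields.items() if k in WHO_KEYS},
        "what": {k: v for k, v in fields.items()
                 if k not in WHERE_KEYS | WHO_KEYS},
    }

event = parse_line(LINE)
```

Everything that lands in `what` (IDs such as `requestID`) is a candidate for linking events together later.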

Context: Sumo Logic

“Turning Machine Data Into IT and Business Insights”

Interactions / connections in log data

• Human – Machine
  – behavior analysis
    • business intelligence
    • security
• Machine – Machine
  – API calls
    • ops / troubleshooting
• Human – Human
  – not usually logged...yet

Use case: troubleshooting

User action webID=7F92
Initiating requestID=082A for webID=7F92 …
… orderID=34C8 received for requestID=082A …
Retrieving userID=11D2 for requestID=082A …
… accountID=1234 access, userID=11D2 …
ERROR accountID=1234 not found!
PROCESSING FAILED: webID=7F92

Connected components

• Parse fields $\ell_{i1}, \ell_{i2}, \ldots$ from each log event $\ell_i$
• Build graph
  – nodes = each log event $\ell_i$
  – edges = do a pair of logs match on any field? $e_{ij} = \bigvee_k \{\ell_{ik} = \ell_{jk}\}$
• Calculate undirected connected components, $O(n)$
• Output: partition over $\ell_i$

Retrieving userID=11D2 for requestID=082A …
… accountID=1234 access, userID=11D2 …
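The recipe above can be sketched with a union-find structure. This is one possible implementation under stated assumptions, not necessarily the one used in the talk: fields are taken to be `key=value` tokens found by a regular expression, and the log lines are adapted from the troubleshooting example, plus one made-up unrelated line. Indexing events by field value avoids comparing every pair of events.

```python
import re
from collections import defaultdict

# Log lines adapted from the troubleshooting example, plus one unrelated
# line (webID=AAAA) that is made up for this sketch.
LOGS = [
    "User action webID=7F92",
    "Initiating requestID=082A for webID=7F92",
    "orderID=34C8 received for requestID=082A",
    "Retrieving userID=11D2 for requestID=082A",
    "accountID=1234 access, userID=11D2",
    "ERROR accountID=1234 not found!",
    "User action webID=AAAA",
]

FIELD = re.compile(r"(\w+)=(\w+)")  # assumed field syntax: key=value

def find(parent, i):
    """Return the root of i, halving the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def group_events(logs):
    """Partition log indices into connected components of the match graph."""
    parent = list(range(len(logs)))
    first_seen = {}  # (key, value) -> index of the first event carrying it
    for i, line in enumerate(logs):
        for field in FIELD.findall(line):
            if field in first_seen:
                # Same key and value as an earlier event: merge the two sets.
                parent[find(parent, i)] = find(parent, first_seen[field])
            else:
                first_seen[field] = i
    groups = defaultdict(list)
    for i in range(len(logs)):
        groups[find(parent, i)].append(i)
    return sorted(groups.values())

partition = group_events(LOGS)
```

Here the first six events end up in one component even though no single ID appears in all of them; the chain webID, requestID, userID, accountID links them transitively.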

Distributed systems tracing infrastructure

• Dapper (Google)
• Zipkin (Twitter)
• X-Trace (UC Berkeley)
• inCapacity (LinkedIn)
• Erlang / Akka
• Commercial products

Use case: online shopping

• User interactions
  – state transition graph
  – internal call cascades
• Goals: identify unusual...
  – ...user behavior
  – ...service behavior

[Figure: state transition graph with nodes Login, Browse, Add to cart, Check out]

Use case: online shopping

• identify visits (e.g., connected components)
• “featurize”
• statistical modeling / machine learning

Visit   Login   Browse   Cart   Checkout
37CF    1       7        1      0
5450    0       3        2      1
A84B    2       1        1347   0
...     ...     ...      ...    ...
FF71    2       13       2      0
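Node-wise featurization amounts to counting how often each action occurs within a visit. A minimal sketch, assuming each visit has already been reduced to an ordered list of action names; the visit IDs and sequences here are hypothetical.

```python
from collections import Counter

ACTIONS = ["Login", "Browse", "Cart", "Checkout"]

# Hypothetical visits: each is the ordered list of actions found in one
# connected component of the log graph. IDs and sequences are made up.
visits = {
    "V1": ["Login", "Browse", "Browse", "Cart", "Checkout"],
    "V2": ["Browse", "Cart", "Browse", "Cart"],
}

def node_features(actions):
    """Fixed-length vector: how many times each action occurs."""
    counts = Counter(actions)
    return [counts[a] for a in ACTIONS]

table = {visit_id: node_features(seq) for visit_id, seq in visits.items()}
```

Every visit becomes a vector of the same length regardless of how many events it contains, which is what standard statistical and machine learning tools expect.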

Use case: online shopping

• Alternative featurization
  – previous: “node-wise”
  – alternative: “edge-wise”

Visit   Login > Browse   Browse > Cart   Cart > Browse   Browse > Checkout   Login > Checkout   ...
37CF    1                7               1               0                   0                  ...
5450    1                3               2               1                   0                  ...
A84B    0                0               0               0                   799                ...
...     ...              ...             ...             ...                 ...                ...
FF71    1                13              2               0                   ...                ...
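Edge-wise featurization counts transitions between consecutive actions instead of the actions themselves. A minimal sketch, assuming a hypothetical fixed list of transitions to track and a made-up visit sequence.

```python
from collections import Counter

# Hypothetical transitions to track; the real list depends on the site.
TRANSITIONS = [
    ("Login", "Browse"),
    ("Browse", "Cart"),
    ("Cart", "Browse"),
    ("Browse", "Checkout"),
    ("Login", "Checkout"),
]

def edge_features(actions):
    """Fixed-length vector: how many times each transition occurs."""
    # zip pairs each action with the one that follows it.
    counts = Counter(zip(actions, actions[1:]))
    return [counts[t] for t in TRANSITIONS]

visit = ["Login", "Browse", "Cart", "Browse", "Cart", "Browse", "Checkout"]
features = edge_features(visit)
```

Unlike node counts, transition counts preserve ordering information, so a visit that repeatedly jumps straight from Login to Checkout looks different from one that browses first.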

ML / stats detour: fixed-length feature vectors

Fisher Iris dataset (1936)

Sepal length   Sepal width   Petal length   Petal width   Species
5.0            3.5           1.6            0.6           I. setosa
5.9            3.2           4.8            1.8           I. versicolor
6.1            2.6           5.6            1.4           I. virginica
...            ...           ...            ...           ...

Photo: Danielle Langlois

Always Be Featurizing

Target entity   Features                                 Applications
Node            properties, connectivity, neighbors      compromised machine
Edge            properties, nodes, node features         high latency, rare connect
Graph           nodes / edges, connectivity, subgraph    failed session, misbehavior

Use case: unusual remote access detection

• Remote access (e.g., SSH) graphs
• Are our observations “typical”?
  – machine-edge: connect from host X to host Y?
  – graph: maximum depth / path length?
  – user-edge: that user A connects to host X?
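Two of these typicality questions can be sketched in code. The following is an illustration under stated assumptions, not the talk's method: the machine-edge check is a smoothed empirical frequency over a hypothetical connection history, and the graph check is the longest hop chain from a starting host, assuming the session graph has no cycles. Host names and counts are made up.

```python
from collections import Counter, defaultdict

# Hypothetical history of (source host, destination host) SSH connections.
history = ([("laptop", "bastion")] * 50
           + [("bastion", "web-1")] * 30
           + [("bastion", "db-1")] * 5)

edge_counts = Counter(history)

def edge_frequency(src, dst, smoothing=1.0):
    """Smoothed share of past connections that used this edge."""
    # One extra pseudo-edge in the denominator stands in for unseen pairs,
    # so a never-before-seen edge gets a small nonzero score.
    total = sum(edge_counts.values()) + smoothing * (len(edge_counts) + 1)
    return (edge_counts[(src, dst)] + smoothing) / total

def max_depth(edges, root):
    """Longest chain of hops starting at root (assumes no cycles)."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    def depth(node):
        # A node with no outgoing edges has depth 0.
        return 1 + max((depth(c) for c in children[node]), default=-1)

    return depth(root)
```

A low `edge_frequency` or an unusually large `max_depth` would flag a session for a closer look; the threshold is a judgment call that depends on the environment.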

Use case: understanding internal API calls

• GOAL: understand usage of (expensive!) internal service
  – each observation is an invoking call graph
• How are different invocations...
  – ...the same?
  – ...different?

Frequent substructure mining

• given a collection of graphs
• return sub-graphs which occur in $\geq T$ graphs

Use case: understanding internal API calls

• Frequent subgraphs presence/absence as feature
  – very common: “infrastructural” stuff
  – somewhat common: different usage modes

             Request / Auth   Cache   Shadow path
Standard     1                0       0
Optimized    1                1       0
“Shadowed”   1                0       1
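A heavily simplified sketch of the idea follows. It is not a general frequent-subgraph miner: it assumes every service name is unique within a call graph, so a sub-structure is fully described by its set of edges and no isomorphism testing is needed, and it brute-forces small edge sets without requiring them to be connected. The call graphs and service names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical call graphs, one per invocation, each a set of directed edges
# between uniquely named services. Names are made up for this sketch.
graphs = {
    "standard":  {("request", "auth"), ("auth", "service")},
    "optimized": {("request", "auth"), ("auth", "cache")},
    "shadowed":  {("request", "auth"), ("auth", "service"),
                  ("auth", "shadow")},
}

def frequent_edge_sets(graphs, threshold, max_size=2):
    """Edge sets (up to max_size edges) that occur in >= threshold graphs."""
    support = Counter()
    for edges in graphs.values():
        for size in range(1, max_size + 1):
            # Sorting gives each edge set one canonical tuple form.
            for subset in combinations(sorted(edges), size):
                support[subset] += 1
    return {s: n for s, n in support.items() if n >= threshold}

def presence_features(graphs, patterns):
    """One row per graph: 1 if all of the pattern's edges are present."""
    ordered = sorted(patterns)
    return {name: [int(set(p) <= edges) for p in ordered]
            for name, edges in graphs.items()}

patterns = frequent_edge_sets(graphs, threshold=2)
features = presence_features(graphs, patterns)
```

The brute-force enumeration grows exponentially with `max_size`, so anything beyond toy inputs would call for a dedicated frequent-subgraph mining algorithm.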

Feature-based graph mining strategy

Step                               Example
1. Determine your goal             ID unusual access
2. Build graph representation      Remote access graph
3. Frame question graphically      High out-degree?
4. “Featurize” graph element(s)    Node → Out-degree
5. Apply modeling to features      Fit parametric model

• Domain knowledge
• Graph mining
• Stats / ML / data mining

Node   Out Degree
A      2
B      0
C      76
...    ...
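The last two steps can be sketched end to end. This is an illustration only: the talk says "fit parametric model" without naming one, so the geometric distribution used here is an assumption, and the edge list is made up (chosen so that nodes A, B, and C have out-degrees 2, 0, and 76).

```python
from collections import Counter

# Hypothetical remote-access edges (source, destination); values are made up.
edges = [("A", "B"), ("A", "C"), ("D", "B"), ("E", "B")]
edges += [("C", f"host-{i}") for i in range(76)]

def out_degrees(edges):
    """Out-degree for every node that appears as a source or destination."""
    nodes = {n for edge in edges for n in edge}
    counts = Counter(src for src, _ in edges)
    return {n: counts[n] for n in nodes}

def fit_geometric(values):
    """Maximum-likelihood p for P(X = k) = (1 - p)**k * p, k = 0, 1, 2, ..."""
    mean = sum(values) / len(values)
    return 1.0 / (1.0 + mean)

def tail_probability(k, p):
    """P(X >= k) under the fitted geometric model."""
    return (1.0 - p) ** k

degrees = out_degrees(edges)
p = fit_geometric(list(degrees.values()))
# Small tail probability = out-degree is surprising under the model.
scores = {n: tail_probability(d, p) for n, d in degrees.items()}
```

In practice the model would more likely be fit on a baseline period and then applied to new observations, so that an outlier does not influence its own score.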

Acknowledgements, etc

Team: Jack Cheng, Martin Castellanos, Leo Gau, Yuchen Zhao, Ariel Smoliar

We’re selling!

We’re recruiting!

Use case: understanding internal API calls

• Alternate approach: spectral clustering
• Services architecture graph

Use case: customer behavior modeling

• IDEA: treat visits as graphs
  – features: node, edge, graph!
  – labels: did they sign up / convert / etc.?