49
CMU SCS Anomaly Detection in Large Graphs Christos Faloutsos CMU

Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Anomaly Detection in Large Graphs

Christos Faloutsos CMU

Page 2: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Thank you!

Prof. David Brumley

Megan Kearns

CyLab, CMU (c) 2016, C. Faloutsos 2

Page 3: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 3

Roadmap

•  Introduction – Motivation – Why study (big) graphs?

•  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors •  Conclusions

CyLab, CMU

Page 4: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 4

Graphs - why should we care?

Internet Map [lumeta.com]

CyLab, CMU

computer network security: •  Email traffic •  IP traffic (src, dst, dst-port, t)

Malware propagation •  (machine-id, infected-file-id)

Page 5: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 5

Graphs - why should we care?

>$10B; ~1B users

CyLab, CMU

Page 6: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Motivating problems •  P1: patterns? Fraud detection?

•  P2: patterns in time-evolving graphs / tensors

CyLab, CMU (c) 2016, C. Faloutsos 8

time

destination

Page 7: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

CyLab, CMU (c) 2016, C. Faloutsos 11

Part 1: Patterns, &

fraud detection

Page 8: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 12

Laws and patterns •  Q1: Are real graphs random?

CyLab, CMU

Page 9: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 13

Laws and patterns •  Q1: Are real graphs random? •  A1: NO!!

– Diameter (‘6 degrees’; ‘Kevin Bacon’) –  in- and out- degree distributions –  other (surprising) patterns

•  So, let’s look at the data

CyLab, CMU

Page 10: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 23

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles

CyLab, CMU

Page 11: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 24

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles –  Friends of friends are friends

•  Any patterns? –  2x the friends, 2x the triangles ?

CyLab, CMU

Page 12: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 25

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

SN Reuters

Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles

CyLab, CMU

Page 13: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

26 CyLab, CMU 26 (c) 2016, C. Faloutsos

Page 14: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

27 CyLab, CMU 27 (c) 2016, C. Faloutsos

Page 15: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

28 CyLab, CMU 28 (c) 2016, C. Faloutsos

Page 16: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

29 CyLab, CMU 29 (c) 2016, C. Faloutsos

Page 17: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 55

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors •  Conclusions

CyLab, CMU

Page 18: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

CyLab, CMU (c) 2016, C. Faloutsos 56

Part 2: Time evolving

graphs; tensors

Page 19: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Graphs over time -> tensors! •  Problem #2:

– Given who calls whom, and when – Find patterns / anomalies

CyLab, CMU (c) 2016, C. Faloutsos 57

smith

Page 20: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Graphs over time -> tensors! •  Problem #2:

– Given who calls whom, and when – Find patterns / anomalies

CyLab, CMU (c) 2016, C. Faloutsos 58

Page 21: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Graphs over time -> tensors! •  Problem #2:

– Given who calls whom, and when – Find patterns / anomalies

CyLab, CMU (c) 2016, C. Faloutsos 59

Mon Tue

Page 22: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Graphs over time -> tensors! •  Problem #2:

– Given who calls whom, and when – Find patterns / anomalies

CyLab, CMU (c) 2016, C. Faloutsos 60 callee

caller

Page 23: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Graphs over time -> tensors! •  Problem #2’:

– Given author-keyword-date – Find patterns / anomalies

CyLab, CMU (c) 2016, C. Faloutsos 61 keyword

author

MANY more settings, with >2 ‘modes’

Page 24: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Graphs over time -> tensors! •  Problem #2’’:

– Given subject – verb – object facts – Find patterns / anomalies

CyLab, CMU (c) 2016, C. Faloutsos 62 object

subject

MANY more settings, with >2 ‘modes’

Page 25: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Graphs over time -> tensors! •  Problem #2’’’:

– Given <triplets> – Find patterns / anomalies

CyLab, CMU (c) 2016, C. Faloutsos 63 mode2

mode1

MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes)

Page 26: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 64

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors

–  Intro to tensors – Results – Speed

•  Conclusions

CyLab, CMU

Page 27: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS Answer to both: tensor

factorization •  Recall: (SVD) matrix factorization: finds

blocks

CyLab, CMU (c) 2016, C. Faloutsos 65

N users

M products

‘meat-eaters’ ‘steaks’

‘vegetarians’ ‘plants’

‘kids’ ‘cookies’

~ + +

Page 28: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS Answer to both: tensor

factorization •  PARAFAC decomposition

CyLab, CMU (c) 2016, C. Faloutsos 66

= + + subject

object

politicians artists athletes

Page 29: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Answer: tensor factorization •  PARAFAC decomposition •  Results for who-calls-whom-when

–  4M x 15 days

CyLab, CMU (c) 2016, C. Faloutsos 67

= + + caller

callee

?? ?? ??

Page 30: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS Anomaly detection in time-

evolving graphs

•  Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day!

1 caller 5 receivers 4 days of activity

CyLab, CMU 68 (c) 2016, C. Faloutsos

=

Page 31: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS Anomaly detection in time-

evolving graphs

•  Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day! CyLab, CMU 69 (c) 2016, C. Faloutsos

=

1 caller 5 receivers 4 days of activity

Page 32: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS Anomaly detection in time-

evolving graphs

•  Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day! CyLab, CMU 70 (c) 2016, C. Faloutsos

=

Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.

Page 33: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 76

Roadmap

•  Introduction – Motivation – Why study (big) graphs?

•  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors

–  Inter-arrival time patterns •  Acknowledgements and Conclusions

CyLab, CMU

Page 34: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

RSC: Mining and Modeling Temporal Activity in Social Media

Alceu F. Costa* Yuto Yamaguchi Agma J. M. Traina

Caetano Traina Jr. Christos Faloutsos

Universidade de São Paulo

KDD 2015 – Sydney, Australia

*[email protected]

Page 35: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Reddit Dataset Time-stamp from comments 21,198 users 20 Million time-stamps

Twitter Dataset Time-stamp from tweets 6,790 users 16 Million time-stamps

Pattern Mining: Datasets

For each user we have: Sequence of postings time-stamps: T = (t1, t2, t3, …) Inter-arrival times (IAT) of postings: (∆1, ∆2, ∆3, …)

78 t1 t2 t3 t4

∆1 ∆2 ∆3

time CyLab, CMU (c) 2016, C. Faloutsos

Page 36: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Pattern Mining Pattern 1: Distribution of IAT is heavy-tailed

Users can be inactive for long periods of time before making new postings

IAT Complementary Cumulative Distribution Function (CCDF) (log-log axis)

79

105 106 10710 5

100

, IAT (seconds)

CC

DF,

P(x

)

1 day 10 days105 106 107

10 5

100

, IAT (seconds)

CC

DF,

P(x

)

1 day10 days

Reddit Users Twitter Users CyLab, CMU (c) 2016, C. Faloutsos

Page 37: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Pattern Mining Pattern 2: Bimodal IAT distribution

Users have highly active sections and resting periods

Log-binned histogram of postings IAT

80 Twitter Users

102 104 1060

0.005

0.01

0.015

, IAT (seconds)

PDF

1st Mode (1min) 2nd Mode (3h)

CyLab, CMU (c) 2016, C. Faloutsos

Page 38: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Human? Robots?

log

linear

CyLab, CMU (c) 2016, C. Faloutsos 81

Page 39: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Human? Robots?

log

linear 2’ 3h 1day

CyLab, CMU (c) 2016, C. Faloutsos 82

Page 40: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Experiments: Can RSC-Spotter Detect Bots?

Precision vs. Sensitivity Curves Good performance: curve close to the top

83

0 0.2 0.4 0.6 0.8 10

0.20.40.60.8

1

Sensitivity (Recall)

Prec

isio

n

0 0.2 0.4 0.6 0.8 10

0.20.40.60.8

1

Sensitivity (Recall)

Prec

isio

n

RSC Spotter

IAT Hist.Entropy [6]Weekday Hist.

All Features

Precision > 94% Sensitivity >

70%

With strongly imbalanced

datasets

# humans >> # bots

Twitter

CyLab, CMU (c) 2016, C. Faloutsos

Page 41: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Experiments: Can RSC-Spotter Detect Bots?

Precision vs. Sensitivity Curves Good performance: curve close to the top

84

0 0.2 0.4 0.6 0.8 10

0.20.40.60.8

1

Sensitivity (Recall)

Prec

isio

n

RSC Spotter

IAT Hist.Entropy [6]Weekday Hist.

All Features

Precision > 96% Sensitivity >

47%

With strongly imbalanced

datasets

# humans >> # bots

Reddit

CyLab, CMU (c) 2016, C. Faloutsos

Page 42: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 87

Roadmap

•  Introduction – Motivation – Why study (big) graphs?

•  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors •  Acknowledgements and Conclusions

CyLab, CMU

Page 43: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 88

Thanks

CyLab, CMU

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies

Page 44: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 89

Cast

Akoglu, Leman

Chau, Polo

Kang, U

Hooi, Bryan

CyLab, CMU

Koutra, Danai

Beutel, Alex

Papalexakis, Vagelis

Shah, Neil

Araujo, Miguel

Song, Hyun Ah

Eswaran, Dhivya

Shin, Kijung

Page 45: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 90

CONCLUSION#1 – Big data •  Patterns Anomalies

•  Large datasets reveal patterns/outliers that are invisible otherwise

CyLab, CMU

Page 46: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 91

CONCLUSION#2 – tensors

•  powerful tool

CyLab, CMU

=

1 caller 5 receivers 4 days of activity

Page 47: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

(c) 2016, C. Faloutsos 92

References •  D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,

Tools and Case Studies, Morgan Claypool 2012 •  http://www.morganclaypool.com/doi/abs/10.2200/

S00449ED1V01Y201209DMK006

CyLab, CMU

Page 48: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

TAKE HOME MESSAGE:

Cross-disciplinarity

CyLab, CMU (c) 2016, C. Faloutsos 93

=

Page 49: Anomaly Detection in Large Graphschristos/TALKS/16-09-cylab/faloutsos_cylab_201… · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call data:

CMU SCS

Cross-disciplinarity

CyLab, CMU (c) 2016, C. Faloutsos 94

=

Thank you!