Anomaly detection in large graphs · 2021. 5. 20. · ICWE 2021 Christos Faloutsos 31 •Mary...

Preview:

Citation preview

CMU SCS

Anomaly detection in large graphs

Christos FaloutsosCMU

CMU SCS

Thank you!

• Prof. Richard Chbeir

• Prof. Flavius Frasincar

ICWE 2021 Christos Faloutsos 2

CMU SCS

Christos Faloutsos 3

Roadmap

• Introduction – Motivation– Why study (big) graphs?

• Part#1: Patterns in graphs• Part#2: time-evolving graphs; tensors• Conclusions

ICWE 2021

CMU SCS

Christos Faloutsos 4

Graphs - why should we care?

>$10B; ~1B users

ICWE 2021

CMU SCS

Christos Faloutsos 5

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

ICWE 2021

CMU SCS

Christos Faloutsos 6

Graphs - why should we care?• web-log (‘blog’) news propagation• computer network security: email/IP traffic and

anomaly detection• Recommendation systems• ....

• Many-to-many db relationship -> graph

ICWE 2021

CMU SCS

Motivating problems• P1: patterns? Fraud detection?

• P2: patterns in time-evolving graphs / tensors

ICWE 2021 Christos Faloutsos 7

timesource

destination

CMU SCS

Motivating problems• P1: patterns? Fraud detection?

• P2: patterns in time-evolving graphs / tensors

ICWE 2021 Christos Faloutsos 8

timesource

destination

Patterns anomalies

CMU SCS

Christos Faloutsos 9

Roadmap

• Introduction – Motivation– Why study (big) graphs?

• Part#1: Patterns & fraud detection• Part#2: time-evolving graphs; tensors• Conclusions

ICWE 2021

CMU SCS

ICWE 2021 Christos Faloutsos 10

Part 1:Patterns, &

fraud detection

CMU SCS

Christos Faloutsos 11

Laws and patterns• Q1: Are real graphs random?

ICWE 2021

CMU SCS

Christos Faloutsos 12

Laws and patterns• Q1: Are real graphs random?• A1: NO!!

– Diameter (‘6 degrees’; ‘Kevin Bacon’)– in- and out- degree distributions– other (surprising) patterns

• So, let’s look at the data

ICWE 2021

CMU SCS

Christos Faloutsos 13

Solution# S.1

• Power law in the degree distribution [Faloutsos x 3 SIGCOMM99]

log(rank)

log(degree)

internet domains

att.com

ibm.com

ICWE 2021

CMU SCS

Christos Faloutsos 14

Solution# S.1

• Power law in the degree distribution [Faloutsos x 3 SIGCOMM99]

log(rank)

log(degree)

-0.82

internet domains

att.com

ibm.com

ICWE 2021

CMU SCS

15

S2: connected component sizes• Connected Components – 4 observations:

Size

Count

Christos FaloutsosICWE 2021

1.4B nodes6B edges

CMU SCS

16

S2: connected component sizes• Connected Components

Size

Count

Christos FaloutsosICWE 2021

1) 10K x largerthan next

CMU SCS

17

S2: connected component sizes• Connected Components

Size

Count

Christos FaloutsosICWE 2021

2) ~0.7B singletonnodes

CMU SCS

18

S2: connected component sizes• Connected Components

Size

Count

Christos FaloutsosICWE 2021

3) SLOPE!

CMU SCS

19

S2: connected component sizes• Connected Components

Size

Count300-size

cmptX 500.Why?1100-size cmpt

X 65.Why?

Christos FaloutsosICWE 2021

4) Spikes!

CMU SCS

20

S2: connected component sizes• Connected Components

Size

Count

suspiciousfinancial-advice sites

(not existing now)

Christos FaloutsosICWE 2021

CMU SCS

Christos Faloutsos 21

Roadmap

• Introduction – Motivation• Part#1: Patterns in graphs

– P1.1: Patterns: Degree; Triangles– P1.2: Anomaly/fraud detection

• Part#2: time-evolving graphs; tensors• Conclusions

ICWE 2021

CMU SCS

Christos Faloutsos 22

Solution# S.3: Triangle ‘Laws’

• Real social networks have a lot of triangles

ICWE 2021

CMU SCS

Christos Faloutsos 23

Solution# S.3: Triangle ‘Laws’

• Real social networks have a lot of triangles– Friends of friends are friends

• Any patterns?– 2x the friends, 2x the triangles ?

ICWE 2021

CMU SCS

Christos Faloutsos 24

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

SNReuters

Epinions X-axis: degreeY-axis: mean # trianglesn friends -> ~n1.6 triangles

ICWE 2021

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges)[U Kang, Brendan Meeder, +, PAKDD’11]

25ICWE 2021 25Christos Faloutsos

? ?

?

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges)[U Kang, Brendan Meeder, +, PAKDD’11]

26ICWE 2021 26Christos Faloutsos

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges)[U Kang, Brendan Meeder, +, PAKDD’11]

27ICWE 2021 27Christos Faloutsos

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges)[U Kang, Brendan Meeder, +, PAKDD’11]

28ICWE 2021 28Christos Faloutsos

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges)[U Kang, Brendan Meeder, +, PAKDD’11]

29ICWE 2021 29Christos Faloutsos

CMU SCS

MORE Graph Patterns

ICWE 2021 Christos Faloutsos 30

✔✔✔

RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD’09.

CMU SCS

MORE Graph Patterns

ICWE 2021 Christos Faloutsos 31

• Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal)

• Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool.

CMU SCS

Christos Faloutsos 32

Roadmap

• Introduction – Motivation• Part#1: Patterns in graphs

– P1.1: Patterns– P1.2: Anomaly / fraud detection

• No labels – spectral• With labels: Belief Propagation

• Part#2: time-evolving graphs; tensors• Conclusions

ICWE 2021

Patterns anomalies

CMU SCS

How to find ‘suspicious’ groups?• ‘blocks’ are normal, right?

ICWE 2021 Christos Faloutsos 33

fans

idols

CMU SCS

Except that:• ‘blocks’ are normal, right?• ‘hyperbolic’ communities are more realistic

[Araujo+, PKDD’14]

ICWE 2021 Christos Faloutsos 34

CMU SCS

Except that:• ‘blocks’ are usually suspicious• ‘hyperbolic’ communities are more realistic

[Araujo+, PKDD’14]

ICWE 2021 Christos Faloutsos 35

Q: Can we spot blocks, easily?

CMU SCS

Except that:• ‘blocks’ are usually suspicious• ‘hyperbolic’ communities are more realistic

[Araujo+, PKDD’14]

ICWE 2021 Christos Faloutsos 36

Q: Can we spot blocks, easily?A: Silver bullet: SVD!

CMU SCS

Crush intro to SVD• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 37

N fans

Midols

‘music lovers’‘singers’

‘sports lovers’‘athletes’

‘citizens’‘politicians’

~ + +

DETAILS

CMU SCS

Crush intro to SVD• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 38

N users

Mproducts

‘meat-eaters’‘steaks’

‘vegetarians’‘plants’

‘kids’‘cookies’

~ + +

DETAILS

CMU SCS

Crush intro to SVD• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 39

~ + +

DETAILS

Mtimestamps

‘cancer’ ‘alzheimer’ ‘Parkinson’

N genes

CMU SCS

Crush intro to SVD• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 40

~ + +

DETAILS

Mtimestamps

‘hurricane’ ‘cold-spell’ ‘heat-wave’

N locations

CMU SCS

Crush intro to SVD• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 41

N fans

Midols

‘music lovers’‘singers’

‘sports lovers’‘athletes’

‘citizens’‘politicians’

~ + +

DETAILS

CMU SCS

Crush intro to SVD• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 42

N fans

Midols

‘music lovers’‘singers’

‘sports lovers’‘athletes’

‘citizens’‘politicians’

~ + +

DETAILS

CMU SCS

Crush intro to SVD• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 43

N fans

Midols

‘music lovers’‘singers’

‘sports lovers’‘athletes’

‘citizens’‘politicians’

~ + +

DETAILS

Even if shuffled!

CMU SCS

Inferring Strange Behavior fromConnectivity Pattern in Social Networks

PAKDD’14

Meng Jiang, Peng Cui, Shiqiang Yang (Tsinghua)Alex Beutel, Christos Faloutsos (CMU)

CMU SCS

Dataset

• Tencent Weibo• 117 million nodes (with profile and UGC

data)• 3.33 billion directed edges

ICWE 2021 45Christos Faloutsos

CMU SCS

Real Data

“Pearls” “Staircase”

“Rays” “Block”

ICWE 2021 46Christos Faloutsos

CMU SCS

Real Data• Spikes on the out-degree distribution

´

´ICWE 2021 47Christos Faloutsos

CMU SCS

Christos Faloutsos 48

Roadmap

• Introduction – Motivation• Part#1: Patterns in graphs

– P1.1: Patterns– P1.2: Anomaly / fraud detection

• No labels – spectral methods• With labels: Belief Propagation

• Part#2: time-evolving graphs; tensors• Conclusions

ICWE 2021

CMU SCS

ICWE 2021 Christos Faloutsos 49

E-bay Fraud detection

w/ Polo Chau &Shashank Pandit, CMU[www’07]

CMU SCS

ICWE 2021 Christos Faloutsos 50

E-bay Fraud detection

CMU SCS

ICWE 2021 Christos Faloutsos 51

E-bay Fraud detection

CMU SCS

ICWE 2021 Christos Faloutsos 52

E-bay Fraud detection - NetProbe

CMU SCS

Popular press

And less desirable attention:• E-mail from ‘Belgium police’ (‘copy of

your code?’)ICWE 2021 Christos Faloutsos 53

CMU SCS

Summary of Part#1• *many* patterns in real graphs

– Power-laws everywhere– Long (and growing) list of tools for

anomaly/fraud detection

ICWE 2021 Christos Faloutsos 54

Patterns anomalies

CMU SCS

Christos Faloutsos 55

Roadmap

• Introduction – Motivation• Part#1: Patterns in graphs• Part#2: time-evolving graphs

– P2.1: tools/tensors– P2.2: other patterns

• Conclusions

ICWE 2021

CMU SCS

ICWE 2021 Christos Faloutsos 56

Part 2:Time evolving graphs; tensors

CMU SCS

Graphs over time -> tensors!• Problem #2.1:

– Given who calls whom, and when– Find patterns / anomalies

ICWE 2021 Christos Faloutsos 57

smith

johnson

CMU SCS

Graphs over time -> tensors!• Problem #2.1:

– Given who calls whom, and when– Find patterns / anomalies

ICWE 2021 Christos Faloutsos 58

CMU SCS

Graphs over time -> tensors!• Problem #2.1:

– Given who calls whom, and when– Find patterns / anomalies

ICWE 2021 Christos Faloutsos 59

MonTue

CMU SCS

Graphs over time -> tensors!• Problem #2.1:

– Given who calls whom, and when– Find patterns / anomalies

ICWE 2021 Christos Faloutsos 60callee

caller

time

CMU SCS

Graphs over time -> tensors!• Problem #2.1’:

– Given author-keyword-date– Find patterns / anomalies

ICWE 2021 Christos Faloutsos 61keyword

author

date

MANY more settings,with >2 ‘modes’

CMU SCS

Graphs over time -> tensors!• Problem #2.1’’:

– Given subject – verb – object facts– Find patterns / anomalies

ICWE 2021 Christos Faloutsos 62object

subject

verb MANY more settings,

with >2 ‘modes’

CMU SCS

Graphs over time -> tensors!• Problem #2.1’’’:

– Given <triplets>– Find patterns / anomalies

ICWE 2021 Christos Faloutsos 63mode2

mode1mod

e3MANY more settings,with >2 ‘modes’(and 4, 5, etc modes)

CMU SCS

Answer : tensor factorization• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 64

N users

Mproducts

‘meat-eaters’‘steaks’

‘vegetarians’‘plants’

‘kids’‘cookies’

~ + +

CMU SCS

Crush intro to SVD• Recall: (SVD) matrix factorization: finds

blocks

ICWE 2021 Christos Faloutsos 65

N fans

Midols

‘music lovers’‘singers’

‘sports lovers’‘athletes’

‘citizens’‘politicians’

~ + +

CMU SCS

Answer: tensor factorization• PARAFAC decomposition

ICWE 2021 Christos Faloutsos 66

= + +subject

object

verb

politicians artists athletes

CMU SCS

Answer: tensor factorization• PARAFAC decomposition• Results for who-calls-whom-when

– 4M x 15 days

ICWE 2021 Christos Faloutsos 67

= + +caller

callee

time

?? ?? ??

CMU SCS

Anomaly detection in time-evolving graphs

• Anomalous communities in phone call data:– European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day!

1 caller 5 receivers 4 days of activity

ICWE 2021 68Christos Faloutsos

=

CMU SCS

Anomaly detection in time-evolving graphs

• Anomalous communities in phone call data:– European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day!

1 caller 5 receivers 4 days of activity

ICWE 2021 69Christos Faloutsos

=

CMU SCS

Anomaly detection in time-evolving graphs

• Anomalous communities in phone call data:– European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day!ICWE 2021 70Christos Faloutsos

=

Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann,Christos Faloutsos, Prithwish Basu, Ananthram Swami,Evangelos Papalexakis, Danai Koutra. Com2: FastAutomatic Discovery of Temporal (Comet) Communities.PAKDD 2014, Tainan, Taiwan.

CMU SCS

Christos Faloutsos 71

Roadmap

• Introduction – Motivation• Part#1: Patterns in graphs• Part#2: time-evolving graphs

– P2.1: tools/tensors– P2.2: other patterns – inter-arrival time

• Conclusions

ICWE 2021

CMU SCS

RSC: Mining and Modeling Temporal Activity in Social Media

Alceu F. Costa* Yuto Yamaguchi Agma J. M. Traina

Caetano Traina Jr. Christos Faloutsos

Universidadede São Paulo

KDD 2015 – Sydney, Australia

*alceufc@icmc.usp.br

CMU SCS

Reddit DatasetTime-stamp from comments21,198 users20 Million time-stamps

Twitter DatasetTime-stamp from tweets6,790 users16 Million time-stamps

Pattern Mining: Datasets

For each user we have: Sequence of postings time-stamps: T = (t1, t2, t3, …)Inter-arrival times (IAT) of postings: (∆1, ∆2, ∆3, …)

73t1 t2 t3 t4

∆1 ∆2 ∆3

timeICWE 2021 Christos Faloutsos

CMU SCS

Human? Robots?

log

linear

ICWE 2021 Christos Faloutsos 74

CMU SCS

Human? Robots?

log

linear2’ 3h 1day

ICWE 2021 Christos Faloutsos 75

CMU SCS

Experiments: Can RSC-Spotter Detect Bots?

Precision vs. Sensitivity CurvesGood performance: curve close to the top

76

0 0.2 0.4 0.6 0.8 10

0.20.40.60.8

1

Sensitivity (Recall)

Prec

isio

n

0 0.2 0.4 0.6 0.8 10

0.20.40.60.8

1

Sensitivity (Recall)

Prec

isio

n

RSCïSpotter

IAT Hist.Entropy [6]Weekday Hist.

All Features

Precision > 94%Sensitivity > 70%

With strongly imbalanced datasets# humans >> # bots

Twitter

ICWE 2021 Christos Faloutsos

CMU SCS

Experiments: Can RSC-Spotter Detect Bots?

Precision vs. Sensitivity CurvesGood performance: curve close to the top

77

0 0.2 0.4 0.6 0.8 10

0.20.40.60.8

1

Sensitivity (Recall)

Prec

isio

n

RSCïSpotter

IAT Hist.Entropy [6]Weekday Hist.

All Features

Precision > 96%Sensitivity > 47%With strongly imbalanced datasets# humans >> # bots

Reddit

ICWE 2021 Christos Faloutsos

CMU SCS

Part 2: Conclusions

• Time-evolving / heterogeneous graphs -> tensors

• PARAFAC finds patterns• Surprising temporal patterns

ICWE 2021 78Christos Faloutsos

=

CMU SCS

Christos Faloutsos 79

Roadmap

• Introduction – Motivation– Why study (big) graphs?

• Part#1: Patterns in graphs• Part#2: time-evolving graphs; tensors• Acknowledgements and Conclusions

ICWE 2021

CMU SCS

Christos Faloutsos 80

Thanks

ICWE 2021

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies

CMU SCS

Christos Faloutsos 81

Cast

Akoglu, Leman

Chau, Polo

Kang, U

Hooi,Bryan

ICWE 2021

Koutra,Danai

Beutel,Alex

Papalexakis,Vagelis

Shah,Neil

Araujo,Miguel

Song,Hyun Ah

Eswaran,Dhivya

Shin,Kijung

CMU SCS

Christos Faloutsos 82

CONCLUSION#1 – Big data• Patterns Anomalies• Large datasets reveal patterns/outliers that

are invisible otherwise

ICWE 2021

CMU SCS

Christos Faloutsos 83

CONCLUSION#2 – tensors

• powerful tool

ICWE 2021

=

1 caller 5 receivers 4 days of activity

CMU SCS

Christos Faloutsos 84

References• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,

Tools and Case Studies, Morgan Claypool 2012• http://www.morganclaypool.com/doi/abs/10.2200/S004

49ED1V01Y201209DMK006

ICWE 2021

CMU SCS

Christos Faloutsos 85

References• Danai Koutra and Christos Faloutsos, Individual and

Collective Graph Mining: Principles, Algorithms, and Applications, Morgan Claypool 2017(https://doi.org/10.2200/S00796ED1V01Y201708DMK014)

ICWE 2021

CMU SCS

TAKE HOME MESSAGE:

Cross-disciplinarity

ICWE 2021 Christos Faloutsos 86

=

CMU SCS

Cross-disciplinarity

ICWE 2021 Christos Faloutsos 87

=

Thank you!

Recommended