Page 1: Large Graph Mining

CMU SCS

Large Graph Mining

Christos Faloutsos, CMU

Page 2: Large Graph Mining


(c) 2013, C. Faloutsos 2

Roadmap
• Introduction – Motivation
• Past work:
  – Big graph mining (‘Pegasus’/Hadoop)
  – Propagation / immunization
• Ongoing & future work:
  – (big) tensors
  – brain data
• Conclusions

MLD-AB

Page 3: Large Graph Mining

(Big) Graphs - Why study them?

• Human Disease Network [Barabasi 2007]
• Gene Regulatory Network [Decourty 2008]
• Facebook [2010]: >1B nodes, >$10B
• The Internet [2005]

C. Faloutsos (CMU), SUM'13

Page 4: Large Graph Mining

(Big) Graphs - why study them?

• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• Recommendation systems
• ....
• Many-to-many db relationship -> graph

Page 5: Large Graph Mining

Roadmap
• Introduction – Motivation
• Past work:
  – Big graph mining (‘Pegasus’/Hadoop)
  – Propagation / immunization
• Ongoing/future: (big) tensors / brain data
• Conclusions

Page 6: Large Graph Mining


Triangle counting for large graphs?

Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
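To make the triangle-counting question concrete, here is a minimal in-memory sketch in Python. It is a naive set-intersection counter, not the method of the cited PAKDD’11 work, and as written it would not scale to billions of edges. The function name `triangles_per_node` and the toy edge list are illustrative assumptions.

```python
from collections import defaultdict


def triangles_per_node(edges):
    """Return {node: number of triangles the node participates in}.

    `edges` is an iterable of undirected (u, v) pairs; node ids must be
    mutually comparable (e.g. all ints). Hypothetical helper, for illustration.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    counts = {node: 0 for node in adj}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbours of u and v close a triangle.
                for w in adj[u] & adj[v]:
                    if v < w:  # u < v < w: each triangle is visited once
                        counts[u] += 1
                        counts[v] += 1
                        counts[w] += 1
    return counts


if __name__ == "__main__":
    # One triangle (1, 2, 3) plus a pendant edge (3, 4).
    toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(triangles_per_node(toy_edges))
```

Per-node counts of this kind could then be compared against node degree to look for nodes that stand out, which is one plausible reading of "anomalous" here.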


Page 10: Large Graph Mining

Roadmap
• Introduction – Motivation
• Past work:
  – Big graph mining (‘Pegasus’/Hadoop)
  – Propagation / immunization
• Ongoing & future work:
  – (big) tensors
  – brain data
• Conclusions

Page 11: Large Graph Mining

Fractional Immunization of Networks
B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos
SDM 2013, Austin, TX

Page 12: Large Graph Mining

Whom to immunize?
• Dynamical Processes over networks
• Each circle is a hospital
• ~3,000 hospitals
• More than 30,000 patients transferred

[US-MEDICARE NETWORK 2005]

Problem: Given k units of disinfectant, whom to immunize?

Page 13: Large Graph Mining

Whom to immunize?

[US-MEDICARE NETWORK 2005]

CURRENT PRACTICE vs. OUR METHOD: ~6x fewer!

Hospital-acquired inf.: 99K+ lives, $5B+ per year

Page 14: Large Graph Mining

Running Time

Wall-clock time (lower is better):
• Simulations: > 1 week
• SMART-ALLOC: ≈ 14 secs

> 30,000x speed-up!

Page 15: Large Graph Mining

What is the ‘silver bullet’?
A: Try to decrease connectivity of graph

Q: how to measure connectivity?
A: first eigenvalue of adjacency matrix

Q1: why??

• Avg degree
• Max degree
• Diameter
• Modularity
• ‘Conductance’
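The "first eigenvalue" answer can be illustrated with a small sketch: power iteration for the largest eigenvalue λ of an adjacency matrix, followed by a check of how λ changes when a node is removed. This is a toy under stated assumptions (small, dense, undirected, unweighted graphs held as lists of lists). The helpers `leading_eigenvalue` and `remove_node` are hypothetical names, and this is not the SMART-ALLOC algorithm.

```python
import math


def leading_eigenvalue(adj, iters=1000, tol=1e-12):
    """Largest eigenvalue of a symmetric 0/1 adjacency matrix (list of lists).

    Power iteration is run on A + I rather than A, so that bipartite graphs
    (whose spectrum is symmetric around 0) still converge; the Rayleigh
    quotient is then taken with respect to A itself.
    """
    n = len(adj)
    if n == 0:
        return 0.0
    x = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        # y = (A + I) x
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        # Rayleigh quotient x^T A x (x has unit length)
        ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_lam = sum(x[i] * ax[i] for i in range(n))
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam


def remove_node(adj, k):
    """Copy of the adjacency matrix with node k (row and column) deleted."""
    return [[a for j, a in enumerate(row) if j != k]
            for i, row in enumerate(adj) if i != k]


if __name__ == "__main__":
    # Star graph: hub 0 connected to leaves 1..4.
    star = [[0, 1, 1, 1, 1],
            [1, 0, 0, 0, 0],
            [1, 0, 0, 0, 0],
            [1, 0, 0, 0, 0],
            [1, 0, 0, 0, 0]]
    print("full graph:   ", leading_eigenvalue(star))
    print("hub removed:  ", leading_eigenvalue(remove_node(star, 0)))
    print("leaf removed: ", leading_eigenvalue(remove_node(star, 1)))
```

On the star graph, removing the hub reduces λ far more than removing a leaf, which matches the intuition of lowering connectivity by protecting the right nodes.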

Page 16: Large Graph Mining

Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks
B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos
IEEE ICDM 2011, Vancouver

extended version, in arxiv: http://arxiv.org/abs/1004.0060

G2 theorem

~10 pages proof

Page 18: Large Graph Mining

Our thresholds for some models

• s = effective strength
• s < 1 : below threshold

Models                    Effective Strength (s)    Threshold (tipping point)
SIS, SIR, SIRS, SEIR      s = λ · …                 s = 1
SIV, SEIV                 s = λ · …                 s = 1
S I1 I2 V1 V2 (H.I.V.)    s = λ · …                 s = 1

Annotations: No immunity (SIS); Temp. immunity (SIRS); w/ incubation (SEIR)
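As a worked illustration of the s < 1 condition, the sketch below assumes the standard SIS form s = λ · β/δ, where λ is the first eigenvalue of the adjacency matrix. The symbols β (infection rate) and δ (recovery rate), the function names, and the numeric values are assumptions of this sketch, not taken from the slide.

```python
def effective_strength_sis(lam, beta, delta):
    """Effective strength s = lam * beta / delta for the SIS model.

    lam   : first eigenvalue of the adjacency matrix
    beta  : infection rate (assumed symbol)
    delta : recovery rate (assumed symbol), must be positive
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return lam * beta / delta


def below_threshold(s):
    """True when s < 1, i.e. the cascade is below the tipping point."""
    return s < 1.0


if __name__ == "__main__":
    # Illustrative numbers only: same graph (lam = 2.0), two infection rates.
    for beta in (0.1, 0.3):
        s = effective_strength_sis(lam=2.0, beta=beta, delta=0.5)
        print(f"beta={beta}: s={s:.2f}, below threshold: {below_threshold(s)}")
```

Because s scales linearly with λ, any intervention that lowers λ (for example, immunizing well-chosen nodes) moves the system toward the below-threshold regime for fixed β and δ.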

Page 19: Large Graph Mining

Roadmap
• Introduction – Motivation
• Past work:
  – Big graph mining (‘Pegasus’/Hadoop)
  – Propagation / immunization
• Ongoing & future work:
  – (big) tensors
  – brain data
• Conclusions

Page 20: Large Graph Mining

Brain data

• Which neurons get activated by ‘bee’
• How wiring evolves
• Modeling epilepsy

N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell

Page 21: Large Graph Mining

Preliminary results
• 60 words (‘bee’, ‘apple’, ‘hammer’)
• 80 questions (‘is it alive’, ‘can it hurt you’)
• Brain-scan, for each word

            Alive?    Can hurt you?    …
‘apple’
‘beetle’    ✔
‘hammer’              ✔

Page 22: Large Graph Mining

Preliminary results

Page 23: Large Graph Mining

Preliminary results

Premotor cortex

Page 24: Large Graph Mining

CONCLUSION #1 – Big data

• Large datasets reveal patterns/outliers that are invisible otherwise

Page 25: Large Graph Mining

CONCLUSION #2 – Cross disciplinarity

Page 26: Large Graph Mining


Thank you! Questions?