Upload
overton
View
33
Download
2
Embed Size (px)
DESCRIPTION
Large Graph Mining. Christos Faloutsos CMU. Thank you!. Dr. Yan Liu Dr. Jimeng Sun. Outline. Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Generators Problem#3: Scalability Conclusions. Graphs - why should we care?. Internet Map [lumeta.com]. - PowerPoint PPT Presentation
Citation preview
CMU SCS
Large Graph Mining
Christos Faloutsos
CMU
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 2
Thank you!
• Dr. Yan Liu
• Dr. Jimeng Sun
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 3
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
• Problem#2: Generators
• Problem#3: Scalability
• Conclusions
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 4
Graphs - why should we care?
Internet Map [lumeta.com]
Food Web [Martinez ’91]
Protein Interactions [genomebiology.com]
Friendship Network [Moody ’01]
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 5
Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)
• web: hyper-text graph
• ... and more:
D1
DN
T1
TM
... ...
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 6
Graphs - why should we care?
• network of companies & board-of-directors members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ....
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 7
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs– Static graphs– Weighted graphs– Time evolving graphs
• Problem#2: Generators
• Problem#3: Scalability
• Conclusions
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 8
Problem #1 - network and graph mining
• How does the Internet look like?• How does the web look like?• What is ‘normal’/‘abnormal’?• which patterns/laws hold?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 9
Graph mining
• Are real graphs random?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 10
Laws and patterns• Are real graphs random?• A: NO!!
– Diameter– in- and out- degree distributions– other (surprising) patterns
• So, let’s look at the data • (‘it is amazing what you hear, when you
listen’)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 11
Solution# S.1
• Power law in the degree distribution [SIGCOMM99]
log(rank)
log(degree)
-0.82
internet domains
att.com
ibm.com
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 12
Solution# S.2: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix
E = -0.48
Exponent = slope
Eigenvalue
Rank of decreasing eigenvalue
May 2001
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 13
Solution# S.2: Eigen Exponent E
• [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent
E = -0.48
Exponent = slope
Eigenvalue
Rank of decreasing eigenvalue
May 2001
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 14
But:
How about graphs from other domains?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 16
More power laws:
• web hit counts [w/ A. Montgomery]
Web Site Traffic
log(in-degree)
log(count)
Zipf
userssites
``ebay’’
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 17
epinions.com• who-trusts-whom
[Richardson + Domingos, KDD 2001]
(out) degree
count
trusts-2000-people user
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 18
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs– Static graphs
• degree, diameter, eigen*,
• triangles
• cliques
– Weighted graphs– Time evolving graphs
• Problem#2: Generators
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 19
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 20
Triangle ‘Laws’
• Real social networks have a lot of triangles– Friends of friends are friends
• Any patterns?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 21
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
ASNHEP-TH
Epinions X-axis: # of Triangles a node participates inY-axis: count of such nodes
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 22
Triangle Law: #S.4 [Tsourakakis ICDM 2008]
SNReuters
EpinionsX-axis: degreeY-axis: mean # trianglesNotice: slope ~ degree exponent (insets)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 23
Triangle Law: Computations [Tsourakakis ICDM 2008]
But: triangles are expensive to compute(3-way join; several approx. algos)
Q: Can we do that quickly?
details
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 24
Triangle Law: Computations [Tsourakakis ICDM 2008]
But: triangles are expensive to compute(3-way join; several approx. algos)
Q: Can we do that quickly?A: Yes!
#triangles = 1/6 Sum ( i3 )
(and, because of skewness, we only need the top few eigenvalues!
details
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 25
Triangle Law: Computations [Tsourakakis ICDM 2008]
1000x+ speed-up, high accuracy
details
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 26
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs– Static graphs
• degree, diameter, eigen*,
• triangles
• cliques
– Weighted graphs– Time evolving graphs
• Problem#2: Generators
CMU SCS
Large Human Communication NetworksPatterns and a Utility-Driven Generator
Nan Du, Christos Faloutsos, Bai Wang, Leman Akoglu
KDD 2009
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 28
Cliques• Clique is a complete subgraph.• If a clique can not be
contained by any largerclique, it is called the maximal clique.
20
1 3
4
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 29
Clique• Clique is a complete subgraph.• If a clique can not be
contained by any largerclique, it is called the maximal clique.
20
1 3
4
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 30
Clique• Clique is a complete subgraph.• If a clique can not be
contained by any largerclique, it is called the maximal clique.
20
1 3
4
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 31
Clique• Clique is a complete subgraph.• If a clique can not be
contained by any largerclique, it is called the maximal clique.
• {0,1,2}, {0,1,3}, {1,2,3}{2,3,4}, {0,1,2,3} are cliques;
• {0,1,2,3} and {2,3,4} are the maximal cliques.
20
1 3
4
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 32
S.5: Clique-Degree Power-Law
• Power law:
More friends, even more social circles !
# maximalcliques of node i
degreeof node i
Dataset: who-calls-whomanonymized,Over several time units€
C∝da
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 33
S.5 Clique-Degree Power-Law
• Outlier Detection
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 34
1.5 Clique-Degree Power-Law
• Outlier Detection
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 35
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs– Static graphs
• degree, diameter, eigen*,
• triangles
• cliques
– Weighted graphs– Time evolving graphs
• Problem#2: Generators
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 36
Observation W.1
Question : Nodes in a triangle are topologically equivalent. Will they also give equal number of phone calls to each other ?
Max
Wei
ght M
in Weight
Mid Weight
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 37
Observation W.1:Triangle Weight Law
€
Max ∝mediuma
Periods S1 – S3
€
Max ∝minb
€
medium∝minc
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 38
Other observations on weighted graphs?
• A: yes - even more ‘laws’!
M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 39
Observation W.2: fortification
Q: How do the weights
of nodes relate to degree?
CMU SCS
Edges (# donors)
In-weights($)
ICDM-LDMTA 2009 C. Faloutsos 40
Observation W.2: fortification:Snapshot Power Law
• weight of a node is super-linear on the in-degree with PL exponent ‘iw’:
– i.e. 1.01 < iw < 1.26, super-linear
Orgs-Candidates
e.g. John Kerry, $10M received,from 1K donors
More donors, even more $
$10
$5
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 41
Are there deviations from P.L.?
• A: yes – but they also correspond to very skewed distributions (‘black/gray swans’)
CMU SCS
“Mobile Call Graphs: Beyond Power Law and Lognormal Distributions”
K D D ’ 0 8
Mukund Seshadri , Sridhar Machiraju,
Ashwin Sridharan, Jean Bolot
Christos Faloutsos, Jure Leskovec
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 43
Dataset• Who-calls-whom (anonymized)
• Degree distribution, in green
Area S1Time period T11 monthCount
Degree of Mobile-phone call graph
.... Observed Data
--- Power Law Fit+ Observed Data
.... LogNormal Fit
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 44
A poor fit?• Power Laws: p(x) ~ x^(-a)
• Lognormal: log(x) is Normal
Count
Degree of Mobile-phone call graph
.... Observed Data
--- Power Law Fit+ Observed Data
.... LogNormal Fit
Area S1Time period T11 month
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 45
Solution: DPLN • Double Pareto Log Normal (Reed, 2003)
• 4 parameters: [α,β,ν,τ]• Linear head and tail.
Degree
Area S1, Time Period T1
β
α
“Rich get Richer”BUT
Population lifetimesNOT identical
count
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 46
Datasets• Anonymized monthly aggregates of call metrics
per user – Collected at 4 coverage areas S1,S2,S3,S4– Collected for two month-long periods T1 and
T2, separated by 6 months – Total coverage:
• >1 million users
• >10 million calls
• ~7000 square miles.
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 47
• Degree (No. of Call Partners)• Calls• Talk Time
Total over a month,per user
Area S1Time-period T1Degree Distribution
Metrics
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 48
Persistence –
Across Metric, Time, Space
Partners
S2, T2
Partners
S1, T2
CallsArea S1, Time T1
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 49
• Outlier detection– Many un-answered calls => multiples
of 27 sec!
Applications
Per-user distribution of total talk time (@ T1, S1)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 50
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs– Static graphs – Weighted graphs– Time evolving graphs
• Problem#2: Generators
• …
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 51
Problem: Time evolution• with Jure Leskovec (CMU ->
Stanford)
• and Jon Kleinberg (Cornell – sabb. @ CMU)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 52
T.1 Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:– diameter ~ O(log N)– diameter ~ O(log log N)
• What is happening in real data?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 53
T.1 Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:– diameter ~ O(log N)– diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 54
T.1 Diameter – “Patents”
• Patent citation network
• 25 years of data
time [years]
diameter
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 55
T.2 Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose thatN(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1) =? 2 * E(t)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 56
T.2 Temporal Evolution of the Graphs
• N(t) … nodes at time t• E(t) … edges at time t• Suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1) =? 2 * E(t)
• A: over-doubled!– But obeying the ``Densification Power Law’’
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 57
T.2 Densification – Patent Citations
• Citations among patents granted
• 1999– 2.9 million nodes– 16.5 million
edges
• Each year is a datapoint N(t)
E(t)
1.66
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 58
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs– Static graphs – Weighted graphs– Time evolving graphs
• Problem#2: Generators
• …
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 59
More on Time-evolving graphs
M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 60
Observation T.3: NLCC behaviorQ: How do NLCC’s emerge and join with
the GCC?
(``NLCC’’ = non-largest conn. components)
– Do they continue to grow in size?
– or do they shrink?
– or stabilize?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 61
Observation T.4: NLCC behavior
• After the gelling point, the GCC takes off, but NLCC’s remain ~constant (actually, oscillate).
IMDB
CC size
Time-stamp
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 62
T.5 How do new edges appear?
[LBKT’08] Microscopic Evolution of Social Networks Jure Leskovec, Lars Backstrom, Ravi Kumar, Andrew Tomkins. (ACM KDD), 2008.
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 63
Edge gap δ(d): inter-arrival time between dth and d+1st edge
T.5 How do edges appear in time?[LBKT’08]
What is the PDF of Poisson?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 64
)()(),);(( dg eddp
Edge gap δ(d): inter-arrival time between dth and d+1st edge
T.5 How do edges appear in time?[LBKT’08]
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 65
Similarly for Blogs
• with Mary McGlohon (CMU)
• Jure Leskovec (CMU->Stanford)
• Natalie Glance (now at Google)
• Mat Hurst (now at MSR)
[SDM’07]
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 66
T.6 : popularity over time
Post popularity drops-off – exponentially?
lag: days after post
# in links
1 2 3
@t
@t + lag
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 67
T.6 : popularity over time
Post popularity drops-off – exponentially?POWER LAW!Exponent?
# in links(log)
1 2 3 days after post(log)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 68
T.6 : popularity over time
Post popularity drops-off – exponentially?POWER LAW!Exponent? -1.6 • close to -1.5: Barabasi’s stack model• and like the zero-crossings of a random walk
# in links(log)
1 2 3
-1.6
days after post(log)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 69
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
• Problem#2: Generators
• Problem#3: Scalability
• Conclusions
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 70
Problem#2: Generation
Goal: Generate a realistic sequence of graphs that
1. obey all the patterns• including weights on the edges and• timestamps on the edges
2. needs no randomness
3. needs no parameters (or as few as possible)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 71
PaC generator
• Why do we need generators?– What-if scenarios
– How to attract/retain customers/members
– Stress-testing algorithms, when real graphs are unavailable (eg., due to privacy, national/corporate security, etc)
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 72
(NUMEROUS earlier generators)
• preferential attachment [Barabasi+]
• copying model [Kleinberg+]
• winner does not take all [Giles+]
• Kronecker graphs
• ...
• (survey: [Chakrabarti+, ’06])
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 73
Problem#2: Generation
Goal: Generate a realistic sequence of graphs that
1. obey all the patterns• including weights on the edges and
• timestamps on the edges
2. needs no randomness
3. needs no parameters (or as few as possible)
Proposed ‘PaC’ model – idea:• design ‘agents’, who try to max utility of
contacts
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 74
PaC Model• People benefit from calling each other.
• A Pay and Call game = PaC Model
• The payoffs are measured as “emotional dollars”.
agent
1iF 0iF
Prob pl to stay in the game
Friendliness 0<=Fi<=1
capital
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 75
Outline of Agent Behavior• Step 1: decide to stay (PL)
• Step 2: if stay, call the most profitable person(s)– Existing friend (‘exploit’)
– Stranger (‘explore’) or ask for recommendation (if available) to maximize benefits
Exponential lifetime
Rich get richer
Closing Triangle
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 77
Validation of PaC
• Ran 35 simulations
• 100,000 agents per simulation
• Variation of the parameters does not change the shape of the distribution
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 78
Goals of Validation
• G1: Skewed degree/node weight distribution
• G2: Snapshot Power-Law
• G3: Skewed connected components distribution
• G4: Clique-Degree Power-Law
• G5: Clique-Participation Law
• G6: Triangle Weight Law
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 79
Validation of PaC• G1: Skewed Degree / Node Weight Distribution
Real Network
Synthetic Network
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 80
Validation of PaC• G2: Snapshot Power Law [McGlohon, Akoglu,
Faloutsos 08] “more partners, even more calls”
Real Network Synthetic Network
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 81
Validation of PaC• G3: Skewed distribution of the connected
components
Real Network Synthetic Network
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 82
Validation of PaC
• G4: Clique Degree Power Law
Real Network Synthetic Network
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 83
Validation of PaC
• G5: Clique Participation Law
Real Network Synthetic Network
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 84
Validation of PaC
• G6: Triangle Weight Law
Real Network
Synthetic Network
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 85
Validation of PaCG1: Skewed degree/node weight
distributionG2: Snapshot Power-LawG3: Skewed connected components
distributionG4: Clique-Degree Power-LawG5: Clique-Participation LawG6: Triangle Weight Law
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 86
Outline
• Introduction – Motivation
• Problem#1: Patterns in graphs
• Problem#2: Generators
• Problem#3: Scalability
• Conclusions
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 87
Scalability
• How about if graph/tensor does not fit in core?
• How about handling huge graphs?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 88
Scalability
• How about if graph/tensor does not fit in core?
• [‘MET’: Kolda, Sun, ICMD’08, best paper award]
• How about handling huge graphs?
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 89
Scalability• Google: > 450,000 processors in clusters of ~2000
processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]
• Yahoo: 5Pb of data [Fayyad, KDD’07]• Problem: machine failures, on a daily basis• How to parallelize data mining tasks, then?• A: map/reduce – hadoop (open-source clone)
http://hadoop.apache.org/
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 90
2’ intro to hadoop
• master-slave architecture; n-way replication (default n=3)
• ‘group by’ of SQL (in parallel, fault-tolerant way)• e.g, find histogram of word frequency
– compute local histograms– then merge into global histogram
select course-id, count(*)from ENROLLMENTgroup by course-id
details
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 91
2’ intro to hadoop
• master-slave architecture; n-way replication (default n=3)
• ‘group by’ of SQL (in parallel, fault-tolerant way)• e.g, find histogram of word frequency
– compute local histograms– then merge into global histogram
select course-id, count(*)from ENROLLMENTgroup by course-id map
reduce
details
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 92
UserProgram
Reducer
Reducer
Master
Mapper
Mapper
Mapper
fork fork fork
assignmap
assignreduce
readlocalwrite
remote read,sort
Output
File 0
Output
File 1
write
Split 0Split 1Split 2
Input Data(on HDFS)
By default: 3-way replication;Late/dead machines: ignored, transparently (!)
details
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 93
PEGASUS: A Peta-Scale Graph Mining System, ICDM’09
U Kang, Charalampos Tsourakakis, C. Faloutsos
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 94
the PageRank distribution: power-law
Analysis of Real Networks
YahooWeb: 1.4B nodes and 6.6B edges, 120Gb(largest public graph analyzed)
M45 (Y!):• 1.5Pb disk• 3Tb RAM• 4000 cores• among top 500
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 95
Spikes?
Analysis of Real Networks
YahooWeb: 1.4 B nodes and 6.6 B edges -Connected components:
size
count
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 96
Spikes: due to replications of pages.
Analysis of Real Networks
YahooWeb: 1.4 B nodes and 6.6 B edges -Connected components:
size
count
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 97
Evolution of connected components of LinkedIn: stable, *after* the gelling point (@2004)
LinkedIn: 7.5M nodes and 58M edges
details
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 98
OVERALL CONCLUSIONS – low level:
• Several new patterns (fortification, triangle-laws, etc)
• New generator: ‘PaC’ (agent based)
• Scalability / cloud computing -> PetaBytes
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 99
OVERALL CONCLUSIONS – high level
• good news: unprecedented opportunities (huge datasets, available on-line)
• better news: cross-disciplinary work:
– DB/sys -> scalability
– ML/stat -> tools
– Econ, Soc -> experience; what questions to ask
– phys, math -> phase transitions, tensors, eigenvalues
– …
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 100
References• Hanghang Tong, Christos Faloutsos, and Jia-Yu
Pan Fast Random Walk with Restart and Its Applications ICDM 2006, Hong Kong.
• Hanghang Tong, Christos Faloutsos Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA
• T. G. Kolda and J. Sun. Scalable Tensor Decompositions for Multi-aspect Data Mining. In: ICDM 2008, pp. 363-372, December 2008.
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 101
References• Jure Leskovec, Jon Kleinberg and Christos
Faloutsos Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations KDD 2005, Chicago, IL. ("Best Research Paper" award).
• Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication (ECML/PKDD 2005), Porto, Portugal, 2005.
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 102
References• Jure Leskovec and Christos Faloutsos, Scalable
Modeling of Real Graphs using Kronecker Multiplication, ICML 2007, Corvallis, OR, USA
• Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
• Jimeng Sun, Dacheng Tao, Christos Faloutsos Beyond Streams and Graphs: Dynamic Tensor Analysis, KDD 2006, Philadelphia, PA
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 103
References• Jimeng Sun, Yinglian Xie, Hui Zhang, Christos
Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis, Minnesota, Apr 2007. [pdf]
• Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos, GraphScope: Parameter-free Mining of Large Time-evolving Graphs ACM SIGKDD Conference, San Jose, CA, August 2007
CMU SCS
ICDM-LDMTA 2009 C. Faloutsos 104
Contact info:
www. cs.cmu.edu /~christos
Thanks again to the organizers:
Jimeng Sun, Yan LiuFunding sources:• NSF IIS-0705359, IIS-0534205, DBI-0640543• LLNL, PITA, IBM, INTEL, HP