Panther: Fast Top-K Similarity Search on Large Networks
Jing Zhang (1), Jie Tang (1), Cong Ma (1), Hanghang Tong (2), Yu Jing (1), and Juanzi Li (1)
(1) Department of Computer Science and Technology, Tsinghua University
(2) School of Computing, Informatics, and Decision Systems Engineering, Arizona State University
Who Is Similar to Barabási?
[Figure: author network around Barabási, with nodes labeled by author name (source shown on the figure: http://web1.aminer.org/). Capitalized labels include Barabasi, Boccaletti, Jeong, Kahng, Newman, Latora, and Rinzel.]
Similar Authors in Aminer
Related Work and Challenges

Goal: find the top-K similar vertices for any vertex in a network.

Two kinds of similarity:
1. Vertices that share many direct or indirect common neighbors.
2. Vertices that are disconnected but share similar structure.

Method            Time Complexity        Space Complexity
SimRank [KDD’02]  O(I N^2 d^2)           O(N^2)
TopSim [ICDE’12]  O(N T d^T)             O(N + M)
RWR [KDD’04]      O(I N^2 d)             O(N^2)
RoleSim [KDD’11]  O(I N^2 d^2)           O(N^2)
ReFex [KDD’11]    O(N + I(fM + N f^2))   O(N + Mf)

d: average degree, f: feature number, T: path length.

Challenges
• C1: How to design a similarity method that applies to both kinds of similarity?
• C2: Computational efficiency.
Our Approach: Panther
Path Similarity
• A path is a T-length sequence of vertices p = (v_1, ···, v_{T+1}).
• Π is the set of all T-paths in G.
• Path weight: [equation missing from the extracted slide]
• Intuition: two vertices are similar if they frequently appear on the same paths.
[Figure: example network with vertices v1 to v5]
Example (T = 2): S_ps(v1, v2) = 0.37, S_ps(v1, v3) = 0.42, S_ps(v1, v4) = 0.39, S_ps(v1, v5) = 0.09.
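To make the intuition concrete, here is a minimal Python sketch of an exact path similarity on a toy graph. It assumes an unweighted graph, a path weight equal to the product of uniform transition probabilities, and a similarity equal to the weight share of the T-paths that contain both vertices. These definitions are assumptions for illustration, since the slide's own path-weight equation is not reproduced here, and the function names and toy graph are hypothetical.

```python
def weighted_t_paths(adj, T):
    """Yield (path, weight) for every T-step walk in an unweighted graph.

    Assumption: the weight of a path is the product of the uniform
    transition probabilities 1/deg(v) along it.
    """
    def extend(path, weight):
        if len(path) == T + 1:
            yield tuple(path), weight
            return
        neighbors = adj[path[-1]]
        for nxt in neighbors:
            yield from extend(path + [nxt], weight / len(neighbors))

    for start in adj:
        yield from extend([start], 1.0)


def path_similarity(adj, u, v, T):
    """Weight share of the T-paths containing both u and v (assumed definition)."""
    total = shared = 0.0
    for path, weight in weighted_t_paths(adj, T):
        total += weight
        if u in path and v in path:
            shared += weight
    return shared / total


# Hypothetical toy graph: a chain a - b - c.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
sim_ab = path_similarity(chain, "a", "b", T=2)  # adjacent vertices
sim_ac = path_similarity(chain, "a", "c", T=2)  # two hops apart
```

On this chain the adjacent pair scores higher than the two-hop pair. Enumerating every T-path grows exponentially with T, which is the cost that sampling is meant to avoid.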
Panther_ps
• Basic idea: random path sampling.
• Simplified path similarity: [equation missing from the extracted slide]
[Figure: (a) input network with vertices v0 to v5; (b) random paths p1 to p4, generated by random walks; (c) vertex-to-path index listing, for each vertex, the paths that contain it. Complexity labels on the figure: O(dT) and O(RT).]
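The sampling idea can be sketched in a few lines of Python. This is a simplified illustration, not the released Panther code: it assumes an unweighted graph, uniform start vertices, uniform transitions, and a similarity estimate equal to the fraction of sampled paths that contain both vertices. All function names and the toy graph are hypothetical.

```python
import random
from collections import defaultdict


def sample_paths(adj, R, T, rng):
    """Sample R random walks of T steps each (every vertex needs a neighbor)."""
    vertices = list(adj)
    paths = []
    for _ in range(R):
        v = rng.choice(vertices)      # assumption: uniform start vertex
        path = [v]
        for _ in range(T):
            v = rng.choice(adj[v])    # assumption: uniform transition
            path.append(v)
        paths.append(path)
    return paths


def build_index(paths):
    """Vertex-to-path index: vertex -> ids of the sampled paths containing it."""
    index = defaultdict(set)
    for pid, path in enumerate(paths):
        for v in path:
            index[v].add(pid)
    return index


def top_k(index, paths, query, k):
    """Rank vertices by the fraction of sampled paths shared with the query."""
    counts = defaultdict(int)
    for pid in index.get(query, ()):
        for v in set(paths[pid]):
            if v != query:
                counts[v] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(v, c / len(paths)) for v, c in ranked[:k]]


# Hypothetical toy graph: two disconnected triangles.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
paths = sample_paths(graph, R=500, T=2, rng=random.Random(0))
index = build_index(paths)
result = top_k(index, paths, "a", k=3)
```

A query only touches the paths listed under the query vertex in the index, so vertices in the other triangle never appear in the result for "a".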
Theoretical Analysis
• How many random paths shall we sample?
[Diagram: steps numbered 1, 2, 3, with the labels "Domain and range set", "Upper bound of range set’s VC dimension", "Distribution", and "Required sample size"]
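Theoretical Analysis (Details)
• Domain: Π
• Range set, VC bound, distribution, and the expression for path similarity: [equations missing from the extracted slide]
• Conclusion
– Sampling R random paths guarantees an error of at most ε with probability at least 1−δ.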
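As a hedged reminder, the (ε, δ) guarantee stated here has the shape of the generic VC-dimension sampling bound. The block below restates that textbook result and is not a transcription of the slide's missing formulas. The symbol $h$ for the VC bound and the calligraphic $\mathcal{R}_G$ for the range set are notation assumed here, chosen to avoid clashing with the average degree $d$ and the sample size $R$.

```latex
% Generic VC sampling bound (assumed notation: h, \mathcal{R}_G, \hat{\phi}_S).
% If the range set \mathcal{R}_G over the domain \Pi satisfies
% VC(\mathcal{R}_G) \le h, and S is a sample of R paths drawn
% independently from the distribution \phi over \Pi with
\[
  R \;=\; \frac{c}{\varepsilon^{2}} \left( h + \ln \frac{1}{\delta} \right),
\]
% where c is a universal constant, then with probability at least 1 - \delta
\[
  \left| \phi(P) - \hat{\phi}_S(P) \right| \;\le\; \varepsilon
  \qquad \text{for every range } P \in \mathcal{R}_G ,
\]
% where \hat{\phi}_S(P) is the fraction of sampled paths that fall in P.
```

Under this reading, the required sample size does not depend on the size of the network once $h$, ε, and δ are fixed.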
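Proof of [statement missing from the extracted slide] (Details)
• Assume [expression missing from the extracted slide].
• A set Q of size l can be shattered by R_G.
• There is a one-to-one correspondence between each subset of Q and each range P_i in R_G.
• A path belongs only to the ranges with respect to a pair of vertices in the path.
• Contradiction: [two expressions joined by "and", missing from the extracted slide]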
Vector Similarity and Panther_vs
• Limitation of path similarity: bias toward close neighbors.
• Vector similarity: the probability distributions of two vertices linking to all other vertices are similar if their topological structures are similar.
• Panther_vs: use the top-D path similarities calculated by Panther_ps to represent each vertex as a vector: [equation missing from the extracted slide]
Example (T = 2), with the vector of each vertex:
u: 0.39 0.12 0.12 0.12 0.12 0.12
v: 0.13 0.13 0.04 0 0 0
w: 0.25 0.12 0.12 0.11 0.02 0
S_vs(u, w) = 0.27 > S_vs(u, v) = 0.16
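The vector representation can be illustrated with the three vectors from this example. The sketch below assumes that vectors are compared by Euclidean distance, which is the natural fit for a kd-tree but is not stated on the slide. It does not attempt to reproduce the reported S_vs scores, only their ordering. The helper names are hypothetical.

```python
import math


def top_d_vector(similarities, D):
    """Sort path similarities in descending order, keep D, pad with zeros."""
    values = sorted(similarities, reverse=True)[:D]
    return values + [0.0] * (D - len(values))


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Path similarities read off the slide's example (T = 2), with D = 6.
u = top_d_vector([0.12, 0.39, 0.12, 0.12, 0.12, 0.12], D=6)
v = top_d_vector([0.13, 0.04, 0.13], D=6)
w = top_d_vector([0.25, 0.12, 0.12, 0.11, 0.02], D=6)

dist_uw = euclidean(u, w)  # smaller distance: w is structurally closer to u
dist_uv = euclidean(u, v)
```

Under this assumed distance, w is closer to u than v is, which agrees with the ordering S_vs(u, w) > S_vs(u, v) on the slide.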
Time Complexity

Method      Time Complexity        Space Complexity
SimRank     O(I N^2 d^2)           O(N^2)
TopSim      O(N T d^T)             O(N + M)
RWR         O(I N^2 d)             O(N^2)
RoleSim     O(I N^2 d^2)           O(N^2)
ReFex       O(N + I(fM + N f^2))   O(N + Mf)
Panther_ps  O(RTc + NdT)           O(RT + Nd)
Panther_vs  O(RTc + NdT + Nc)      O(RT + Nd + ND)

Time terms: RTc for random path sampling, NdT for top-k similarity search for any vertex, Nc for building and querying the kd-tree.
Space terms: RT for the random paths, Nd for the vertex-to-path index, ND for the kd-tree.
Experiments
Evaluation Aspects
• Efficiency Performance
• Accuracy Performance
• Parameter Sensitivity Analysis
Efficiency Performance
Each entry is preprocessing time + top-k similarity search time on the Tencent network. A dash marks a cell that is empty on the slide.
Settings: T = 5, c = 0.5, ε = sqrt(1/|E|), δ = 0.1, R = 16,609,640.

|V|         |E|            RWR [KDD’04]  TopSim [ICDE’12]  RoleSim [KDD’11]  ReFex [KDD’11]  Panther_ps       Panther_vs
6,523       10,000         +7.79hr       +38.58m           +37.26s           3.85s+0.07s     0.07s+0.26s      0.99s+0.21s
25,844      50,000         +>150hr       +11.20hr          +12.98m           26.09s+0.40s    0.28s+1.53s      2.45s+4.21s
48,837      100,000        -             +30.94hr          +1.06hr           2.02m+0.57s     0.58s+3.48s      5.30s+5.96s
169,209     500,000        -             +>120hr           +>72hr            17.18m+2.51s    8.19s+16.08s     27.94s+24.17s
230,103     1,000,000      -             -                 -                 31.50m+3.29s    15.31s+30.63s    49.83s+22.86s
443,070     5,000,000      -             -                 -                 24.15hr+8.55s   50.91s+2.82m     4.01m+1.29m
702,049     10,000,000     -             -                 -                 >48hr           2.21m+6.24m      8.60m+6.58m
2,767,344   50,000,000     -             -                 -                 -               15.787m+1.36hr   1.60hr+2.17hr
5,355,507   100,000,000    -             -                 -                 -               44.09m+4.50hr    5.61hr+6.47hr
26,033,969  500,000,000    -             -                 -                 -               4.82hr+25.01hr   32.90hr+47.34hr
51,640,620  1,000,000,000  -             -                 -                 -               13.32hr+80.38hr  98.15hr+120.01hr

Callouts: 390× speed-up for Panther_ps and 270× speed-up for Panther_vs; can scale up to handle 1 billion edges.
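The two speed-up figures can be checked against the table with a little arithmetic. The sketch below assumes they are measured against ReFex on the row with |E| = 5,000,000, the largest row on which a baseline finishes, and that each entry is the sum of its two parts. The helper names are hypothetical.

```python
UNIT_SECONDS = {"hr": 3600.0, "m": 60.0, "s": 1.0}


def to_seconds(duration):
    """Convert a string such as '2.82m' or '24.15hr' to seconds."""
    for unit in ("hr", "m", "s"):
        if duration.endswith(unit):
            return float(duration[: -len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"unknown unit in {duration!r}")


def total_seconds(entry):
    """Sum the 'preprocessing+search' parts of one table entry."""
    return sum(to_seconds(part) for part in entry.split("+"))


# Row |V| = 443,070, |E| = 5,000,000 of the efficiency table.
refex = total_seconds("24.15hr+8.55s")
panther_ps = total_seconds("50.91s+2.82m")
panther_vs = total_seconds("4.01m+1.29m")

speedup_ps = refex / panther_ps  # about 395
speedup_vs = refex / panther_vs  # about 273
```

Under these assumptions the ratios come out near 395 and 273, consistent with the rounded 390× and 270× reported on the slides.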
Accuracy Performance of Panther_ps
• Evaluate how well Panther_ps approximates common neighbors.
• The score represents the improvement over a random method.
[Figure panels: KDD, Twitter, Mobile]
Datasets:
• Co-author networks: |V| = 3K, |E| = 7K.
• Twitter network: |V| = 100K, |E| = 500K.
• Mobile network: |V| = 200K, |E| = 200K.
Accuracy Performance of Panther_vs
• Identity Resolution
– Assume that the vertices of the same author in different networks of the same domain are similar to each other.
• Settings
– Given any two co-author networks, e.g., KDD and ICDM, if the top-k similar vertices from ICDM contain the query author from KDD, we say that the method hits a correct instance.
[Figure panels: KDD-ICDM, SIGMOD-ICDE, SIGIR-CIKM]
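The hit criterion in the settings can be written as a small scoring function. This is a minimal sketch that assumes authors carry the same identifier in both networks and that the top-k lists have already been computed by some similarity method. The input format and the example identifiers are hypothetical.

```python
def hit_rate(queries, top_k_results):
    """Fraction of query authors that appear in their own top-k list.

    queries: author ids from the first network (e.g. KDD) that also exist
        in the second network (e.g. ICDM).
    top_k_results: dict mapping each query author to the top-k author ids
        retrieved from the second network.
    """
    if not queries:
        return 0.0
    hits = sum(1 for author in queries if author in top_k_results.get(author, ()))
    return hits / len(queries)


# Hypothetical example with k = 2: two of the three queries are hits.
example_results = {
    "author_1": ["author_1", "author_7"],
    "author_2": ["author_5", "author_2"],
    "author_3": ["author_4", "author_9"],
}
score = hit_rate(["author_1", "author_2", "author_3"], example_results)
```

A query with no retrieved list counts as a miss, so the score stays between 0 and 1.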
Parameter Analysis: Path Length T
• The performance improves as T increases.
• The performance becomes almost stable when T ≥ 5.
[Figure: effect of path length T on the accuracy performance of Panther_ps; panels: KDD, Mobile, Twitter]
Parameter Analysis: Error Bound ε
• When |E|/(1/ε)^2 ranges from 5 to 20, the scores of Panther_ps almost converge.
• The value (1/ε)^2 is almost linearly and positively correlated with the number of edges in a network.
[Figure: Panther_ps on Tencent sub-networks]
Conclusion
• Methods:
– Efficiently compute two similarity metrics.
• Theoretical analysis:
– Given the error bound and confidence level, the sample size depends only on the path length.
• Empirical evaluations:
– When |V| = 0.5 million and |E| = 5 million, Panther_ps achieves a 390× speed-up and Panther_vs achieves a 270× speed-up.
– Panther can scale up to a network with 1 billion edges.
Thank You
Code & Data: http://aminer.org/Panther