Panther: Fast Top-K Similarity Search on Large Networks

Jing Zhang¹, Jie Tang¹, Cong Ma¹, Hanghang Tong², Yu Jing¹, and Juanzi Li¹
¹ Department of Computer Science and Technology, Tsinghua University
² School of Computing, Informatics, and Decision Systems Engineering, Arizona State University


  • 1

    Panther: Fast Top-K Similarity Search on Large Networks

    Jing Zhang¹, Jie Tang¹, Cong Ma¹, Hanghang Tong², Yu Jing¹, and Juanzi Li¹

    ¹ Department of Computer Science and Technology, Tsinghua University

    ² School of Computing, Informatics, and Decision Systems Engineering, Arizona State University

  • 2

    Who Is Similar to Barabási?

    [Figure: author network visualization with one node label per author, for example "newman, d", "watts, d", "strogatz, s", "albert, r". Capitalized labels in the figure: Barabasi, Boccaletti, Kahng, Newman, Latora, Robert, Rinzel. Source: http://web1.aminer.org/]

  • 3

    Similar Authors in Aminer

  • 4

    Related Work and Challenges

    Goal: find the top-K similar vertices for any vertex in a network.

    Two kinds of similarity:
      1. Vertices share many direct or indirect common neighbors.
      2. Vertices are disconnected but share a similar structure.

    Method             Time complexity           Space complexity
    SimRank [KDD'02]   $O(I N^2 d^2)$            $O(N^2)$
    TopSim [ICDE'12]   $O(N T d^T)$              $O(N + M)$
    RWR [KDD'04]       $O(I N^2 d)$              $O(N^2)$
    RoleSim [KDD'11]   $O(I N^2 d^2)$            $O(N^2)$
    ReFex [KDD'11]     $O(N + I(fM + N f^2))$    $O(N + Mf)$

    d: average degree, f: number of features, T: path length

    Challenges
    • C1: How to design a similarity method that applies to both kinds of similarity?
    • C2: Computational efficiency.

  • 5

    Our Approach: Panther

  • 6

    Path Similarity

    Intuition: two vertices are similar if they frequently appear on the same paths.

    • A path is a T-length sequence of vertices $p = (v_1, \cdots, v_{T+1})$.
    • $\Pi$ is the set of all T-paths in G.
    • Path weight: [formula image]

    [Figure: example network with vertices v1 to v5]

    Example (T = 2): $S_{ps}(v_1, v_2) = 0.37$, $S_{ps}(v_1, v_3) = 0.42$, $S_{ps}(v_1, v_4) = 0.39$, $S_{ps}(v_1, v_5) = 0.09$.

  • 7

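    Panther_ps

    Basic idea: random path sampling.

    [Figure: (a) Input network with vertices v0 to v5; (b) Random paths produced by random walks; (c) Vertex-to-path index]

    (b) Random paths:
      p1 = (v1, v0, v2)
      p2 = (v3, v2, v5)
      p3 = (v1, v2, v3)
      p4 = (v4, v1, v3)

    (c) Vertex-to-path index:
      v0: p1
      v1: p1, p3, p4
      v2: p1, p2, p3
      v3: p2, p3, p4
      v4: p4
      v5: p2

    Simplified path similarity: [formula image]

    Complexity annotations: $O(d^T)$, $O(RT)$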
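The random-path idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes an unweighted graph with uniform transition probabilities, and it assumes the simplified path similarity is the fraction of sampled paths that contain both vertices (the slide's formula is not preserved here). The function names `sample_paths`, `build_index`, `estimate_similarity`, and `top_k` are hypothetical; the four example paths are the ones read off the slide's figure.

```python
import random
from collections import defaultdict


def sample_paths(adj, num_paths, path_len, rng):
    """Sample num_paths random walks of path_len edges each.

    Assumption: unweighted graph, so each neighbor is equally likely.
    """
    vertices = sorted(adj)
    paths = []
    for _ in range(num_paths):
        walk = [rng.choice(vertices)]
        for _ in range(path_len):
            walk.append(rng.choice(adj[walk[-1]]))
        paths.append(tuple(walk))
    return paths


def build_index(paths):
    """Vertex-to-path index: vertex -> set of ids of the paths that visit it."""
    index = defaultdict(set)
    for path_id, path in enumerate(paths):
        for vertex in path:
            index[vertex].add(path_id)
    return index


def estimate_similarity(index, num_paths, u, v):
    """Assumed estimator: share of sampled paths that contain both u and v."""
    return len(index[u] & index[v]) / num_paths


def top_k(index, paths, query, k):
    """Rank the vertices that share at least one sampled path with `query`."""
    counts = defaultdict(int)
    for path_id in index[query]:
        for vertex in set(paths[path_id]):
            if vertex != query:
                counts[vertex] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(vertex, count / len(paths)) for vertex, count in ranked[:k]]


# The four example paths from the slide (p1..p4 become ids 0..3).
example_paths = [
    ("v1", "v0", "v2"),
    ("v3", "v2", "v5"),
    ("v1", "v2", "v3"),
    ("v4", "v1", "v3"),
]
example_index = build_index(example_paths)
# p1 and p3 contain both v1 and v2, so the estimate is 2 / 4.
print(estimate_similarity(example_index, len(example_paths), "v1", "v2"))
```

Only the paths listed under the query vertex in the index are scanned at query time, which is why the index is built once up front.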

  • 8

    Theoretical Analysis

    • How many random paths should we sample?

    Steps of the analysis:
      – Domain and range set
      – Upper bound of the range set's VC dimension
      – Distribution
      – Required sample size

  • 9

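    Theoretical Analysis

    • Domain: $\Pi$
    • Range set: [formula image]
    • VC bound: [formula image]
    • Distribution: [formula image]
    • Path similarity [formula image] is [formula image]
    • Conclusion
      – Sampling R random paths guarantees the error bound ε with confidence 1 − δ.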

  • 10

    Proof of [formula image]

    • Assume [formula image].
    • A set Q of size l can be shattered by $R_G$.
    • There is a one-to-one correspondence between each subset of Q and each range $P_i$ in $R_G$.
    • A path belongs only to the ranges associated with a pair of vertices on that path.
    • Contradiction.

  • 11

    Vector Similarity and Panther_vs

    • Limitation of path similarity: it is biased toward close neighbors.
    • Vector similarity: if two vertices have similar topological structures, their probability distributions of linking to all other vertices are similar.
    • Panther_vs: represent each vertex by a vector of its top-D path similarities computed by Panther_ps: [formula image]

    Example (T = 2), with one top-D vector per vertex:
      u: (0.39, 0.12, 0.12, 0.12, 0.12, 0.12)
      v: (0.13, 0.13, 0.04, 0, 0, 0)
      w: (0.25, 0.12, 0.12, 0.11, 0.02, 0)
    $S_{vs}(u, w) = 0.27 > S_{vs}(u, v) = 0.16$
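The example vectors above can be compared with a short Python sketch. Several things here are assumptions rather than the slide's method: the exact $S_{vs}$ formula is not preserved, so the sketch only ranks candidates by Euclidean distance between their top-D vectors; it sorts and zero-pads the vectors because the slide's example vectors look that way; and it uses a brute-force scan in place of the kd-tree the deck mentions. The names `top_d_vector` and `nearest` are hypothetical.

```python
import math


def top_d_vector(path_sims, d):
    """Keep the d largest path similarities, sorted descending and zero-padded."""
    top = sorted(path_sims, reverse=True)[:d]
    return top + [0.0] * (d - len(top))


def nearest(query_vec, candidates):
    """Return candidate names ordered by Euclidean distance to query_vec."""
    return sorted(candidates, key=lambda name: math.dist(query_vec, candidates[name]))


# Vectors for u, v, and w taken from the slide's example (D = 6).
u = top_d_vector([0.12, 0.39, 0.12, 0.12, 0.12, 0.12], 6)
v = top_d_vector([0.13, 0.13, 0.04], 6)
w = top_d_vector([0.25, 0.12, 0.12, 0.11, 0.02], 6)

# w is closer to u than v is, which agrees with S_vs(u, w) > S_vs(u, v).
print(nearest(u, {"v": v, "w": w}))  # ['w', 'v']
```

A brute-force scan costs one distance computation per candidate; a kd-tree over the same vectors would answer the same nearest-neighbor question without visiting every vertex.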

  • 12

    Time Complexity

    Method        Time complexity           Space complexity
    SimRank       $O(I N^2 d^2)$            $O(N^2)$
    TopSim        $O(N T d^T)$              $O(N + M)$
    RWR           $O(I N^2 d)$              $O(N^2)$
    RoleSim       $O(I N^2 d^2)$            $O(N^2)$
    ReFex         $O(N + I(fM + N f^2))$    $O(N + Mf)$
    Panther_ps    $O(RTc + NdT)$            $O(RT + Nd)$
    Panther_vs    $O(RTc + NdT + Nc)$       $O(RT + Nd + ND)$

    Callouts on the time terms: random path sampling; top-k similarity search for any vertex; build and query kd-tree.
    Callouts on the space terms: random paths; vertex-to-path index; kd-tree.

  • 13

    Experiments

  • 14

    Evaluation Aspects

    • Efficiency performance
    • Accuracy performance
    • Parameter sensitivity analysis

  • 15

    Efficiency Performance

    Each cell reports preprocessing time + top-k similarity search time. A "-" marks a cell with no value on the slide.

    |V|         |E|            RWR       TopSim     RoleSim   ReFex          Panther_ps       Panther_vs
                               [KDD'04]  [ICDE'12]  [KDD'11]  [KDD'11]
    6,523       10,000         +7.79hr   +38.58m    +37.26s   3.85s+0.07s    0.07s+0.26s      0.99s+0.21s
    25,844      50,000         +>150hr   +11.20hr   +12.98m   26.09s+0.40s   0.28s+1.53s      2.45s+4.21s
    48,837      100,000        -         +30.94hr   +1.06hr   2.02m+0.57s    0.58s+3.48s      5.30s+5.96s
    169,209     500,000        -         +>120hr    +>72hr    17.18m+2.51s   8.19s+16.08s     27.94s+24.17s
    230,103     1,000,000      -         -          -         31.50m+3.29s   15.31s+30.63s    49.83s+22.86s
    443,070     5,000,000      -         -          -         24.15hr+8.55s  50.91s+2.82m     4.01m+1.29m
    702,049     10,000,000     -         -          -         >48hr          2.21m+6.24m      8.60m+6.58m
    2,767,344   50,000,000     -         -          -         -              15.787m+1.36hr   1.60hr+2.17hr
    5,355,507   100,000,000    -         -          -         -              44.09m+4.50hr    5.61hr+6.47hr
    26,033,969  500,000,000    -         -          -         -              4.82hr+25.01hr   32.90hr+47.34hr
    51,640,620  1,000,000,000  -         -          -         -              13.32hr+80.38hr  98.15hr+120.01hr

    • Callouts: 390× speed-up (Panther_ps) and 270× speed-up (Panther_vs).
    • Panther can scale up to handle 1 billion edges.
    • Settings: T = 5, c = 0.5, $\varepsilon = \sqrt{1/|E|}$, δ = 0.1, R = 16,609,640.
    • Data: Tencent network.
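The speed-up callouts can be roughly checked against the table row with |E| = 5,000,000. The sketch below assumes the baseline is ReFex (the slide does not name the baseline) and that each cell is the sum of its two timings; `to_seconds` and `total_seconds` are hypothetical helper names, and cells with a ">" bound are not handled.

```python
UNIT_SECONDS = {"hr": 3600.0, "m": 60.0, "s": 1.0}


def to_seconds(duration):
    """Convert a string such as '24.15hr', '2.82m' or '8.55s' to seconds."""
    for unit in ("hr", "m", "s"):  # check 'hr' first so it is not read as another unit
        if duration.endswith(unit):
            return float(duration[: -len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"unknown time unit in {duration!r}")


def total_seconds(cell):
    """Sum a 'preprocessing+search' cell; an empty part counts as zero."""
    return sum(to_seconds(part) for part in cell.split("+") if part)


# Row |V| = 443,070, |E| = 5,000,000 of the efficiency table.
refex = total_seconds("24.15hr+8.55s")
panther_ps = total_seconds("50.91s+2.82m")
panther_vs = total_seconds("4.01m+1.29m")

speedup_ps = refex / panther_ps
speedup_vs = refex / panther_vs
print(f"Panther_ps: {speedup_ps:.0f}x, Panther_vs: {speedup_vs:.0f}x")
```

Under these assumptions the ratios land slightly above 390 and 270, which is consistent with the rounded figures quoted on the slide.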

  • 16

    Accuracy Performance of Panther_ps

    • Evaluate how well Panther_ps approximates common neighbors.
    • The score represents the improvement over a random method.

    [Figure: result panels for KDD, Twitter, and Mobile]

    • Co-author networks: |V| = 3K, |E| = 7K.
    • Twitter network: |V| = 100K, |E| = 500K.
    • Mobile network: |V| = 200K, |E| = 200K.

  • 17

    Accuracy Performance of Panther_vs

    • Identity resolution
      – Assume that the same author in different networks of the same domain is similar to himself or herself across those networks.

    • Settings
      – Given any two co-author networks, e.g., KDD and ICDM, if the top-k similar vertices retrieved from ICDM include the query author from KDD, we say that the method hits a correct instance.

    [Figure: result panels for KDD-ICDM, SIGMOD-ICDE, and SIGIR-CIKM]
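The hit criterion described in the settings can be written as a small scoring function. This is a sketch under assumptions: authors are matched across networks by an identical name string, and the score is the fraction of query authors that are hit; the slide does not spell out either detail. The function name `hit_rate` and the sample authors are hypothetical.

```python
def hit_rate(top_k_by_query):
    """Fraction of query authors found in their own top-k list from the other network.

    top_k_by_query maps a query author (from network A) to the list of
    top-k similar authors retrieved from network B.
    """
    if not top_k_by_query:
        return 0.0
    hits = sum(1 for author, retrieved in top_k_by_query.items() if author in retrieved)
    return hits / len(top_k_by_query)


# Hypothetical retrieval results for three query authors with k = 3.
results = {
    "author_a": ["author_x", "author_a", "author_y"],  # hit
    "author_b": ["author_x", "author_y", "author_z"],  # miss
    "author_c": ["author_c", "author_x", "author_y"],  # hit
}
print(hit_rate(results))  # 2 of 3 queries are hits
```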

  • 18

    Parameter Analysis: Path Length T

    • Performance improves as T increases.
    • Performance is almost stable when T ≥ 5.

    [Figure: effect of path length T on the accuracy performance of Panther_ps; panels for KDD, Mobile, and Twitter]

  • 19

    Parameter Analysis: Error Bound ε

    • When $|E| / (1/\varepsilon)^2$ ranges from 5 to 20, the scores of Panther_ps have almost converged.

    • The value $(1/\varepsilon)^2$ is almost linearly and positively correlated with the number of edges in a network.

    [Figure: results for Panther_ps on Tencent sub-networks]
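The ratio $|E| / (1/\varepsilon)^2$ equals $|E| \cdot \varepsilon^2$, so an error bound can be derived from a target ratio and the edge count. The helper below is a small illustration of that arithmetic only; the name `epsilon_for_ratio` is hypothetical, and the choice of ratio is left to the reader.

```python
import math


def epsilon_for_ratio(num_edges, ratio):
    """Solve |E| / (1/eps)^2 = ratio for eps, i.e. eps = sqrt(ratio / |E|)."""
    if num_edges <= 0 or ratio <= 0:
        raise ValueError("num_edges and ratio must be positive")
    return math.sqrt(ratio / num_edges)


# A ratio of 1 reproduces the setting eps = sqrt(1 / |E|).
eps = epsilon_for_ratio(10_000_000, 1)
print(eps, (1 / eps) ** 2)
```

Because $(1/\varepsilon)^2 = |E| / \text{ratio}$, holding the ratio fixed makes $(1/\varepsilon)^2$ grow linearly with the number of edges, which matches the second bullet.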

  • 20

    Conclusion

    • Methods:
      – Compute two similarity metrics efficiently.

    • Theoretical analysis:
      – Given the error bound and confidence level, the sample size depends only on the path length.

    • Empirical evaluations:
      – When |V| = 0.5 million and |E| = 5 million, Panther_ps achieves a 390× speed-up and Panther_vs achieves a 270× speed-up.
      – Panther can scale up to a network with 1 billion edges.

  • 21

    Thank You

    Code & Data: http://aminer.org/Panther