DRX Final Presentation Slides

  • 7/30/2019 DRX Final Presentation Slides

    Performance Analysis of Hadoop Link Prediction

    [Figure: Facebook]

    [Figure: Twitter]

    Problem Statement

    In a network $G = (V, E, X)$, for a particular user $v_s$ and a set of candidates $C$ to which $v_s$ may create a link, find a predictive function

    $f : (V, E, X, v_s, C) \to Y$

    where $Y = \{y_1, y_2, \ldots, y_{|C|}\}$ is a set of inferred results for whether user $v_s$ would create links with users in $C$.
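
    As a minimal sketch of what such a predictive function could look like, the Java below scores each candidate in C by the number of neighbors it shares with the source user and predicts a link when that count reaches a threshold. The common-neighbors score, the `threshold` parameter, and every name in the block are illustrative assumptions, not the method used in these slides; the attribute set X is ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical predictor: not taken from the slides.
final class LinkPredictor {
    // Counts the neighbors shared by u and v in an undirected adjacency map.
    static int commonNeighbors(Map<Integer, Set<Integer>> adj, int u, int v) {
        Set<Integer> shared =
            new HashSet<>(adj.getOrDefault(u, Collections.<Integer>emptySet()));
        shared.retainAll(adj.getOrDefault(v, Collections.<Integer>emptySet()));
        return shared.size();
    }

    // Assumed form of f: returns one boolean y_i per candidate, in candidate order.
    static List<Boolean> predict(Map<Integer, Set<Integer>> adj, int vs,
                                 List<Integer> candidates, int threshold) {
        List<Boolean> y = new ArrayList<>();
        for (int c : candidates) {
            y.add(commonNeighbors(adj, vs, c) >= threshold);
        }
        return y;
    }
}
```

    The output list plays the role of Y: its length equals |C|, and entry i answers whether the source user is predicted to link to candidate i.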

    Challenges

    Real networks are large
      > 1 billion users on Facebook (Oct. 2012)
      > 500 million users on Twitter (Jul. 2012)
      > 175 million users on LinkedIn (Jun. 2012)
    Big data makes prediction even slower

    Our Solution

    Divide Adjacency list

    Distributed computing

    Hadoop

    Smaller Problems

    Map Reduce

    Data Intensive Scie

    [Figure: MapReduce data flow. Input splits 0, 1, and 2 each feed a map task whose output is sorted; the sorted outputs are merged and passed to two reduce tasks.]
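
    The map, sort, merge, reduce flow can be imitated in plain Java without a cluster. The sketch below is an in-memory stand-in, not Hadoop code: the class name, the choice of counting node degrees as the example job, and the use of a `TreeMap` in place of a real merge of sorted runs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the data flow; no Hadoop APIs are used.
final class MiniMapReduce {
    // Map phase: each edge {u, v} in a split emits (u, 1) and (v, 1).
    static List<int[]> map(List<int[]> split) {
        List<int[]> out = new ArrayList<>();
        for (int[] edge : split) {
            out.add(new int[] {edge[0], 1});
            out.add(new int[] {edge[1], 1});
        }
        // Each mapper sorts its own output by key.
        out.sort((a, b) -> Integer.compare(a[0], b[0]));
        return out;
    }

    // Merge: group values from all map outputs by key (TreeMap keeps keys ordered).
    // Reduce: sum the values of each key.
    static Map<Integer, Integer> mergeAndReduce(List<List<int[]>> mapOutputs) {
        Map<Integer, List<Integer>> merged = new TreeMap<>();
        for (List<int[]> output : mapOutputs) {
            for (int[] kv : output) {
                merged.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        Map<Integer, Integer> reduced = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> e : merged.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            reduced.put(e.getKey(), sum);
        }
        return reduced;
    }

    // Runs one map task per split, then merges and reduces.
    static Map<Integer, Integer> degrees(List<List<int[]>> splits) {
        List<List<int[]>> mapOutputs = new ArrayList<>();
        for (List<int[]> split : splits) {
            mapOutputs.add(map(split));
        }
        return mergeAndReduce(mapOutputs);
    }
}
```

    Each split is processed independently in `map`, which is the property that lets a real cluster run the map tasks on different machines.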

    Link Prediction Framework

    Stages: Prepare, Vertex Num, Split Data, Probe Edge Num, D, LP Score, Probe Score, Non-Exist Score, AUC

    Algorithm Design

    [Figure: worked example on a 7-node graph, processed in three stages labeled Mapper, Reducer, Mapper. The edge list is grouped into adjacency lists:]

    1    2,3,4
    2    3,4,6
    3    4
    4    5
    5    6,7
    6    7
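
    The grouping step in this example (edge list in, one adjacency list per node out) can be sketched in a few lines of Java. This is a single-machine illustration under assumed names (`AdjacencyBuilder`, `build`); it reproduces the adjacency lists of the 7-node example but makes no claim about how the slides' Hadoop job is written.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: groups edges by their first endpoint.
final class AdjacencyBuilder {
    // For each edge {u, v}, adds v to the sorted neighbor set of u.
    static Map<Integer, SortedSet<Integer>> build(int[][] edges) {
        Map<Integer, SortedSet<Integer>> adj = new TreeMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
        }
        return adj;
    }

    public static void main(String[] args) {
        // Edges of the 7-node example, each written with the smaller node first.
        int[][] edges = {
            {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {2, 6},
            {3, 4}, {4, 5}, {5, 6}, {5, 7}, {6, 7}
        };
        build(edges).forEach((node, nbrs) -> System.out.println(node + "\t" + nbrs));
    }
}
```

    In a MapReduce setting the same grouping falls out of the shuffle: the mapper emits (u, v) and the reducer receives all v for a given u.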

    Data Sets

    Name          Nodes       Edges        Relative
    HepPh         12,008      237,010      1x
    ND Web        325,729     1,497,134    7.14x
    Live Journal  4,847,571   68,993,773   357.78x

    Time Breakdown

    Which step(s)?

    [Figure: time (% of total), axis 0 to 80, for the HEP Ph, ND Web, and Live Journal data sets]

    Machine Specification

    26 nodes
    32 GB RAM
    12x2 TB SATA disks (4 dedicated to Hadoop storage)
    2x8-core Intel Xeon E5620 CPUs @ 2.40 GHz
    Gigabit Ethernet

    Monitoring Tools

    Resource  Command
    CPU       iostat -c 1
    Disk      iostat -d 1
    Network   netstat -c -I

    Disk

    [Figure: blocks read (1 KB blocks) versus time (s), with the LP Score and AUC phases labeled]

    Network

    [Figure: data received (Mb/s) versus time (s), with the LP Score and AUC phases labeled]

    n = 13000000

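    import java.util.Random;

    class AucSampling {
        public static void main(String[] args) {
            int n = 13000000;
            // Score arrays to compare; the slide does not show how they are filled.
            double[] left = new double[n];
            double[] right = new double[n];
            Random rand = new Random();
            Random rand1 = new Random();
            int n1 = 0, n2 = 0;
            int m = 3 * n;  // number of random comparisons

            for (int i = 0; i < m; i++) {
                int index1 = rand.nextInt(n);
                int index2 = rand1.nextInt(n);

                double leftScore = left[index1];
                double rightScore = right[index2];

                if (leftScore > rightScore) {
                    n1++;  // left score is higher
                } else if (Math.abs(leftScore - rightScore) < 1E-6) {
                    n2++;  // scores tie within tolerance
                }
            }

            double AUC = (n1 + 0.5 * n2) / m;
            System.out.println(AUC);
        }
    }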
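
    The sampling loop above can also be packaged as a reusable method, which makes it easy to check on small inputs. The sketch below keeps the same formula, AUC = (n1 + 0.5 * n2) / m, but the class name, the `seed` parameter, and the decision to test for a tie before testing for a win are assumptions added here, not part of the slide's code.

```java
import java.util.Random;

// Hypothetical wrapper around the sampled AUC estimate.
final class AucEstimator {
    // Draws m random (left, right) pairs and returns (n1 + 0.5 * n2) / m,
    // where n1 counts left > right and n2 counts ties within 1e-6.
    static double estimate(double[] left, double[] right, int m, long seed) {
        Random rand = new Random(seed);
        long n1 = 0;
        long n2 = 0;
        for (int i = 0; i < m; i++) {
            double l = left[rand.nextInt(left.length)];
            double r = right[rand.nextInt(right.length)];
            if (Math.abs(l - r) < 1e-6) {
                n2++;  // tie within tolerance: half credit
            } else if (l > r) {
                n1++;  // left score ranks higher: full credit
            }
        }
        return (n1 + 0.5 * n2) / m;
    }
}
```

    With every left score above every right score the estimate is 1.0, with identical scores it is 0.5, and with every left score below every right score it is 0.0, which gives three quick sanity checks before running on 13 million entries.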
