Parallel Applications And Tools For Cloud Computing Environments CloudCom 2010 Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010




Page 1: Parallel Applications And Tools For Cloud Computing Environments

Parallel Applications And Tools For Cloud Computing Environments

CloudCom 2010, Indianapolis, Indiana, USA

Nov 30 – Dec 3, 2010

Page 2: Parallel Applications And Tools For Cloud Computing Environments

Large Scale PageRank with Iterative MapReduce

Shuohuan, Yuduo, Parag, Hui

Page 3: Parallel Applications And Tools For Cloud Computing Environments

Outline

Motivation of large scale PageRank

Optimization strategies

Experiment results

Visualization with PlotViz3

Page 4: Parallel Applications And Tools For Cloud Computing Environments

PageRank

Large scale PageRank

Large graph processing has become popular.

Efficient processing of large scale graphs challenges current MapReduce runtimes.

Motivation: common optimization strategies for large scale PageRank

Current status

Twister, Hadoop, DryadLINQ with the ClueWeb data set with 50 million pages

MPI PageRank

Page 5: Parallel Applications And Tools For Cloud Computing Environments

Optimization Strategies

Cache partitions of web graph in memory

Twister, Pregel, HaLoop, Surfer, Static Data (am files)

Partition the web graph

DryadLINQ, (Twister, Hadoop) PageRank

Task granularity should fit the memory and network bandwidth in Cloud infrastructure

Hierarchy messaging in reduce stage

Hadoop, (Twister, DryadLINQ) PageRank

Local merge
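To make the setting of these strategies concrete, here is a minimal sketch of PageRank written as repeated map and reduce steps over partitions of a web graph. It is an illustration only, not the Twister, Hadoop, or DryadLINQ implementation discussed in the slides. The function names, the damping factor of 0.85, the fixed iteration count, and the toy four-page graph are all assumptions, and pages without out-links are simply skipped.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed value; the slides do not state one


def map_partition(partition, ranks):
    """Map: compute rank contributions from one partition of the web graph.

    `partition` maps a source page to its list of out-links.
    """
    contributions = defaultdict(float)
    for page, out_links in partition.items():
        if not out_links:
            continue  # dangling pages are ignored in this simplified sketch
        share = ranks[page] / len(out_links)
        for target in out_links:
            contributions[target] += share
    return contributions


def reduce_contributions(all_contributions, num_pages):
    """Reduce: sum contributions per page and apply the damping factor."""
    totals = defaultdict(float)
    for contributions in all_contributions:
        for page, value in contributions.items():
            totals[page] += value
    return {
        page: (1.0 - DAMPING) / num_pages + DAMPING * totals[page]
        for page in range(num_pages)
    }


def pagerank(partitions, num_pages, iterations=10):
    """Run a fixed number of map/reduce rounds, starting from uniform ranks."""
    ranks = {page: 1.0 / num_pages for page in range(num_pages)}
    for _ in range(iterations):
        mapped = [map_partition(p, ranks) for p in partitions]
        ranks = reduce_contributions(mapped, num_pages)
    return ranks


# Hypothetical toy graph split into two partitions.
partitions = [{0: [1, 2], 1: [2]}, {2: [0], 3: [2]}]
ranks = pagerank(partitions, num_pages=4)
```

In this toy graph page 3 has no in-links, so its rank settles at the teleport term alone, while page 2, which every other page links to, ends up with the largest rank.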

Page 6: Parallel Applications And Tools For Cloud Computing Environments

Cache Static Data

[Chart: Twister vs. Hadoop. Horizontal axis ticks 500 to 4500, vertical axis ticks 0 to 7000; axis titles and units were not preserved.]
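The idea behind caching static data can be sketched by counting how often the graph partitions are read. The sketch below is a toy model, not the behaviour of any specific runtime: `PartitionStore` is a hypothetical stand-in for reading a partition from disk, and the partition and iteration counts are arbitrary.

```python
class PartitionStore:
    """Hypothetical stand-in for reading a graph partition from disk."""

    def __init__(self, partitions):
        self._partitions = partitions
        self.loads = 0  # counts how often a partition is (re)read

    def load(self, index):
        self.loads += 1
        return self._partitions[index]


def run_without_cache(store, num_partitions, iterations):
    """Re-read every partition in every iteration."""
    for _ in range(iterations):
        for i in range(num_partitions):
            store.load(i)  # a map task would process the partition here


def run_with_cache(store, num_partitions, iterations):
    """Read each partition once and keep it in memory across iterations."""
    cache = {}
    for _ in range(iterations):
        for i in range(num_partitions):
            if i not in cache:
                cache[i] = store.load(i)
            _ = cache[i]  # a map task would process the cached partition here


toy_partitions = [{0: [1]}, {1: [2]}, {2: [3]}, {3: [0]}]

uncached = PartitionStore(toy_partitions)
run_without_cache(uncached, num_partitions=4, iterations=10)

cached = PartitionStore(toy_partitions)
run_with_cache(cached, num_partitions=4, iterations=10)
```

Because the web graph does not change between PageRank iterations, the cached run reads each partition once, while the uncached run reads it once per iteration.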

Page 7: Parallel Applications And Tools For Cloud Computing Environments

Partition the Web Graph

Scalability with various nodes on Madrid

[Chart: horizontal axis from 8 nodes down to 3 nodes; vertical axis ticks 0 to 16000, units not preserved. Series: 420 Files Chunks and Single File Per Node, each with a linear trend line.]

Page 8: Parallel Applications And Tools For Cloud Computing Environments

Partition the Web Graph

Scalability with various input data size on Tempest

[Chart: horizontal axis 160, 320, 640, 960, and 1280 files; vertical axis ticks 0 to 7000, units not preserved. Series: fine granularity and coarse granularity, each with a linear trend line.]

Page 9: Parallel Applications And Tools For Cloud Computing Environments

Hierarchy Messaging in Reduce Stage

[Chart: horizontal axis 1 to 32; vertical axis ticks 0 to 18, units not preserved. Series: original msg size and msg size after local merge.]
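A local merge can be sketched as summing the rank contributions that share a target page before they leave the map side, so fewer messages reach the reduce stage. The code below is an illustrative assumption rather than the presenters' implementation; the function names and the toy partition are hypothetical.

```python
from collections import defaultdict


def emit_messages(partition, ranks):
    """Emit one (target, contribution) message per out-link."""
    messages = []
    for page, out_links in partition.items():
        if not out_links:
            continue  # pages without out-links send nothing in this sketch
        share = ranks[page] / len(out_links)
        for target in out_links:
            messages.append((target, share))
    return messages


def local_merge(messages):
    """Sum contributions that share a target before sending them on."""
    merged = defaultdict(float)
    for target, value in messages:
        merged[target] += value
    return sorted(merged.items())


# Hypothetical partition: three pages that mostly link to the same targets.
partition = {0: [2, 3], 1: [2, 3], 4: [2]}
ranks = {0: 0.2, 1: 0.2, 4: 0.2}

original = emit_messages(partition, ranks)
merged = local_merge(original)
```

Here five per-link messages collapse into two merged messages, one per distinct target, and the total contribution sent is unchanged.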

Page 10: Parallel Applications And Tools For Cloud Computing Environments

Visualization with PlotViz3

1k vertices, red vertex: wikipedia.org