On The [Ir]relevance of Network Performance for Data Processing

Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Radu Stoica, Bernard Metzler, Ioannis Koltsidas, Nikolas Ioannou

IBM Research, Zurich

The 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16) 2

How [Ir]relevant is the Network?

[Figure: Spark runtime (secs, 0-300) for TeraSort, PageRank, SQL, WordCount and GroupBy at 1 Gbps, 10 Gbps and 40 Gbps; annotated improvements of 60%, 64%, 47%, 33% and 28%]

(1) Network IO is very relevant: up to 64%??
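The slide does not state how its percentages are computed. A minimal sketch, assuming they are runtime reductions relative to the slower configuration; the runtimes in the usage block are hypothetical and not read off the chart. It also shows that a 64% reduction corresponds to a speedup of almost 2.8x.

```python
def runtime_reduction(t_before: float, t_after: float) -> float:
    """Fraction of runtime saved when moving from t_before to t_after (seconds)."""
    if t_before <= 0:
        raise ValueError("t_before must be positive")
    return (t_before - t_after) / t_before


def speedup(t_before: float, t_after: float) -> float:
    """Equivalent speedup factor t_before / t_after."""
    if t_after <= 0:
        raise ValueError("t_after must be positive")
    return t_before / t_after


if __name__ == "__main__":
    # Hypothetical runtimes, NOT taken from the slide: 100 s on the slow link, 36 s on the fast one.
    before, after = 100.0, 36.0
    print(f"reduction = {runtime_reduction(before, after):.0%}")
    print(f"speedup   = {speedup(before, after):.2f}x")
```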

Is It Spark Specific?

[Figure: runtime (secs, 0-300) for Flink-TS, Flink-PR, GraphLab and Timely at 1 Gbps, 10 Gbps and 40 Gbps; one off-scale bar is labeled 725s]

Spark TeraSort: The Shuffle Story

[Figure: distributed sorting, from input to output]

- simple
- the shuffle data is the input data
- highest chance of improvements

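[Figure: Spark TeraSort execution. Map tasks run on the cores, read the input and write shuffle data. Reduce tasks first read in the shuffle data over the network (net), then sort it (CPU), and write the output; this reduce-side net + CPU sequence is marked as what determines performance.]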
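The map/shuffle/reduce flow above can be sketched in a few lines. This is a simplified, single-process illustration assuming TeraSort-style range partitioning on a 10-byte key prefix; it is not Spark's implementation, and `sample_splitters`, `map_task`, `reduce_task`, `terasort` and `make_splits` are hypothetical names. The reduce side mirrors the figure: gather the buckets (net), then sort them (CPU).

```python
import bisect
import random
from typing import List

KEY_LEN = 10  # assumed TeraSort-style key prefix of each record


def sample_splitters(keys: List[bytes], num_reducers: int,
                     sample_size: int = 1000, seed: int = 0) -> List[bytes]:
    """Choose num_reducers - 1 key-range boundaries from a random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]


def map_task(records: List[bytes], splitters: List[bytes]) -> List[List[bytes]]:
    """Map side: route each record to the reducer that owns its key range.

    The returned buckets are this map task's shuffle data.
    """
    buckets: List[List[bytes]] = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        buckets[bisect.bisect_right(splitters, rec[:KEY_LEN])].append(rec)
    return buckets


def reduce_task(fetched: List[List[bytes]]) -> List[bytes]:
    """Reduce side: gather the buckets (the 'net' phase), then sort them (the 'CPU' phase)."""
    data = [rec for bucket in fetched for rec in bucket]
    data.sort(key=lambda r: r[:KEY_LEN])
    return data


def terasort(input_splits: List[List[bytes]], num_reducers: int) -> List[bytes]:
    keys = [rec[:KEY_LEN] for split in input_splits for rec in split]
    splitters = sample_splitters(keys, num_reducers)
    map_outputs = [map_task(split, splitters) for split in input_splits]
    # Shuffle: reducer r fetches bucket r from every map task's output.
    parts = [reduce_task([m[r] for m in map_outputs]) for r in range(num_reducers)]
    # Reducer r owns a key range below reducer r + 1, so concatenation is globally sorted.
    return [rec for part in parts for rec in part]


def make_splits(num_splits: int, per_split: int, seed: int = 0) -> List[List[bytes]]:
    """Hypothetical test input: random 10-byte keys followed by a 90-byte zero payload."""
    rng = random.Random(seed)
    return [[bytes(rng.getrandbits(8) for _ in range(KEY_LEN)) + bytes(90)
             for _ in range(per_split)] for _ in range(num_splits)]
```

Because every record is shuffled, the network volume equals the input size, which matches the slide's note that the shuffle data is the input data.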

How Important is the Network?

[Figure: share of time (0-100%) spent on CPU vs network at 1 Gbps, 10 Gbps, 40 Gbps and 100 Gbps*; annotated split of 52%/48% at 1 Gbps, with the CPU share at 92% for 10 Gbps, 97% for 40 Gbps and 99% for 100 Gbps*]

Gains from faster networks are shadowed by the high CPU footprint.
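Amdahl's law makes the "shadowing" concrete. A minimal sketch, assuming (our reading of the chart) that the network accounts for about 8% of the time at 10 Gbps and that network time shrinks in proportion to link bandwidth. Under those assumptions, going from 10 to 40 Gbps gives at most 1 / (0.92 + 0.08/4) ≈ 1.064x, about 6% less runtime, and even an infinitely fast network is capped at 1 / 0.92 ≈ 1.087x.

```python
def amdahl_speedup(network_fraction: float, network_speedup: float) -> float:
    """Overall speedup when only the network part of the runtime gets faster.

    network_fraction: share of runtime spent on the network (0..1).
    network_speedup: factor by which network time shrinks, e.g. 4 for 10 -> 40 Gbps
                     (assumes network time scales inversely with bandwidth).
    """
    if not 0.0 <= network_fraction <= 1.0:
        raise ValueError("network_fraction must be in [0, 1]")
    if network_speedup <= 0:
        raise ValueError("network_speedup must be positive")
    return 1.0 / ((1.0 - network_fraction) + network_fraction / network_speedup)


if __name__ == "__main__":
    # Assumption: about 8% network / 92% CPU at 10 Gbps.
    s = amdahl_speedup(0.08, 40 / 10)
    print(f"10 -> 40 Gbps: {s:.3f}x speedup, {1 - 1 / s:.1%} less runtime")
    # Upper bound with an infinitely fast network (network term drops to zero).
    print(f"infinite bandwidth: {amdahl_speedup(0.08, float('inf')):.3f}x")
```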

What Exactly is the CPU Doing?

[Figure: Spark CPU-time breakdown (0-100%) for the Map and Reduce stages, split into Linux, JVM, IO, Sorting, Serialization, Iterator and Misc.]

(2) Overheads are spread across the entire stack: serialization, abstraction, execution model, etc.
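To illustrate why serialization shows up as CPU time, here is a sketch comparing two ways of sorting the same records. It assumes the common TeraSort layout of 100-byte records with a 10-byte key; `pickle` merely stands in for a generic framework serializer, and the function names are hypothetical. Both paths produce identical output, but the object path adds per-record decode, encode and object-allocation work that the raw-bytes path avoids.

```python
import pickle
import random
from typing import List, Tuple

KEY_LEN, REC_LEN = 10, 100  # assumed layout: 10-byte key + 90-byte payload per record


def sort_raw(buf: bytes) -> bytes:
    """Sort fixed-width records by comparing key bytes directly; nothing is decoded."""
    recs = [buf[i:i + REC_LEN] for i in range(0, len(buf), REC_LEN)]
    recs.sort(key=lambda r: r[:KEY_LEN])
    return b"".join(recs)


def sort_via_objects(buf: bytes) -> bytes:
    """The same sort routed through an object layer and a generic serializer.

    Records become (key, value) objects, are serialized for the 'shuffle',
    deserialized, sorted, and turned back into bytes.
    """
    objs: List[Tuple[bytes, bytes]] = [
        (buf[i:i + KEY_LEN], buf[i + KEY_LEN:i + REC_LEN])
        for i in range(0, len(buf), REC_LEN)
    ]
    received = pickle.loads(pickle.dumps(objs))  # serialize + deserialize round trip
    received.sort(key=lambda kv: kv[0])
    return b"".join(k + v for k, v in received)


def make_records(count: int, seed: int = 0) -> bytes:
    """Hypothetical random input of `count` fixed-width records."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(count * REC_LEN))
```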

The Balancing Act: CPU vs Network

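I. Balance the CPU time against the network time

Sorting costs O(n log n) while the network transfer costs O(n), so use a smaller n: smaller partitions lower the per-record sorting cost relative to the network.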

[Figure: Smaller Partitions: runtime (secs, 0-100)]
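A back-of-the-envelope model of the partitioning argument, assuming each partition of n records costs n·log2(n) units of sorting work and each record costs one unit of network work (all constants dropped; `sort_work` and `network_work` are hypothetical names). Splitting N records into p partitions makes the CPU-to-network ratio log2(N/p), so the model predicts only a logarithmic reduction: for N = 2^24 the ratio falls from 24 to 12 only after 4096-way partitioning.

```python
import math


def sort_work(total_records: int, partitions: int) -> float:
    """Work to sort total_records split evenly into partitions, each sorted independently.

    partitions * n * log2(n), with n = total_records / partitions.
    """
    n = total_records / partitions
    return partitions * n * math.log2(n) if n > 1 else 0.0


def network_work(total_records: int) -> int:
    """Transfer volume is O(N): every record crosses the network once."""
    return total_records


if __name__ == "__main__":
    N = 2 ** 24
    for p in (1, 16, 256, 4096):
        ratio = sort_work(N, p) / network_work(N)  # equals log2(N / p)
        print(f"partitions={p:5d}  CPU/network work ratio ~ {ratio:.0f}")
```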

II. Use more cores to scale up

If a single core cannot do 40 Gbps, then use more cores. This needs a more careful analysis of the entire stack.

[Figure: bandwidth (Gbps, 0-60) vs number of cores (1, 2, 4, 8, 16), ideal vs measured]
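The "use more cores" reasoning is simple arithmetic under ideal linear scaling. A sketch, assuming a hypothetical per-core throughput (the slide gives none); `cores_needed` and `scaling_efficiency` are illustrative names. The second function expresses the gap between the measured and ideal curves as a fraction of ideal.

```python
import math


def cores_needed(target_gbps: float, per_core_gbps: float) -> int:
    """Cores required to reach target_gbps if throughput scaled perfectly linearly."""
    if per_core_gbps <= 0:
        raise ValueError("per_core_gbps must be positive")
    return math.ceil(target_gbps / per_core_gbps)


def scaling_efficiency(measured_gbps: float, cores: int, per_core_gbps: float) -> float:
    """Measured bandwidth as a fraction of the ideal (cores * single-core rate)."""
    return measured_gbps / (cores * per_core_gbps)


if __name__ == "__main__":
    # Hypothetical single-core rate of 5 Gbps, chosen only for illustration.
    print(cores_needed(40, 5))              # ideal case: 8 cores for 40 Gbps
    print(scaling_efficiency(20.0, 8, 5.0))  # 20 Gbps measured on 8 cores = half of ideal
```

With sublinear scaling the real core count is higher than `cores_needed` suggests, which is why the slide asks for an analysis of the entire stack.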

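[Figure: runtime (secs, 0-300) of the map and reduce stages vs number of cores (1, 2, 4, 8, 16)]

Fitted runtime (secs): $\text{runtime} = 9 + \frac{260}{\text{cores}}$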
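The fitted curve has the Amdahl form: a fixed part that does not shrink with cores plus a part that divides across them. A sketch that evaluates it, assuming our reading that 9 s is the non-scaling (serial) component and 260 s the parallelizable one; `fitted_runtime` and `speedup_vs_one_core` are hypothetical names. The model gives 269 s on one core and 25.25 s on 16 cores, and no core count can push the runtime below 9 s, capping the speedup at 269 / 9 ≈ 29.9x.

```python
def fitted_runtime(cores: int, serial_s: float = 9.0, parallel_s: float = 260.0) -> float:
    """Runtime model read off the slide: runtime = 9 + 260 / cores (seconds)."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return serial_s + parallel_s / cores


def speedup_vs_one_core(cores: int) -> float:
    """Speedup relative to the single-core runtime under the fitted model."""
    return fitted_runtime(1) / fitted_runtime(cores)


if __name__ == "__main__":
    for c in (1, 2, 4, 8, 16):
        print(f"{c:2d} cores: {fitted_runtime(c):7.2f} s  speedup {speedup_vs_one_core(c):5.2f}x")
    # Limit as cores grow without bound: runtime -> 9 s.
    print(f"limit: {fitted_runtime(1) / 9.0:.1f}x")
```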

(3) Classical techniques are ineffective

Conclusion

(1) Faster networks (IO) are very relevant
  – as long as you have CPU cycles
  – differentiate between user and framework CPU usage
(2) The framework's CPU usage is bad
  – CPU-network imbalance: sorting, serialization, Volcano execution model, etc.
  – scalability (serial vs parallel components)
(3) Classical balancing techniques are ineffective

Knowing today's microsecond-era IO and CPU hardware, how would you redesign a modern data processing framework?
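The conclusion names the Volcano execution model as one source of framework CPU overhead. A minimal sketch of that model, assuming simple Scan, Filter and Project operators (hypothetical classes, not Spark's): the consumer pulls one record at a time, and every record pays one `next()` call per operator in the chain, so per-record overhead grows with plan depth.

```python
from typing import Callable, Iterator, List, Optional


class Operator:
    """Volcano-style operator: the consumer pulls one record at a time via next()."""

    def next(self) -> Optional[tuple]:
        raise NotImplementedError


class Scan(Operator):
    def __init__(self, rows: List[tuple]):
        self._it: Iterator[tuple] = iter(rows)

    def next(self) -> Optional[tuple]:
        return next(self._it, None)  # builtin next(); None signals end of input


class Filter(Operator):
    def __init__(self, child: Operator, pred: Callable[[tuple], bool]):
        self.child, self.pred = child, pred

    def next(self) -> Optional[tuple]:
        # One call into the child per input record: the per-record interpretation cost.
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None


class Project(Operator):
    def __init__(self, child: Operator, cols: List[int]):
        self.child, self.cols = child, cols

    def next(self) -> Optional[tuple]:
        row = self.child.next()
        return None if row is None else tuple(row[c] for c in self.cols)


def run(root: Operator) -> List[tuple]:
    """Drive the plan from the top, pulling records until the root is exhausted."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out
```

Usage: `run(Project(Filter(Scan(rows), pred), [1]))` evaluates a filter-then-project plan. Batch-at-a-time or compiled execution are the usual alternatives for amortizing these per-record calls.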

Backup

Runtime

[Figure: runtime (secs, 0-300) of the map and reduce stages vs number of cores (1, 2, 4, 8, 16)]

What Exactly is the CPU Doing? (Spark)

[Figure: Spark CPU-time breakdown (0-100%) for Map, Reduce and Reduce/Count, split into Linux, JVM, IO, Sorting, Serialization, Iterator and Misc.]