50
Design and Evaluation of the Hamal Parallel Computer J.P. Grossman Project Aries Supervisor: Tom Knight Dec. 3, 2002

Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

Design and Evaluation of the Hamal Parallel Computer

J.P. Grossman

Project AriesSupervisor: Tom Knight

Dec. 3, 2002

Page 2: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 2

�Motivation

� Million node general-purpose machine

� Scalable memory system

� Support for massive multithreading

� Discarding Network

� Billion transistor era

� Embedded DRAM

Page 3: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 3

�Talk Outline

� Overview of Hamal

� The Hamal memory system

� Thread management and synchronization

� Fault-tolerant messaging protocol

Page 4: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 4

�Hamal - Overview

� Distributed shared memory machine

� Multiple processor-memory nodes per die

� Fat tree interconnect

� Split-phase memory operations

� Memory consistency implemented in software

� No data caches

� Complete system cycle-accurate simulator

Page 5: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 5

�Hamal - Overview

Data Data Data Data

Controller/Arbiter

CodeNet Processor

Page 6: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 6

�The Hamal Processor

� 128-bit VLIW multithreaded (8 contexts)

� No register renaming, branch prediction, speculative execution, etc.

� Event-driven microkernel runs in context 0

Handle event

Pollqueue

Event Queue

Memory Events

Processor Events

Network Events

Context 0

Page 7: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 7

�Talk Outline

�Overview of Hamal

� The Hamal memory system

� Threads and synchronization

� Fault-tolerant messaging protocol

Page 8: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 8

�Capabilities

� 128 bit pointers with embedded hardware-enforced permissions and bounds� 64 address bits, 64 capability bits

� Single virtual address space� Reduces state associated with a process

� Easy sharing of data

� Intra-process protection

� Object-based protection

� Simple lazy page allocation

Page 9: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 9

�A Haiku

Capabilities!

It is no longer a sin

to program in C

Page 10: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 10

�Virtual Memory

virtual address physical address

node page offset

virtual address

physical page virtual page physical page

node page offset

virtual address

TLB Hardware Page Tables

Page 11: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 11

�Distributed Objects

� Hamal implements Sparsely Faceted Arrays[Brown02]

� Conceptually allocate same segment on all nodes, but actual facets are lazily allocated

� Network interface translates between global segment IDs and local facets

Page 12: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 12

�Talk Outline

�Overview of Hamal

� The Hamal memory system

� Threads and synchronization

� Fault-tolerant messaging protocol

Page 13: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 13

�Motivation

� Run time for a problem of size m on N nodes:

� Optimal run time for fixed m:

)log(21

NCN

mCt +=

+=⇒=

2

1

212log1

C

mCCt

N

mCC

Page 14: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 14

�Thread Creation

� fork instruction specifies:

� Starting address

� Destination node

� Set of registers to copy to child

� Each node contains a hardware fork queue

� Queue is serviced by microkernel

fork

Page 15: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 15

�Register Dribbling

� Each thread has a swap page in memory

� Threads are loaded/unloaded in the background on unused memory cycles (Register dribbling - [Soundararajan92])

� Reduces overhead of thread swapping

� Least recently issued (LRI) context constantly dribbles

� Reduces latency of thread swapping

Page 16: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 16

�Thread Suspension

� When should a blocked thread be suspended?

� Two part strategy:

1. Wait until

a) No context can issue

b) The LRI context is clean

2. Generate a stall event and allow the microkernel to decide if the thread should be suspended

Page 17: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 17

�Register-Based Synchronization

� join capabilities allow one thread to write directly to another thread’s registers

� Three instructions: jcap, busy, and join

Parent Thread

r0 = jcap r1r1 = busyfork _child, {r0}r1 = and r1, r1

Child Thread

_child:

<computation>join r0, 0

Page 18: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 18

�Example - Barrier

doacross

Page 19: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 19

�Barrier Times

barrier time

12

58

105

161

216

273

333

393

456

523

0

100

200

300

400

500

600

1 2 4 8 16 32 64 128 256 512

# processors

tim

e (

cycle

s)

Page 20: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 20

�Benchmark - ppadd

ppadd

11

12

13

14

15

16

17

18

19

20

1 2 4 8 16 32 64 128 256 512

# processors

log(tim

e)

1024

2048

4096

8192

16384

32768

65536

Page 21: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 21

�UV Trap Bits

� Each memory word is tagged with two user trap bits (U, V)

� Each memory operation may optionally:

� Trap on U high, U low, V high, V low

� Modify U, V if successful

� Traps generate events which are handled by the microkernel on the node containing the memory word

Page 22: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 22

�Example – Word Locking

� Aqcuire:

� load, trap on U high or V high, set U

� Release:

� store, trap on V high, clear U

busy11

trap10

unavailable01

available00

MeaningVU

Page 23: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 23

�Example – Word Locking

B

B

C

B

CD

A

A

A

A

B

CD

B

CD

B

CD

= word = lock = join capability= thread

Page 24: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 24

�Benchmark – wordcount

� Frequency count of words in [Brown02]

� Distributed hash table used to store counts

� remote version: access hash table remotely

� local version: create a thread on target node 0

0.5

1

1.5

2

2.5

3

3.5

ex

ecu

tio

n t

ime

remote local

spin

UV

Page 25: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 25

�Talk Outline

�Overview of Hamal

� The Hamal memory system

� Threads and synchronization

� Fault-tolerant messaging protocol

Page 26: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 26

�Motivation

Discarding vs. Non-Discarding

Internet Examples Most || Computers

Performance �

� Simplicity �

� Reliability �

Page 27: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 27

�Fault Tolerant Messaging

� Idempotence?

� Sequence numbers (e.g. TCP)

� 220 nodes, 32 bits ⇒ 8MB/node

MSG MSG

MSG ACK

Page 28: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 28

�Idempotent Messaging Protocol

MSG MSG

� CONF: No more messages will be sent

� [Brown01]

MSG ACK MSG

CONF MSG

Page 29: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 29

�Out of Order Packets

MSG 7

sender receiver

MSG 7MSG 7

ACK 7

MSG 7

MSG 7

CONF 7

MSG 7MSG 7 CONF 7

Page 30: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 30

�Message Identification

� Sender generates message ID

� All packets contain source node and msg. ID

ACKSender ReceiverMSG

CONF

� ID identifies MSG

� source node gives destination for CONF

� (source node, ID) identifies MSG

Page 31: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 31

�Can We Reduce Overhead?

� ACK/CONF packets:

� Two ideas to reduce size:

1. Use short (4-8 bit) MSG ID

2. Receiver assigns short secondary ID

type dest message IDsource

ID2type dest message IDsource

type dest ID2

ACK:

CONF:

Page 32: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 32

�Failure of Short IDs

MSG 7

sender receiver

MSG 7

CONF 7

ACK 7

ACK 7 MSG 7

MSG 7

MSG 7

ACK 7

MSG 7

CONF 7 MSG 7

Page 33: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 33

�Secondary IDs

MSG 5

sender receiver

MSG 5

MSG 9

MSG 9

CONF 2

MSG 5,2MSG 5 ACK 5,2

CONF 2

MSG 5,2

ACK 5,2

MSG 9 MSG 9,2

ACK 9,2

MSG 9 MSG 9

Page 34: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 34

�How Many Bits Is Enough?

� Can’t reuse an ID if it’s still in the system

� But receivers can remember an ID for arbitrarily long periods of time

� Solution:

� use 48 bit IDs

� flush the network every 4-12 months when a node runs out of IDs

Page 35: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 35

�Experimental Results

� Trace driven simulation of 4 microbenchmarkson 4 topologies

� Linear backoff for packet retransmission

� Small send tables (8 entries)

� Larger receive tables (64 entries)

� Buffer ~4 flits at each network node

Page 36: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 36

�Summary

� Scalable memory system

� Capabilities → single shared virtual address space

� Hardware page tables, sparsely faceted arrays

� Low overhead for parallel programs

� Efficient thread management

� Register-based synchronization

� UV Trap bits

� Discarding network

� Lightweight fault-tolerant messaging protocol

Page 37: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 37

�Conclusion

Yesterday – ENIAC Today – P4

Soon – 1M nodes Tomorrow – ???

Page 38: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 38

�Comparison with Non-Discarding

0

200

400

600

800

1000

2D Grid 3D Grid Fat Tree Butterfly0

2000

4000

6000

8000

10000

12000

14000

2D Grid 3D Grid Fat Tree Butterfly

0

50000

100000

150000

200000

250000

2D Grid 3D Grid Fat Tree Butterfly

0

20000

40000

60000

80000

100000

2D Grid 3D Grid Fat Tree Butterfly

add reverse

nbodyquicksort

Non-Discarding Discarding + fault-tolerant messaging

Page 39: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 39

�Hamal Benchmarks

� ppadd – Parallel-prefix addition

� quicksort – Parallel quicksort

� nbody – exact N-body simulation, 256 bodies

� Processors conceptually organized in square array

� Communication is in rows and columns only

� wordcount – frequency count of words in [Brown02]

� Distributed hash table used to maintain counts

Page 40: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 40

�ppadd – Model vs. Simulations

ppadd

10

11

12

13

14

15

16

17

18

19

20

1 2 4 8 16 32 64 128 256 512

# processors

log(tim

e)

Page 41: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 41

�quicksort – Execution Time

quicksort

17

18

19

20

21

22

23

24

25

26

27

1 2 4 8 16 32 64 128 256 512

# processors

log(tim

e)

4096

8192

16384

32768

65536

131072

262144

Page 42: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 42

�quicksort - Speedup

quicksort

0

1

2

3

4

5

6

7

1 2 4 8 16 32 64 128 256 512

# processors

log(speedup)

4096

8192

16384

32768

65536

131072

262144

Page 43: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 43

�nbody – Speedup

nbody

0

2

4

6

8

1 4 16 64 256

# processors

log(speedup)

Page 44: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 44

�Register Dribbling

ppadd8

0

100000

200000

300000

400000

500000

600000

700000

800000

900000

1000000

4 5 6 7 8 9 10 11 12 13 14 15 16

# contexts

tim

e (

cycle

s)

dribble on suspend

extended dribbling

quicksort

2800000

2850000

2900000

2950000

3000000

3050000

3100000

3150000

4 5 6 7 8 9 10 11 12 13 14 15 16

# contexts

tim

e (

cycle

s)

dribble on suspend

extended dribbling

nbody

2200000

2300000

2400000

2500000

2600000

2700000

2800000

2900000

4 5 6 7 8 9 10 11 12 13 14 15 16

# contexts

tim

e (

cycle

s)

dribble on suspend

extended dribbling

wordcount

0

200000

400000

600000

800000

1000000

1200000

1400000

4 5 6 7 8 9 10 11 12 13 14 15 16

# contexts

tim

e (

cycle

s)

dribble on suspend

extended dribbling

Page 45: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 45

�Network Benchmarks

� add – parallel prefix addition on 4096 nodes

� reverse – reverse the data of a 16K entry vector distributed across 1024 nodes

� quicksort – parallel quicksort of a 32K entry vector on 1024 nodes

� nbody – full N-body simulation on 256 nodes with one body per node

Page 46: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 46

�Network Topologies

� 2D Grid

� Dimension-ordered routing prefered

� 3D Grid

� Dimension-ordered routing prefered

� Fat tree

� radix-4 (down) dilation-2 (up), randomized

� Multibutterfly

� radix-2 dilation-2, randomized

Page 47: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 47

�Network Retransmission

L301.028L301.093overall

L321.015L321.041multibutterfly

L301.036L311.085fat tree

L301.018L321.0393D grid

L281.026L321.0332D grid

L311.045L311.085nbody

Q121.015Q151.020quicksort

L321.016L301.028reverse

L51.003L51.009add

average case worst case

slowdown over optimal

Page 48: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 48

�Network Send Table Size

add

0

200

400

600

800

1000

1200

1400

1600

1800

2000

1 2 4 8 16 32 64 128 256

reverse

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

1 2 4 8 16 32 64 128 256

quicksort

0

50000

100000

150000

200000

250000

300000

1 2 4 8 16 32 64 128 256

nbody

0

20000

40000

60000

80000

100000

120000

140000

1 2 4 8 16 32 64 128 256

Page 49: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 49

�Network Buffering

add

0

200

400

600

800

1000

1200

1400

1600

1 2 3 4 5 6 7 8

reverse

0

5000

10000

15000

20000

25000

30000

35000

1 2 3 4 5 6 7 8

quicksort

0

50000

100000

150000

200000

250000

300000

350000

1 2 3 4 5 6 7 8

nbody

0

20000

40000

60000

80000

100000

120000

140000

160000

1 2 3 4 5 6 7 8

Page 50: Hamal Parallel Computer Design and Evaluation of theDecember 3, 2002 The Hamal Parallel Computer 45 Network Benchmarks add – parallel prefix addition on 4096 nodes reverse – reverse

December 3, 2002 The Hamal Parallel Computer 50

�Network Receive Table Size

add

0

500

1000

1500

2000

2500

3000

3500

16 32 64 128

reverse

0

2000

4000

6000

8000

10000

12000

14000

16000

16 32 64 128

quicksort

0

50000

100000

150000

200000

250000

300000

16 32 64 128

nbody

0

20000

40000

60000

80000

100000

120000

140000

16 32 64 128