Global HPCC Benchmarks in Chapel: STREAM Triad, Random Access, and FFT HPC Challenge BOF, SC06 Class 2 Submission November 14, 2006 Brad Chamberlain, Steve Deitz, Mary Beth Hribar, Wayne Wong Chapel Team, Cray Inc.


Global HPCC Benchmarks in Chapel: STREAM Triad, Random Access, and FFT

HPC Challenge BOF, SC06 Class 2 Submission

November 14, 2006

Brad Chamberlain, Steve Deitz, Mary Beth Hribar, Wayne Wong

Chapel Team, Cray Inc.


HPCC BOF, SC06

Overview

Chapel: Cray’s HPCS language

Our approach to the HPC Challenge codes:
• performance-minded
• clear, intuitive, readable
• general across…
    • types
    • problem parameters
    • modular boundaries


Code Size Summary

[Figure: bar chart of code size in SLOC (axis 0 to 1800) comparing the Reference and Chapel versions of STREAM Triad, Random Access, and FFT. Reference version bars are split into Framework and Computation. Chapel version bars are split into Problem size (common), Results and output, Verification, Initialization, Kernel declarations, and Kernel computation. Data labels on the chart: 1406, 1668, 433 and 156, 124, 86.]


Chapel Code Size Summary

[Figure: bar chart of Chapel code size in SLOC (axis 0 to 180) for STREAM Triad, Random Access, and FFT, each bar split into Problem size (common), Results and output, Verification, Initialization, Kernel declarations, and Kernel computation. Data labels on the chart: 156, 86, 124.]


Chapel Code Size Summary

[Figure: bar chart of Chapel code size in static lexical tokens (axis 0 to 1400) for STREAM Triad, Random Access, and FFT, each bar split into Problem size (common), Results and output, Verification, Initialization, Kernel declarations, and Kernel computation. Data labels on the chart: 1299, 593, 863.]


STREAM Triad Overview

const ProblemSpace: domain(1) distributed(Block) = [1..m];
var A, B, C: [ProblemSpace] elemType;

A = B + alpha * C;


Declare a 1D arithmetic domain (first-class index set): ProblemSpace

Specify its distribution

Use domain to declare distributed arrays: A, B, C

Express computation using promoted scalar operators and whole-array references ⇒ parallel computation

[Figure: ProblemSpace and the arrays A, B, and C divided into five blocks labeled L0 to L4; each block evaluates its own portion of A = B + alpha * C.]
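The block-wise evaluation described above can be modeled serially. The following is a minimal Python sketch, not the Chapel implementation: the helper names `block_ranges` and `triad`, the 0-based indexing, and the default of five blocks are assumptions made for illustration. Each block is an independent unit of work, which is what lets the whole-array statement run in parallel.

```python
def block_ranges(m, num_blocks):
    """Split indices 0..m-1 into num_blocks contiguous, near-equal blocks."""
    base, extra = divmod(m, num_blocks)
    ranges = []
    start = 0
    for b in range(num_blocks):
        # The first `extra` blocks absorb the remainder, one index each.
        size = base + (1 if b < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def triad(B, C, alpha, num_blocks=5):
    """Compute A = B + alpha * C one block at a time (hypothetical helper)."""
    m = len(B)
    A = [0.0] * m
    for block in block_ranges(m, num_blocks):
        # No block reads or writes another block's elements, so the
        # iterations of this outer loop could run concurrently.
        for i in block:
            A[i] = B[i] + alpha * C[i]
    return A
```

For example, `triad([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], 0.5, num_blocks=2)` evaluates the same expression as the single Chapel statement `A = B + alpha * C`, only with the per-block loops written out.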


Random Access Overview

[i in TableSpace] T(i) = i;

forall block in subBlocks(updateSpace) do
  for r in RAStream(block.numIndices, block.low) do
    T(r & indexMask) ^= r;


Initialize table using a forall expression

Express table updates using forall- and for-loops

Random stream expressed modularly using an iterator

iterator RAStream(numvals, start:randType = 0): randType {
  var val = getNthRandom(start);
  for i in 1..numvals {
    getNextRandom(val);
    yield val;
  }
}
```
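The iterator-based structure above can be sketched as a serial Python model. The bodies of `get_next_random` and `get_nth_random` are assumptions: the slide does not show them, so this sketch uses the 64-bit shift-and-XOR recurrence with polynomial 0x7 from the HPCC reference code and finds the n-th value by naive stepping. The function `random_access` and its parameters are hypothetical names for the table setup and update loops.

```python
POLY = 0x7                # assumed: polynomial used by the HPCC reference code
MASK64 = (1 << 64) - 1    # keep values within 64 bits


def get_next_random(val):
    """Advance the stream: shift left, XOR in POLY if the top bit was set."""
    high = val >> 63
    return ((val << 1) & MASK64) ^ (POLY if high else 0)


def get_nth_random(n):
    """Return the n-th stream value by naive stepping (O(n), assumed seed 1)."""
    val = 1
    for _ in range(n):
        val = get_next_random(val)
    return val


def ra_stream(numvals, start=0):
    """Generator mirroring the RAStream iterator on the slide."""
    val = get_nth_random(start)
    for _ in range(numvals):
        val = get_next_random(val)
        yield val


def random_access(log2_table_size, num_updates, block_size):
    """Initialize T(i) = i, then apply the updates block by block."""
    n = 1 << log2_table_size
    index_mask = n - 1
    T = list(range(n))
    # Each block restarts the stream at its own offset, so blocks are
    # independent of one another; here they simply run one after another.
    for low in range(0, num_updates, block_size):
        count = min(block_size, num_updates - low)
        for r in ra_stream(count, low):
            T[r & index_mask] ^= r
    return T
```

Because every block seeds the stream from its own starting offset, splitting the update space into blocks does not change the sequence of values applied, which is what allows the outer loop to be a forall in the Chapel version.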


FFT Overview (radix 4)

for i in [2..log2(numElements)) by 2 {
  const m = span*radix, m2 = 2*m;

  forall (k,k1) in (Adom by m2, 0..) {
    var wk2 = …, wk1 = …, wk3 = …;

    forall j in [k..k+span) do
      butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);

    wk1 = …; wk3 = …; wk2 *= 1.0i;

    forall j in [k+m..k+m+span) do
      butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);
  }
  span *= radix;
}

def butterfly(wk1, wk2, wk3, inout A:[1..radix]) { … }


Parallelism expressed using nested forall-loops

Support for complex and imaginary math simplifies FFT arithmetic

Generic arguments allow routine to be called with complex, real, or imaginary twiddle factors
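To illustrate what a radix-4 butterfly with three twiddle factors computes, here is a minimal Python sketch. It is a textbook recursive decimation-in-time formulation, not the iterative loop on the slide, and the elided twiddle expressions from the slide are not reconstructed. The signatures of `butterfly`, `fft_radix4`, and `dft` are assumptions for this example, and input lengths must be powers of 4.

```python
import cmath


def butterfly(wk1, wk2, wk3, a0, a1, a2, a3):
    """Radix-4 butterfly: scale three inputs by twiddles, then combine.

    The twiddles may be real, imaginary, or complex numbers; Python's
    arithmetic handles all three, loosely mirroring the generic arguments.
    """
    t0, t1, t2, t3 = a0, wk1 * a1, wk2 * a2, wk3 * a3
    return (t0 + t1 + t2 + t3,
            t0 - 1j * t1 - t2 + 1j * t3,
            t0 - t1 + t2 - t3,
            t0 + 1j * t1 - t2 - 1j * t3)


def fft_radix4(x):
    """Forward FFT of a sequence whose length is a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    quarter = n // 4
    # Transform the four interleaved subsequences x[r], x[r+4], x[r+8], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for q in range(quarter):
        w1 = cmath.exp(-2j * cmath.pi * q / n)
        w2 = w1 * w1
        w3 = w2 * w1
        y = butterfly(w1, w2, w3, subs[0][q], subs[1][q], subs[2][q], subs[3][q])
        for s in range(4):
            out[q + s * quarter] = y[s]
    return out


def dft(x):
    """Naive O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

Comparing `fft_radix4` against `dft` on a 16-element input is a quick way to check the butterfly's signs. In the recursion each value of `q` is independent of the others, which corresponds to the parallelism the slide expresses with nested forall-loops.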


Chapel Compiler Status

All codes compile and run with our current Chapel compiler
• focus to date has been on…
    • prototyping Chapel, not performance
    • targeting a single locale
• platforms: Linux, Cygwin (Windows), Mac OS X, SunOS, …

No meaningful performance results yet
• written report contains performance discussions for our codes

Upcoming milestones
• December 2006: limited release to HPLS team
• 2007: work on distributed-memory execution and optimizations
• SC07: intend to have publishable performance results for HPCC’07


Summary

Have expressed HPCC codes attractively
• clear, concise, general
• express parallelism, compile and execute correctly on one locale
• benefit from Chapel’s global-view parallelism
• utilize generic programming and modern SW Engineering principles

Our written report contains:
• complete source listings
• detailed walkthroughs of our solutions as Chapel tutorial
• performance notes for our implementations

Report and presentation available at our website: http://chapel.cs.washington.edu

We’re interested in your feedback: [email protected]


Backup Slides


Compact High-Level Code…

[Figure: five bar charts, one per benchmark (CG, EP, FT, MG, IS), plotting lines of code by language. Four panels compare F+MPI with ZPL and one compares C+MPI with ZPL. Each bar is split into communication, declarations, and computation.]


…need not perform poorly

[Figure: one panel per benchmark (CG, EP, FT, MG, IS) comparing the C/Fortran + MPI versions with the ZPL versions.]

See also Rice University’s recent D-HPF work…