Global HPCC Benchmarks in Chapel: STREAM Triad, Random Access, and FFT
HPC Challenge BOF, SC06
Class 2 Submission
November 14, 2006
Brad Chamberlain, Steve Deitz, Mary Beth Hribar, Wayne Wong
Chapel Team, Cray Inc.
HPCC BOF, SC06
Overview
Chapel: Cray's HPCS language

Our approach to the HPC Challenge codes:
• performance-minded
• clear, intuitive, readable
• general across types, problem parameters, and modular boundaries
Code Size Summary
[Bar chart: source lines of code (SLOC) for the Reference and Chapel versions of STREAM Triad, RandomAccess, and FFT. Reference bars are broken down into framework and computation; Chapel bars into problem size (common), results and output, verification, initialization, kernel declarations, and kernel computation. Data labels: 1406, 1668, and 433 for the reference versions; 156, 124, and 86 for the Chapel versions.]
Chapel Code Size Summary
[Bar chart: SLOC for the Chapel versions of STREAM Triad, Random Access, and FFT (data labels: 156, 86, and 124), broken down into problem size (common), results and output, verification, initialization, kernel declarations, and kernel computation.]
Chapel Code Size Summary
[Bar chart: static lexical tokens for the Chapel versions of STREAM Triad, Random Access, and FFT (data labels: 1299, 593, and 863), with the same breakdown: problem size (common), results and output, verification, initialization, kernel declarations, and kernel computation.]
STREAM Triad Overview

const ProblemSpace: domain(1) distributed(Block) = [1..m];
var A, B, C: [ProblemSpace] elemType;

A = B + alpha * C;
• ProblemSpace declares a 1D arithmetic domain (a first-class index set) and specifies its distribution (the slide's diagram shows a Block distribution across locales L0..L4)
• the domain is used to declare the distributed arrays A, B, and C
• the computation is expressed using promoted scalar operators and whole-array references ⇒ parallel computation
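For readers without a Chapel compiler, the promoted whole-array operation can be approximated with NumPy. This is only a sketch: the value of m, the choice of float element type, and the random initialization are my assumptions, and NumPy runs serially on one node rather than distributed across locales.

```python
import numpy as np

m = 2**20      # problem size (assumed value for illustration)
alpha = 3.0    # scalar multiplier, as in the HPCC STREAM Triad

# B and C play the role of the Chapel arrays declared over ProblemSpace
B = np.random.rand(m)
C = np.random.rand(m)

# Whole-array, element-wise computation: the analogue of the promoted
# scalar operators in "A = B + alpha * C" (serial here, not distributed)
A = B + alpha * C
```

As in Chapel, no explicit loop appears in the user code; the element-wise semantics come from operating on whole arrays.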
Random Access Overview

[i in TableSpace] T(i) = i;

forall block in subBlocks(updateSpace) do
  for r in RAStream(block.numIndices, block.low) do
    T(r & indexMask) ^= r;
• Initialize the table using a forall expression
• Express table updates using forall- and for-loops
• Random stream expressed modularly using an iterator:
iterator RAStream(numvals, start: randType = 0): randType {
  var val = getNthRandom(start);
  for i in 1..numvals {
    getNextRandom(val);
    yield val;
  }
}
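A rough Python analogue of this pattern, for illustration only: the generator below stands in for the RAStream iterator, but the 64-bit LCG it uses is my own stand-in, not HPCC's actual random-number recurrence, and the serial loop stands in for the forall over sub-blocks.

```python
MASK64 = (1 << 64) - 1

def ra_stream(numvals, start=1):
    """Yield pseudo-random 64-bit values, like the RAStream iterator.
    Uses a simple 64-bit LCG as a stand-in for HPCC's generator."""
    val = start
    for _ in range(numvals):
        val = (val * 6364136223846793005 + 1442695040888963407) & MASK64
        yield val

table_size = 2**10            # must be a power of two
index_mask = table_size - 1   # the analogue of indexMask
T = list(range(table_size))   # the "T(i) = i" initialization

# The table-update loop: XOR each random value into a masked table slot
for r in ra_stream(4 * table_size):
    T[r & index_mask] ^= r
```

Factoring the stream into a generator mirrors the modularity point above: the update loop is independent of how the random values are produced.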
FFT Overview (radix 4)

for i in [2..log2(numElements)) by 2 {
  const m = span*radix, m2 = 2*m;
  forall (k,k1) in (Adom by m2, 0..) {
    var wk2 = …, wk1 = …, wk3 = …;
    forall j in [k..k+span) do
      butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);
    wk1 = …; wk3 = …; wk2 *= 1.0i;
    forall j in [k+m..k+m+span) do
      butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);
  }
  span *= radix;
}

def butterfly(wk1, wk2, wk3, inout A:[1..radix]) { … }
• Parallelism expressed using nested forall-loops
• Support for complex and imaginary math simplifies FFT arithmetic
• Generic arguments allow the routine to be called with complex, real, or imaginary twiddle factors
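The strided slice A[j..j+3*span by span] passed to butterfly selects four elements spaced span apart. A quick Python illustration of that indexing (the data values are arbitrary stand-ins; Python slices are exclusive at the stop, hence the +1):

```python
span = 4
A = list(range(16))   # arbitrary stand-in data
j = 1

# Python analogue of the Chapel range "j..j+3*span by span":
# four elements starting at j, at stride span
quad = A[j : j + 3*span + 1 : span]
# quad == [1, 5, 9, 13] -- the four butterfly inputs for this j
```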
Chapel Compiler Status
All codes compile and run with our current Chapel compiler
• focus to date has been on prototyping Chapel, not performance, and on targeting a single locale
• platforms: Linux, Cygwin (Windows), Mac OS X, SunOS, …

No meaningful performance results yet
• written report contains performance discussions for our codes

Upcoming milestones
• December 2006: limited release to HPLS team
• 2007: work on distributed-memory execution and optimizations
• SC07: intend to have publishable performance results for HPCC'07
Summary
Have expressed the HPCC codes attractively
• clear, concise, general
• express parallelism, compile and execute correctly on one locale
• benefit from Chapel's global-view parallelism
• utilize generic programming and modern SW engineering principles

Our written report contains:
• complete source listings
• detailed walkthroughs of our solutions as a Chapel tutorial
• performance notes for our implementations

Report and presentation available at our website: http://chapel.cs.washington.edu

We're interested in your feedback: [email protected]
Backup Slides
Compact High-Level Code…
[Bar charts: lines of code for the NAS parallel benchmarks CG, EP, FT, MG, and IS, comparing F+MPI (C+MPI for the final benchmark) against ZPL, each broken down into communication, declarations, and computation.]
…need not perform poorly
[Performance charts: ZPL versions compared against the C/Fortran + MPI versions of CG, EP, FT, MG, and IS.]
See also Rice University's recent D-HPF work…