Unified Parallel C at LBNL/UCB
The Berkeley UPC Compiler
Wei Chen
The LBNL/Berkeley UPC Group
Unified Parallel C (UPC)
• UPC is a parallel extension to C for scientific computing
- With distributed arrays, shared pointers, parallel loops, strict/relaxed memory model
- Global Address Space abstraction
• SPMD parallelism
• There are vendor compilers on several machines
- HP AlphaServer, Cray, Sun, SGI
• Open source compiler developed by LBNL/UCB (beta release 3/31)
Overview of Berkeley UPC Compiler
Two Goals: Portability and High-Performance
[Figure: software stack, top to bottom]
UPC Code
Translator (Open64 based)
Translator Generated C Code
Berkeley UPC Runtime System
GASNet Communication System
Network Hardware
[Figure labels on the layers: Platform-independent, Network-independent, Compiler-independent, Language-independent]
Implementing the UPC to C Translator
[Figure: translation pipeline]
Preprocessed File -> UPC front end -> VH Whirl w/ shared types -> Backend lowering -> High Whirl w/ runtime calls -> Whirl2c -> ANSI-compliant C Code
• Source to source translation
• Ported to gcc 3.2 (done by Rice Open64)
• Supports both 32/64 bit platforms
• Designed to incorporate existing optimization framework (currently not enabled)
• Communicates with the runtime via a standard API and configuration files
Components in the Translator
• Front end:
- UPC extensions to C: shared qualifier, block size, forall loops, builtin functions and values (blocksizeof, localsizeof, etc.), strict/relaxed
- Parses and type-checks UPC code, generates Whirl, with UPC-specific information available in the symbol table
• Backend:
- Transforms shared reads and writes into calls into the runtime library (after LNO on H Whirl)
- Calls can be blocking/non-blocking/bulk/register-based
• Whirl2c:
- Shared variables are declared as opaque pointer-to-shared
- Static shared variables are allocated and initialized dynamically
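The backend step above, turning shared reads and writes into runtime calls on an opaque pointer-to-shared, can be illustrated with a toy model in plain C. This is only a sketch under stated assumptions: every name here (toy_sptr_t, toy_get_int, toy_put_int, toy_index_cyclic, the fixed thread count) is hypothetical and is not the Berkeley UPC runtime API, and the "remote" segments are ordinary arrays in one process.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_THREADS  4   /* hypothetical fixed thread count */
#define TOY_SEG_INTS 16  /* ints of shared storage per thread */

/* Stand-in for each thread's shared segment. A real runtime would reach
   other threads' segments through the communication layer. */
static int toy_segment[TOY_THREADS][TOY_SEG_INTS];

/* Opaque pointer-to-shared: generated code never dereferences it directly. */
typedef struct {
    int    thread;  /* thread the target element has affinity to */
    size_t offset;  /* element offset inside that thread's segment */
} toy_sptr_t;

/* Pointer-to-shared for element i of a cyclic `shared int a[]`
   whose storage starts at offset 0 of every segment. */
toy_sptr_t toy_index_cyclic(size_t i)
{
    toy_sptr_t p = { (int)(i % TOY_THREADS), i / TOY_THREADS };
    return p;
}

/* Blocking read of one int through the toy runtime. */
int toy_get_int(toy_sptr_t p)
{
    assert(p.thread >= 0 && p.thread < TOY_THREADS && p.offset < TOY_SEG_INTS);
    return toy_segment[p.thread][p.offset];
}

/* Blocking write of one int through the toy runtime. */
void toy_put_int(toy_sptr_t p, int value)
{
    assert(p.thread >= 0 && p.thread < TOY_THREADS && p.offset < TOY_SEG_INTS);
    toy_segment[p.thread][p.offset] = value;
}

/* What the UPC statement `a[j] = a[i] + 1;` might look like after lowering:
   the shared read becomes a get call, the shared write a put call. */
void toy_lowered_increment(size_t i, size_t j)
{
    int tmp = toy_get_int(toy_index_cyclic(i));
    toy_put_int(toy_index_cyclic(j), tmp + 1);
}
```

Keeping the pointer-to-shared opaque means the generated C code depends only on the call interface, so the representation can differ per platform without changing the translator output.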
Modifications
• Symbol Table
- Add flags for shared, strict/relaxed, and block size for TY_TAB
• Intrinsics
- Each UPC runtime function is represented by a new intrinsic (about 100 of them)
• Driver
- Use sgiupc to compile UPC programs
- New flags for passing config file, number of threads
• C front end
- Modify gccfe/gnu to parse UPC extensions, also fixes for ANSI-compliance
- Modify gccfe to support upc_forall loops (transformed to WHILE_DO, marked by pragma)
- Name mangling for static variables
Modifications II
• Backend
- Add new lowering phases for transforming shared accesses
- Use some VH Whirl (e.g. comma to spill return value)
- Adjust field offsets for structs that have shared pointers (also in front end for sizeof)
- Symbol table not consistent till lowering finishes
- Dynamic nesting of forall loops
• Whirl2c
- Various UPC-specific changes and bug fixes
- Access thread-local data through macros
- Dynamically allocate static user data
Future Work
• Add UPC-specific optimizations
- Possibly as a new phase
- Likely will use/modify PREOPT and LNO (alias analysis, dependence analysis, prefetching)
- Want WOPT too -- possible to extend whirl2c to work for M Whirl?
• Coordination Among Releases
- Our version has been merged with the Rice Open64 project
- Would like to merge with either Open64 or ORC
- One common CVS tree, with each team on different branches?
The End
UPC Programming Model Features
• SPMD parallelism
- fixed number of images during execution
- images operate asynchronously
• Several kinds of array distributions
- double a[n]: a private array on each processor
- shared double a[n]: a shared array, with cyclic mapping
- shared [4] double a[n]: a block cyclic array with 4-element blocks
- shared [0] double *a = (shared [0] double *) upc_alloc(n);
  a shared array with all elements local
• Pointers for irregular data structures
- shared double *sp: a pointer to shared data
- double *lp: a pointer to private data
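The block-size qualifier in the declarations above decides which thread each element has affinity to: blocks are dealt out to threads round-robin. A minimal sketch of that mapping in plain C follows; the helper names owner_thread and local_index are hypothetical (they are not UPC builtins or Berkeley runtime calls), and the thread count is passed in explicitly instead of coming from THREADS.

```c
#include <assert.h>
#include <stddef.h>

/* Thread that element i of `shared [block] T a[n]` has affinity to.
   block == 1 is the default cyclic layout. block == 0 models the
   indefinite block size, where every element lives on one thread
   (thread 0 for a static declaration). */
int owner_thread(size_t i, size_t block, int threads)
{
    assert(threads > 0);
    if (block == 0)
        return 0;
    return (int)((i / block) % (size_t)threads);
}

/* Position of element i inside its owner's local portion of the array. */
size_t local_index(size_t i, size_t block, int threads)
{
    assert(threads > 0);
    if (block == 0)
        return i;
    /* Number of complete rounds of `threads` blocks that precede element i. */
    size_t round = i / (block * (size_t)threads);
    return round * block + i % block;
}
```

For example, with 4 threads, element 17 of a shared [4] array falls on thread 0 (which holds elements 0-3 and 16-19) at local position 5, while element 5 of a cyclic array falls on thread 1 at local position 1.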
Parallel Loops in UPC
• UPC has a “forall” construct for distributing computation
Ex: Vector Addition
shared int v1[N], v2[N], v3[N];
upc_forall (i=0; i < N; i++; &v3[i]) {
    v3[i] = v2[i] + v1[i];
}
• Two kinds of affinity expressions:
- Integer (compare with thread id)
- Shared address (check the affinity of address)
• Affinity tests are performed on every iteration
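To make the per-iteration affinity test concrete, here is a sketch in plain C of what a single thread does for the vector addition loop, simulated in one process. It assumes the default cyclic layout, so &v3[i] has affinity to thread i % threads; the function name forall_vector_add and its parameters are hypothetical, and `me` and `threads` stand in for MYTHREAD and THREADS.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by thread `me` out of `threads` for
     upc_forall (i = 0; i < n; i++; &v3[i]) v3[i] = v2[i] + v1[i];
   with cyclically distributed arrays. Returns how many iterations
   this thread actually executed. */
size_t forall_vector_add(int me, int threads, size_t n,
                         const int *v1, const int *v2, int *v3)
{
    assert(threads > 0 && me >= 0 && me < threads);
    size_t executed = 0;
    for (size_t i = 0; i < n; i++) {
        /* Affinity test, evaluated on every iteration: skip the body
           unless v3[i] has affinity to this thread. */
        if ((int)(i % (size_t)threads) != me)
            continue;
        v3[i] = v2[i] + v1[i];
        executed++;
    }
    return executed;
}
```

Every thread walks the full iteration space and runs only its own share of the bodies: with 4 threads and n = 6, thread 1 executes i = 1 and i = 5. For cyclic arrays, the integer affinity expression `i` selects the same iterations as `&v3[i]`.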
Affinity Exp