Page 1: The Berkeley UPC Compiler

Unified Parallel C at LBNL/UCB

The Berkeley UPC Compiler

Wei Chen The LBNL/Berkeley UPC Group

Page 2: The Berkeley UPC Compiler

Unified Parallel C (UPC)

• UPC is a parallel extension to C for scientific computing
  - With distributed arrays, shared pointers, parallel loops, strict/relaxed memory model
  - Global Address Space Abstraction
• SPMD parallelism
• There are vendor compilers on several machines
  - HP Alpha Server, Cray, Sun, SGI
• Open source compiler developed by LBNL/UCB (beta release 3/31)

Page 3: The Berkeley UPC Compiler

Overview of Berkeley UPC Compiler

Two Goals: Portability and High-Performance

Compilation stack (top to bottom):
UPC Code
Translator (Open64 based)
Translator Generated C Code
Berkeley UPC Runtime System
GASNet Communication System
Network Hardware

Layer properties labeled in the diagram: Platform-independent, Network-independent, Compiler-independent, Language-independent

Page 4: The Berkeley UPC Compiler

Implementing the UPC to C Translator

Translation pipeline: Preprocessed File -> UPC front end -> VH Whirl w/ shared types -> Backend lowering -> High Whirl w/ runtime calls -> Whirl2c -> ANSI-compliant C Code

• Source to source translation

• Ported to gcc 3.2 (done by Rice Open64)

• Supports both 32-bit and 64-bit platforms

• Designed to incorporate the existing optimization framework (currently not enabled)

• Communicates with the runtime via a standard API and configuration files

Page 5: The Berkeley UPC Compiler

Components in the Translator

• Front end:
  - UPC extensions to C: shared qualifier, block size, forall loops, builtin functions and values (blocksizeof, localsizeof, etc.), strict/relaxed
  - Parses and type-checks UPC code, generates Whirl, with UPC-specific information available in the symbol table
• Backend:
  - Transforms shared reads and writes into calls into the runtime library (after LNO on H Whirl)
  - Calls can be blocking/non-blocking/bulk/register-based
• Whirl2c:
  - Shared variables are declared as opaque pointer-to-shared
  - Static shared variables are allocated and initialized dynamically
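
The backend step above (shared reads and writes become runtime calls on an opaque pointer-to-shared) can be pictured with a small plain-C simulation. This is only a sketch under stated assumptions: `sim_pshared_t`, `rt_get`, `rt_put`, and `elem_int_cyclic` are hypothetical names invented for illustration, not the Berkeley UPC runtime API, and "remote" memory is simulated with one byte array per thread inside a single process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_THREADS 4        /* hypothetical thread count for the simulation */
#define SEG_BYTES   256      /* size of each simulated per-thread segment */

/* Simulated shared segments: one byte array per UPC thread. */
static unsigned char segment[SIM_THREADS][SEG_BYTES];

/* Hypothetical opaque pointer-to-shared: owning thread plus byte offset. */
typedef struct {
    int    thread;
    size_t offset;
} sim_pshared_t;

/* Hypothetical blocking read: copy nbytes from the shared object into dst. */
static void rt_get(void *dst, sim_pshared_t src, size_t nbytes) {
    assert(src.thread >= 0 && src.thread < SIM_THREADS);
    assert(src.offset + nbytes <= SEG_BYTES);
    memcpy(dst, &segment[src.thread][src.offset], nbytes);
}

/* Hypothetical blocking write: copy nbytes from src into the shared object. */
static void rt_put(sim_pshared_t dst, const void *src, size_t nbytes) {
    assert(dst.thread >= 0 && dst.thread < SIM_THREADS);
    assert(dst.offset + nbytes <= SEG_BYTES);
    memcpy(&segment[dst.thread][dst.offset], src, nbytes);
}

/* Element i of a cyclic `shared int a[]`: thread i % T, local slot i / T. */
static sim_pshared_t elem_int_cyclic(size_t i) {
    sim_pshared_t p;
    p.thread = (int)(i % SIM_THREADS);
    p.offset = (i / SIM_THREADS) * sizeof(int);
    return p;
}

/* What `a[i] = a[j] + 1;` on a shared array could be lowered to. */
static void lowered_increment_copy(size_t i, size_t j) {
    int tmp;
    rt_get(&tmp, elem_int_cyclic(j), sizeof tmp);   /* shared read  */
    tmp = tmp + 1;                                  /* private computation */
    rt_put(elem_int_cyclic(i), &tmp, sizeof tmp);   /* shared write */
}
```

Calling `lowered_increment_copy(2, 5)` reads element 5 from simulated thread 1 and writes the incremented value to element 2 on simulated thread 2. A real translator would choose among blocking, non-blocking, bulk, and register-based calls; this sketch shows only the blocking form.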

Page 6: The Berkeley UPC Compiler

Modifications

• Symbol Table
  - Add flags for shared, strict/relaxed, and block size for TY_TAB
• Intrinsics
  - Each UPC runtime function is represented by a new intrinsic (about 100 of them)
• Driver
  - Use sgiupc to compile UPC programs
  - New flags for passing config file, number of threads
• C front end
  - Modify gccfe/gnu to parse UPC extensions, also fixes for ANSI-compliance
  - Modify gccfe to support upc_forall loops (transformed to WHILE_DO, marked by pragma)
  - Name mangling for static variables

Page 7: The Berkeley UPC Compiler

Modifications II

• Backend
  - Add new lowering phases for transforming shared accesses
  - Use some VH Whirl (e.g. comma to spill return value)
  - Adjust field offsets for structs that have shared pointers (also in front end for sizeof)
  - Symbol table not consistent till lowering finishes
  - Dynamic nesting of forall loops
• Whirl2c
  - Various UPC-specific changes and bug fixes
  - Access thread-local data through macros
  - Dynamically allocate static user data
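
The last two Whirl2c items (thread-local data accessed through macros, static user data allocated dynamically) can be illustrated with a plain-C sketch. It assumes that several UPC threads may share one address space, for example when threads are mapped onto pthreads, so a file-scope `static int counter = 7;` in user code needs one copy per thread. The names `sim_tld_t`, `TLD`, `tld_init`, and `sim_mythread` are hypothetical and are not the macros the Berkeley translator actually emits.

```c
#include <assert.h>
#include <stdlib.h>

#define SIM_THREADS 4   /* hypothetical thread count for the simulation */

/* User code `static int counter = 7;` gathered into a per-thread record. */
typedef struct {
    int counter;
} sim_tld_t;

static sim_tld_t *tld_table[SIM_THREADS];  /* one record per simulated thread */
static int sim_mythread = 0;               /* which simulated thread is running */

/* Hypothetical access macro: each use of `counter` becomes TLD(counter). */
#define TLD(name) (tld_table[sim_mythread]->name)

/* Allocate and initialize the former static data once per thread at startup.
   Returns 0 on success, -1 if an allocation fails. */
static int tld_init(void) {
    for (int t = 0; t < SIM_THREADS; t++) {
        tld_table[t] = malloc(sizeof *tld_table[t]);
        if (tld_table[t] == NULL) return -1;
        tld_table[t]->counter = 7;         /* the original static initializer */
    }
    return 0;
}

/* Release the per-thread records. */
static void tld_free(void) {
    for (int t = 0; t < SIM_THREADS; t++) {
        free(tld_table[t]);
        tld_table[t] = NULL;
    }
}
```

After `tld_init()`, an update such as `TLD(counter) += 5` made while `sim_mythread` is 1 leaves the copy seen by thread 2 at its initial value, which is the isolation the macro indirection is meant to provide.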

Page 8: The Berkeley UPC Compiler

Future Work

• Add UPC-specific optimizations
  - Possibly as a new phase
  - Likely will use/modify PREOPT and LNO (alias analysis, dependence analysis, prefetching)
  - Want WOPT too -- possible to extend whirl2c to work for M Whirl?
• Coordination Among Releases
  - Our version has been merged with the Rice Open64 project
  - Would like to merge with either Open64 or ORC
  - One common CVS tree, with each team on different branches?

Page 9: The Berkeley UPC Compiler

The End

Page 10: The Berkeley UPC Compiler

UPC Programming Model Features

• SPMD parallelism
  - fixed number of images during execution
  - images operate asynchronously
• Several kinds of array distributions
  - double a[n]: a private array on each processor
  - shared double a[n]: a shared array, with cyclic mapping
  - shared [4] double a[n]: a block cyclic array with 4-element blocks
  - shared [0] double *a = (shared [0] double *) upc_alloc(n * sizeof(double)); a shared array with all elements local
• Pointers for irregular data structures
  - shared double *sp: a pointer to shared data
  - double *lp: a pointer to private data
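
The distributions listed above differ only in block size, so the owner of element `i` can be computed with one formula: `(i / B) % THREADS`, with the default cyclic layout being `B = 1`. The plain-C sketch below works through that arithmetic. The function names and the `home` parameter (the single thread that holds a `shared [0]` array) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Owner thread of element i for `shared [block] T a[n]` on `threads` threads.
   block == 0 models the indefinite layout `shared [0]`, where every element
   lives on one thread, passed in here as the hypothetical `home` argument. */
static size_t owner_thread(size_t i, size_t block, size_t threads, size_t home) {
    assert(threads > 0);
    if (block == 0) return home;
    return (i / block) % threads;
}

/* Index of element i inside its owner's local storage:
   whole rounds of blocks already placed on the owner, plus the phase. */
static size_t local_index(size_t i, size_t block, size_t threads) {
    assert(threads > 0);
    if (block == 0) return i;
    return (i / (block * threads)) * block + (i % block);
}
```

With 4 threads, element 5 of `shared double a[n]` (block size 1) belongs to thread 1, and element 17 of `shared [4] double a[n]` wraps around to thread 0, where it sits at local index 5.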

Page 11: The Berkeley UPC Compiler

Parallel Loops in UPC

• UPC has a “forall” construct for distributing computation

Ex: Vector Addition

shared int v1[N], v2[N], v3[N];
int i;

upc_forall (i=0; i < N; i++; &v3[i]) {
    v3[i] = v2[i] + v1[i];
}

• Two kinds of affinity expressions:
  - Integer (compare with thread id)
  - Shared address (check the affinity of address)
• Affinity tests are performed on every iteration
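
To make the per-iteration affinity test concrete, here is a plain-C simulation of how the vector addition above might look after `upc_forall` is turned into an ordinary while loop. It is a sketch, not the translator's actual output: the thread count, `affinity_int`, `affinity_addr_cyclic`, `forall_on_thread`, and the `runs` counter are hypothetical, the shared arrays are replaced by ordinary arrays, and the threads are run one after another in a single process.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_THREADS 4    /* hypothetical thread count for the simulation */
#define N 10

static int v1[N], v2[N], v3[N];   /* stand-ins for cyclic shared arrays */
static int runs[N];               /* how many threads executed iteration i */

/* Integer affinity: the iteration runs where expr mod THREADS == MYTHREAD
   (non-negative modulus, so negative expressions still map to a thread). */
static int affinity_int(long expr, int mythread) {
    long m = ((expr % SIM_THREADS) + SIM_THREADS) % SIM_THREADS;
    return (int)m == mythread;
}

/* Shared-address affinity for &v3[i] under the default cyclic layout. */
static int affinity_addr_cyclic(size_t i, int mythread) {
    return (int)(i % SIM_THREADS) == mythread;
}

/* What one thread executes for
   upc_forall (i = 0; i < N; i++; &v3[i]) v3[i] = v2[i] + v1[i]; */
static void forall_on_thread(int mythread) {
    size_t i = 0;
    while (i < N) {
        if (affinity_addr_cyclic(i, mythread)) {  /* tested on every iteration */
            v3[i] = v2[i] + v1[i];
            runs[i]++;
        }
        i++;
    }
}

/* Run the loop once per simulated thread. */
static void run_all_threads(void) {
    for (int t = 0; t < SIM_THREADS; t++) {
        forall_on_thread(t);
    }
}
```

Every thread walks all N iterations and executes only those it owns, so each iteration runs exactly once across the threads. The cost of the test itself is paid on every iteration of every thread, which is the overhead the last bullet above points at.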

Affinity Exp