Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
FP7-ICT-2011-7-287733 – ALMA Project Overview 1
FP7-ICT-2011-7-287733
ALMA Project Overview
ALMA Consortium
FP7-ICT-2011-7-287733 – ALMA Project Overview 2
Outline
ALMA EU Project Motivation
Project Overview
Target Architectures
ALMA Toolchain & Development Flow
Application Test Cases
Current Status
Summary
FP7-ICT-2011-7-287733 – ALMA Project Overview 3
ALMA Project ID Card
Three year project: 01/09/2011 – 30/08/2014
Funded by FP7: 3.2 Million Euros
Official web site: http://www.alma-project.eu/
Coordinator: Juergen Becker (KIT)
Technical Coordinator: Nikos Voros (TMES)
FP7-ICT-2011-7-287733 – ALMA Project Overview 5
Why do we need multi-core processors?
Until ~2005 processor performance increase driven by
Clock speed
Execution optimization
Cache
Power wall
ILP wall
Led to multicore processors
Parallelism must be exposed by the programmer
(source http://www.gotw.ca/publications/concurrency-ddj.htm)
FP7-ICT-2011-7-287733 – ALMA Project Overview 6
Processor Breakthroughs ....
A major architecture disruption: multiprocessing and
specialization will have a strong impact on software
Time
Processor
Power
Intel 8086
1980 1990 2000
PowerQUICC II
Multi-processing
Processor
specialization
2015-2020 (?)
Architectural
break point
CISC era
RISC era
Technology limitations: perf. by
parallelism no more by frequency =>
disruption in programming model, long
term research challenges
Domain oriented architectures: eg with
predictable performance to control the
timeliness in RT critical applis, dynamic
reconfiguration for adaptive, distributed
critical architectures (multilevel RT
composability) Source:
G. Edelin
(Thales),
2009
FP7-ICT-2011-7-287733 – ALMA Project Overview 7
Motivation
End user perspective Target architecture perspective
• Explore/Develop algorithms
• Use a simple, comfortable language
• E.g. Matlab, Scilab, …
• Don’t want to care about • data types • parallelism
• End result
• Performance • Energy efficient • Cost efficient • Fast development time
• Multi-Processor System-on-Chip
• Parallel processor cores
• Parallel programming model • E.g. pthreads, MPI, OpenMP
• Parallelism with the processor cores
• Single Instruction Multiple Data • Very Long Instruction Word
• Native data types
• E.g. 32-bit integer • Other data types perform
inefficient
FP7-ICT-2011-7-287733 – ALMA Project Overview 8
ALMA in a Nutshell
Hide the complexity of the underlying hardware to the end user
ALMA will develop an approach for compiling annotated Scilab code to MPSoC architectures
Algorithms and tools for High-level, platform-independent application code performance
estimation and optimization
Identification of possible partitions and their placement on different resources of the underlying architectures
Data type binding and data parallelization to exploit data-level parallelism
Develop an unified SystemC simulation framework to provide an environment for simulating MPSoCs
Two state-of-the-art architectures provided by RECORE and KIT
Net result: smaller application development time/effort and faster time-to-market
FP7-ICT-2011-7-287733 – ALMA Project Overview 9
Objectives
Extend Scilab for optimization on high-level system models
Develop a parallelization and optimization environment
Employ and extend two different architectures
Parallel code generation
Parallel code simulation
FP7-ICT-2011-7-287733 – ALMA Project Overview 10
Challenges for Compiling Scilab to MPSoCs
Scilab (Matlab-like) programming language Dynamic typing (scalars, vectors, matrices)
Pointer-free, i.e. no memory aliasing problems
End users typically use floating-point data types
Natural parallelism within vector operations
MPSoC target architectures Exploit coarse-grain parallelism (task-level)
Distributed memory
Exploit fine-grain parallelism (instruction-level)
FP7-ICT-2011-7-287733 – ALMA Project Overview 11
ALMA Architectures (1/2): Recore X2014
Scalability by virtue of Packet-switched Network-on-Chip
Distributed memories & I/O
Distributed control
Distributed processing cores
Xentium® processing tile Fixed-point DSP processing
10-issue VLIW processor
SIMD capability
Streaming communication services
Reconfigurability Smart memory tile (RAM/FIFO)
Separate applications from each others
Guarantee QoS
Fault tolerant application mapping
FP7-ICT-2011-7-287733 – ALMA Project Overview 12
ALMA Architectures (2/2): KIT Kahrisma
Pro
ce
ss
or
Co
ntr
ol U
nit
Ma
in M
em
ory
Co
nte
xt M
em
ory
Ca
ch
e S
ub
syste
m
Control Flow Tiles
Rename Tiles
EDPE Array
Instruction Cache Tiles
EDPE
EDPE
EDPE
EDPE
DSP Instance I DSP Instance II
2-issue VLIW 2-issue VLIW ISA4-issue VLIW ISA 6-issue VLIW ISAProcessor
Instances
DSP Instances
Instr
uctio
n p
re-
pro
ce
ssin
g
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
EDPE
…...
…...
…...
…...
…...
…...
Dynamic reconfigurable MPSoC
Modules can be reconfigured to processors or DSPs
Dynamic clustered VLIW processor instances
Local scratchpad memory
Non-coherent access to main memory
Communication between cores through a network
FP7-ICT-2011-7-287733 – ALMA Project Overview 13
Outline
Agenda
ALMA EU Project
Motivation
Project Overview
Target Architectures
ALMA Toolchain & Development Flow
Application Test Cases
Current Status
Summary
FP7-ICT-2011-7-287733 – ALMA Project Overview 14
ALMA Development Flow (overview)
Optimized
application code on
multi-core platform
Embedded application design Multi-core hardware design
Translation to
Scilab &
annotations
Abstract
hardware
description
(ADL)
KIT
C-compiler
Multi-core
simulator
Parameters for algorithm
optimization
C-based code with parallel descriptions
ALMA
algorithm
parallelization
tools
Executable binary (for simulator and HW)
Recore
C-compiler
Structural hardware
description
Feedback for optimization
FP7-ICT-2011-7-287733 – ALMA Project Overview 15
Input for the ALMA tools
ALMA dialect of the Scilab language
Subset of Scilab language
Extended by a preprocessing language
Variables declaration
Static types specification
Maximum size of vector and matrix data types definition
Extended by an annotation language for supporting parallelism extraction
Applications
Telecom-munication
Image Processing
Annotated Scilab Code
Architecture Description
ADL
ADL Compiler
GeCoS Framework
Fine-GrainParallelism Ext.
Corase-Grain Parallelism Ext.
Parallel Code Generation
ALMA IR
ALMA IR
Annotated C Code
Target-Spec. Compilation
C Code + Back-Annotation
ALMA Multi-Core Simulator
Binary
ProfileInformation
JSON
Kahrisma Compiler
Recore Compiler
ALMA Architectures
Kahrisma Arch.
Recore Arch.
Ite
rati
ve O
pti
miz
atio
n
ALMA Front-End Tools
Source-LevelProfiler
ProfileInformation
SciLab Front-End(SAFE)
High-Level Optimizer
HLIR
HLIR
Legend
ADL
App. Code
Information
FP7-ICT-2011-7-287733 – ALMA Project Overview 16
ALMA Front-end tools
Scilab Front-End (SAFE)
Parses Scilab source code and produces high level intermediate representation (HLIR) expressed in C
ALMA profiler (aprof)
Early performance estimation at the HLIR level
High-Level Optimizer (HLO) Applies platform independent
optimizations to the HLIR
Applications
Telecom-munication
Image Processing
Annotated Scilab Code
Architecture Description
ADL
ADL Compiler
GeCoS Framework
Fine-GrainParallelism Ext.
Corase-Grain Parallelism Ext.
Parallel Code Generation
ALMA IR
ALMA IR
Annotated C Code
Target-Spec. Compilation
C Code + Back-Annotation
ALMA Multi-Core Simulator
Binary
ProfileInformation
JSON
Kahrisma Compiler
Recore Compiler
ALMA Architectures
Kahrisma Arch.
Recore Arch.
Ite
rati
ve O
pti
miz
atio
n
ALMA Front-End Tools
Source-LevelProfiler
ProfileInformation
SciLab Front-End(SAFE)
High-Level Optimizer
HLIR
HLIR
Legend
ADL
App. Code
Information
FP7-ICT-2011-7-287733 – ALMA Project Overview 17
Parallelization Tools (Fine grain extraction)
Applications
Telecom-munication
Image Processing
Annotated Scilab Code
Architecture Description
ADL
ADL Compiler
GeCoS Framework
Fine-GrainParallelism Ext.
Corase-Grain Parallelism Ext.
Parallel Code Generation
ALMA IR
ALMA IR
Annotated C Code
Target-Spec. Compilation
C Code + Back-Annotation
ALMA Multi-Core Simulator
Binary
ProfileInformation
JSON
Kahrisma Compiler
Recore Compiler
ALMA Architectures
Kahrisma Arch.
Recore Arch.
Ite
rati
ve O
pti
miz
atio
n
ALMA Front-End Tools
Source-LevelProfiler
ProfileInformation
SciLab Front-End(SAFE)
High-Level Optimizer
HLIR
HLIR
Legend
ADL
App. Code
Information
Floating point to fixed point No hardware support for FP in
embedded multi-core systems
Provide a automated floating to fixed point conversion tool.
SIMD/SWP parallelization Loop parallelization and layout
optimization for SIMD ISA.
Explore perf./accuracy trade-off in fixed point encodings
FP7-ICT-2011-7-287733 – ALMA Project Overview 18
Parallelization Tools (Coarse-grain extraction)
Coarse-grain parallelism extraction and optimization
Responsible for the global optimization
Transformation of ALMA IR CFDG to Hierarchical Task Graph (HTG)
Resource availability from ADL
HTG partitioning to cores
Optimal mapping and scheduling of tasks to architecture resources
Exploits profiling information from the simulator for better resource usage estimation
Applications
Telecom-munication
Image Processing
Annotated Scilab Code
Architecture Description
ADL
ADL Compiler
GeCoS Framework
Fine-GrainParallelism Ext.
Corase-Grain Parallelism Ext.
Parallel Code Generation
ALMA IR
ALMA IR
Annotated C Code
Target-Spec. Compilation
C Code + Back-Annotation
ALMA Multi-Core Simulator
Binary
ProfileInformation
JSON
Kahrisma Compiler
Recore Compiler
ALMA Architectures
Kahrisma Arch.
Recore Arch.
Ite
rati
ve O
pti
miz
atio
n
ALMA Front-End Tools
Source-LevelProfiler
ProfileInformation
SciLab Front-End(SAFE)
High-Level Optimizer
HLIR
HLIR
Legend
ADL
App. Code
Information
FP7-ICT-2011-7-287733 – ALMA Project Overview 19
Parallel platform code generation
Parallel platform code generation
Generates target-specific C code
Maps Scilab variables to memory locations
Expresses communication
Expresses SIMD instruction as intrinsics
Uses Recore/Kahrisma C compiler Exploits ILP by VLIW compilation
Generates executable for the hardware and simulator
Applications
Telecom-munication
Image Processing
Annotated Scilab Code
Architecture Description
ADL
ADL Compiler
GeCoS Framework
Fine-GrainParallelism Ext.
Corase-Grain Parallelism Ext.
Parallel Code Generation
ALMA IR
ALMA IR
Annotated C Code
Target-Spec. Compilation
C Code + Back-Annotation
ALMA Multi-Core Simulator
Binary
ProfileInformation
JSON
Kahrisma Compiler
Recore Compiler
ALMA Architectures
Kahrisma Arch.
Recore Arch.
Ite
rati
ve O
pti
miz
atio
n
ALMA Front-End Tools
Source-LevelProfiler
ProfileInformation
SciLab Front-End(SAFE)
High-Level Optimizer
HLIR
HLIR
Legend
ADL
App. Code
Information
FP7-ICT-2011-7-287733 – ALMA Project Overview 20
Multicore architecture simulation
Multicore architecture simulation
Simulation of ALMA target architectures
Retargetable
Structure defined by ADL
Implementation by library of SystemC modules
Mixed-accuracy simulation Behavioural or cycle-accurate
For individual modules (processor core, memory subsystem, network)
Collect profiling and tracing information
Applications
Telecom-munication
Image Processing
Annotated Scilab Code
Architecture Description
ADL
ADL Compiler
GeCoS Framework
Fine-GrainParallelism Ext.
Corase-Grain Parallelism Ext.
Parallel Code Generation
ALMA IR
ALMA IR
Annotated C Code
Target-Spec. Compilation
C Code + Back-Annotation
ALMA Multi-Core Simulator
Binary
ProfileInformation
JSON
Kahrisma Compiler
Recore Compiler
ALMA Architectures
Kahrisma Arch.
Recore Arch.
Ite
rati
ve O
pti
miz
atio
n
ALMA Front-End Tools
Source-LevelProfiler
ProfileInformation
SciLab Front-End(SAFE)
High-Level Optimizer
HLIR
HLIR
Legend
ADL
App. Code
Information
FP7-ICT-2011-7-287733 – ALMA Project Overview 21
ALMA Architecture Description Language (ADL)
ALMA ADL
Architecture Description Language (ADL)
Tailored to the requirements of ALMA
1. Enables target independence of the compilation toolchain
2. Used as architecture description for the simulator
3. Enables design-space exploration
Compact specification of regular MPSoC structures by for and if constructs
Structural specification annotated with behavioural information
Applications
Telecom-munication
Image Processing
Annotated Scilab Code
Architecture Description
ADL
ADL Compiler
GeCoS Framework
Fine-GrainParallelism Ext.
Corase-Grain Parallelism Ext.
Parallel Code Generation
ALMA IR
ALMA IR
Annotated C Code
Target-Spec. Compilation
C Code + Back-Annotation
ALMA Multi-Core Simulator
Binary
ProfileInformation
JSON
Kahrisma Compiler
Recore Compiler
ALMA Architectures
Kahrisma Arch.
Recore Arch.
Ite
rati
ve O
pti
miz
atio
n
ALMA Front-End Tools
Source-LevelProfiler
ProfileInformation
SciLab Front-End(SAFE)
High-Level Optimizer
HLIR
HLIR
Legend
ADL
App. Code
Information
FP7-ICT-2011-7-287733 – ALMA Project Overview 22
Outline
Agenda
ALMA EU Project
Motivation
Project Overview
Target Architectures
ALMA Toolchain & Development Flow
Application Test Cases
Current Status
Summary
FP7-ICT-2011-7-287733 – ALMA Project Overview 23
IEEE 802.16e PHY Layer in NT x NR MIMO Configuration
Typical example of a state-of-the-art wireless communication system
Application requirements impose hard, real-time constraints.
Design time must follow shrinking time-to-market.
Test Case (1/2): Telecommunications
Rx 1
Rx NR
FFT
Equalizer
Channel
Estimator
Derando
mizer
Deinter
leaver
Symbol
Deconstr
uction
- Cyclic
Prefix
Diversity Combiner
- Cyclic
Prefix FFT
SDU
Generati
on
Data
SDUs
Uplink
Frame
Deconstr
uction
MAC-
PHY
I/F
BS Rx
`
ALMA 1st Increment ALMA 2nd Increment
Tx 1
Tx NT
FEC
Encod
er
Interle
aver
Constel.
Mapping
IFFT + Cyclic
Prefix
S-T
Coding
IFFT + Cyclic
Prefix
+ Pre
amble
Data
SDUs
PHY MAC
UL/DL
Frame
Mapper
UL/DL
Schedul
er
BS Tx
PDU
Generation
MAC-
PHY
I/F
Frame
Constr
uction
Downlink MAC/PHY
Control
Symbol
Constr
uction
Rando
mizer
. .
. . . .
. . .
. .
.
. .
. . .
. . .
.
FEC
Decoder
Const.
Demap
FP7-ICT-2011-7-287733 – ALMA Project Overview 24
Feature based algorithm for object recognition and multi-object tracking
Use of Scale Invariant Feature Transform (SIFT)
The final goal is to run such applications in smart cameras
Test Case (2/2): Image Processing
FP7-ICT-2011-7-287733 – ALMA Project Overview 25
Summary
ALMA Goal: Hide the complexity of the underlying hardware to the end user
ALMA will develop an approach for compiling annotated Scilab code to MPSoCs
ALMA toolchain components Front-end tools (Scilab parser, high-level optimizer, profiler)
Fine-grain parallelism extraction
Coarse-grain parallelism extraction
SystemC multi-core simulator
ALMA toolchain is kept platform independent by a novel Architecture Description Language
Two state-of-the-art architectures provided by RECORE and KIT
Evaluated by two test cases from Telecommunications and Image Processing domain
FP7-ICT-2011-7-287733 – ALMA Project Overview 26
Thank you !