Upload
others
View
24
Download
0
Embed Size (px)
Citation preview
Compiling for deeply embedded and heterogeneous signal processing systemsJeronimo CastrillonCfaed Chair for Compiler Construction (CCC)
5G Summit, Dresden, GermanySeptember 29, 2016
Multi-Processor/core Systems on Chip
q HW complexityq Increasing number of coresq Increasing heterogeneity
q Heterogeneity and size
© Prof. J. Castrillon. 5G Summit2
0 2 4 6 8
10 12 14 16
2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
Num
ber o
f PEs
Year
PE Count in SoCs
OMAP Family
OMAP1 OMAP2OMAP3530
OMAP3640OMAP4430
OMAP4470
OMAP5430
Snapdragon Family
S1 QSD8650 S2 MSM8255S3 MSM8060
S4 MSM8960
S4 APQ8064
[Castrill14]
DMAs, sema-
phoresPMU
Peripherals
Communicationsupport
HW queues
Network Processor
Packet DMA
MEMsubsystem
NoC
A15L1
A15L1
L2A15L1
A15L1
VLIW DSP
L1,L2
TI Keystone IIEphiphany IV
Source: http://www.adapteva.com/docs/e64g401_datasheet.pdf
Multi-Processor/core Systems on Chip
q SW productivity gapq How to program these systems? q Meet performance/energy requirements
© Prof. J. Castrillon. 5G Summit3
DMAs, sema-
phoresPMU
Peripherals
Communicationsupport
HW queues
Network Processor
Packet DMA
MEMsubsystem
NoC
A15L1
A15L1
L2A15L1
A15L1
VLIW DSP
L1,L2
TI Keystone IIEphiphany IV
Source: http://www.adapteva.com/docs/e64g401_datasheet.pdf
Fragmented tools, different runtimes/OS on ARM and DSP, 500+ pages on APIs
Homogeneous! OpenMP support uses 1/3 of program memory!
5G
Massive MIMO
High-perf.Codes
Complex Beamform.
5G
Massive MIMO
High-perf.Codes
Complex Beamform.
Multi-Processor/core Systems on Chip
q SW productivity gapq How to program these systems? q Meet performance/energy requirements
© Prof. J. Castrillon. 5G Summit4
DMAs, sema-
phoresPMU
Peripherals
Communicationsupport
HW queues
Network Processor
Packet DMA
MEMsubsystem
NoC
A15L1
A15L1
L2A15L1
A15L1
VLIW DSP
L1,L2
TI Keystone IIEphiphany IV
Source: http://www.adapteva.com/docs/e64g401_datasheet.pdf
Fragmented tools, different runtimes/OS on ARM and DSP, 500+ pages on APIs
Homogeneous! OpenMP support uses 1/3 of program memory!
Need for software automation tools
© Prof. J. Castrillon. 5G Summit5
Programming flow
DMAs, sema-
phoresPMU
Peripherals
Communicationsupport
HW queues
Network Processor
Packet DMA
MEMsubsystem
NoC
A15L1
A15L1
L2A15L1
A15L1
VLIW DSP
L1,L2
Dataflow application
Architecture model
Non-functional specification
Analysis
Synthesis
Code generation
Property models (timing, energy, error, …)
PNargs_ifft_r.ID = 6U;PNargs_ifft_r.PNchannel_freq_coef = filtered_coef_rightPNargs_ifft_r.PNnum_freq_coef = 0U;PNargs_ifft_r.PNchannel_time_coef = sink_rightPNargs_ifft_r.channel = 1;sink_left = IPCllmrf_open(3, 1, 1);sink_right = IPCllmrf_open(7, 1, 1);PNargs_sink.ID = 7U;PNargs_sink.PNchannel_in_left = sink_leftPNargs_sink.PNnum_in_left = 0U;PNargs_sink.PNchannel_in_right = sink_rightPNargs_sink.PNnum_in_right = 0U;taskParams.arg0 = (xdc_UArg)&PNargs_src;taskParams.priority = 1;
ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtr&taskParams, &eb);
glob_proc_cnt++;hasProcess = 1;taskParams.arg0 = (xdc_UArg)&PNargs_fft_ltaskParams.priority = 1;
ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtrft_Templ, &taskParams, &eb);
glob_proc_cnt++;hasProcess = 1;taskParams.arg0 = (xdc_UArg)&PNargs_ifft_rtaskParams.priority = 1;
ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtrfft_Templ, &taskParams, &eb);
glob_proc_cnt++;hasProcess = 1;taskParams.arg0 = (xdc_UArg)&PNargs_sinktaskParams.priority = 1;
[Castrill14]
Dataflow programming models
q Graph representation of applicationsq Implicit repetitive execution of tasksq Good model for streaming applicationsq Good match for signal processing & multi-media applications
q Large body of research on multiple flavors of these models
6 © Prof. J. Castrillon. 5G Summit
Static Dynamic
Properties: No race conditions, determinism, strong/weak guarantees
Language: C for process networks
q FIFO Channels
q Processes & networks
© Prof. J. Castrillon. 5G Summit7
typedef struct { int i; double d; } my_struct_t;__PNchannel my_struct_t S;__PNchannel int A = {1, 2, 3}; /* Initialization */__PNchannel short C[2], D[2], F[2], G[2];
__PNkpn AudioAmp __PNin(short A[2]) __PNout(short B[2]) __PNparam(short boost){
while (1)__PNin(A) __PNout(B) {for (int i = 0; i < 2; i++)B[i] = A[i]*boost;
}}__PNprocess Amp1 = AudioAmp __PNin(C) __PNout(F) __PNparam(3);__PNprocess Amp2 = AudioAmp __PNin(D) __PNout(G) __PNparam(10);
[Sheng14]
Architecture model and constraints
q System model including:q Topology, interconnect, memoriesq Computation: uArch modelq Communication: cost functionsq MCA standardizing it to SHIM 2.0
q Constraintsq Time constraints
q Mapping constraintsq Platform constraints
© Prof. J. Castrillon. 5G Summit8
…… [Oden13]
1 ms
3 ms
1 ms
© Prof. J. Castrillon. 5G Summit9
Analysis and synthesis: Overview
Non-functional specification
CPN application
Architecture model
Analysis: Instrumentation, profiling, tracing
Sequential performance estimation
Mapping and scheduling
Mapping configuration
Time-annotated traces
Parallel perf. estimation
Increase resources
[Castrill13]
10
Example 1) From LabVIEW to Tomahawk Platform
© Prof. J. Castrillon. 5G Summit
Baseband processing application (data dependent execution paths)
hs-serial
hs-serial
hs-serial
hs-serial
hs-serial
parallelRouter(1,0)
FPGA-Interface
Router(1,1)
Router(0,1)
Router(0,0)
Duo-PE0
Duo-PE1
FEC
Duo-PE2
SDDuo-PE6
Duo-PE7
Duo-PE5
Duo-PE3 CM
ADPLL, PM
GT
ADPLL, PM
GT
ADPLL, PM
GT
AVS Contr.
UART-GPIO
ADPLL, PM
GT
ADPLL, PM
GT
ADPLL, PM
GT
ADPLL, PM
GT
ADPLL, PM
GT
ADPLL, PMGT
ADPLL, PMGT
ADPLL, PMGT
ADPLL
ADPLL
DDR-SDRAM-Interface
Tomahawk2_core
APP
Duo-PE4
ADPLL, PMGT
VDSPRISC
VDSPRISC
VDSPRISC
VDSPRISC
VDSPRISC
VDSPRISC
VDSPRISC
ADPLL, PMGT
VDSPRISC
Tomahawk Chip (accelerators for baseband processing)
Automatic code generation
Visit demo booth
[Arn
old1
3]
11
Example 2) LTE application mapping
© Prof. J. Castrillon. 5G Summit
Cor
tess
y: S
ilexi
caSo
ftwar
e So
lutio
ns G
mbH
12
Example 2) LTE application execution
© Prof. J. Castrillon. 5G Summit
Cor
tess
y: S
ilexi
caSo
ftwar
e So
lutio
ns G
mbHSchedule
13
Example 2) LTE application: Results
© Prof. J. Castrillon. 5G Summit
8,62
1,71 2,082,79
0
2
4
6
8
10
PeakPower(W)
2,62
1,55 1,641,99
0
0,5
1
1,5
2
2,5
3
AveragePower(W)
327,8
1930
966,5
348,2
0
500
1000
1500
2000
2500
ExecutionTime(ms)
1,164
0,3340,631
1,443
0
0,5
1
1,5
2
PowerEfficiency(1/(W*second))
Reference (LoadBalancer)
Constraint=2s
Constraint=1s
Constraint=0.35s
Cor
tess
y: S
ilexi
caSo
ftwar
e So
lutio
ns G
mbH
Acknowledgements
q Vodafone Chair for Mobile Communications Systems
q Silexica Software Solutions GmbH
q National Instruments
q German Cluster of Excellence: Center for Advancing Electronics Dresden (www.cfaed.tu-dresden.de)
© Prof. J. Castrillon. 5G Summit14
Thanks!Questions?
References
[Castrill14] J. Castrillon and R. Leupers, Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap. Springer, 2014
[Sheng14] W. Sheng, S. Schürmans, M. Odendahl, M. Bertsch, V. Volevach, R. Leupers, and G. Ascheid, “A compiler infrastructure for embedded heterogeneous MPSoCs”, Parallel Comput. 40, 2 (February 2014), 51-68
[Oden13] M. Odendahl, et al., “Split-cost communication model for improved MPSoC application mapping”, In International Symposium on System on Chip pp. 1-8, 2013
[Castrill13] J. Castrillon, R. Leupers, and G. Ascheid, “MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs,” IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 527–545, 2013
[Arnold13] O. Arnold, et al. “Tomahawk - Parallelism and Heterogeneity in Communications Signal Processing MPSoCs”. TECS, 2013
© Prof. J. Castrillon. 5G Summit16