EE382V:
Embedded System Design and Modeling
Andreas Gerstlauer
Electrical and Computer Engineering
University of Texas at Austin
Lecture 8 – SCE Design Examples & Case Studies
EE382V: Embedded Sys Dsgn and Modeling, Lecture 8 © 2008 A. Gerstlauer 2
Lecture 8: Outline
• System-on-Chip Environment (SCE)
• Abstract programming model
• Transaction-level system models
• Hardware/software implementation
• Design examples
• Baseband platform
• MP3 decoder
• Cellphone
• Commercialization
• Specify-Explore-Refine (SER) tools
Electronic System-Level (ESL) Design
[Figure: ESL design flow. An Application and an Architecture enter the System Synthesis Front-End (Computation & Communication), which produces System Models (TLM 1 to n). The Software / Hardware Synthesis Back-End produces SW binary code and HW HDL. Further labels: Instruction-Set Simulators, SLDL, RTL. The example platform shows CPU, Mem, HW, IP, Bridge and Arbiter with behaviors B1 to B4, channels C1 and C2, and variable v1.]
• Tools and standards named in the figure: Matlab/Simulink, Ptolemy, …; MARTE (UML); SPIRIT/IP-XACT (XML); Tensilica; SCE, Metropolis, …; SystemC TLM 2.0, CoWare; Mentor Catapult, Forte, …; VaST, ARM Realview, …; Green Hills, gcc, VxWorks, …; VHDL/Verilog, Synopsys, …
System-On-Chip Environment (SCE)
[Figure: SCE flow. A Spec and architecture descriptions (Arch 1 to n) are compiled onto the MPSoC platform to give TLM 1 to n; synthesizing the target HW/SW then gives Impl 1 to n.]
• Compile onto MPSoC platform
• Synthesize target HW/SW
Abstract Programming Model
• Hierarchical process graph
• Sequential processes
– ANSI C code
• Parallel-serial composition
– Dependencies, fork-join
• Abstract inter-process communication
• Communication channels
– Message-passing, queues, etc.
• Shared variables
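The programming model above can be sketched in a few lines. This is only an illustration, with Python threads standing in for the C-based processes of the slides; `producer`, `consumer`, `par`, `top` and the `END` marker are hypothetical names, not part of SCE.

```python
import queue
import threading

END = None  # hypothetical end-of-stream marker sent over the channel


def producer(chan, n):
    """Sequential process: send n squares, then the end marker."""
    for i in range(n):
        chan.put(i * i)
    chan.put(END)


def consumer(chan, shared):
    """Sequential process: receive until the end marker, publish the sum."""
    total = 0
    while True:
        item = chan.get()
        if item is END:
            break
        total += item
    shared["sum"] = total  # shared variable, read by the parent after the join


def par(*children):
    """Fork-join composition: start every child, then wait for all of them."""
    threads = [threading.Thread(target=fn, args=args) for fn, args in children]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def top(n=5):
    """Parent process in the hierarchy: owns the channel and shared variable."""
    chan = queue.Queue(maxsize=2)  # bounded message-passing channel
    shared = {}
    par((producer, (chan, n)), (consumer, (chan, shared)))
    return shared["sum"]
```

The bounded queue makes the producer block when the consumer falls behind, which is the behavior a message-passing channel with finite buffering would show.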
System Transaction-Level Model (TLM)
➢ Generate MPSoC TLM simulation model
➢ Fast and accurate for exploration and verification
• Compile onto multi-core/multi-processor platform
• Implement computation on processors/cores and busses
• Generate code for communication over bus network
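As a rough sketch of the mapping step, the snippet below takes a process-to-processor mapping and decides which channels stay local and which must be implemented over the bus network. The mapping and the channel endpoints are hypothetical; only the names (B1 to B5, C1 to C4, CPU, DSP, HW) echo the labels in the slide figure.

```python
def classify_channels(mapping, channels):
    """Split channels into PE-local ones and ones that cross the bus network.

    mapping:  {process: processing element}
    channels: {channel: (sender process, receiver process)}
    """
    local, remote = [], []
    for name, (src, dst) in sorted(channels.items()):
        if mapping[src] == mapping[dst]:
            local.append(name)  # both ends on one PE: no bus traffic needed
        else:
            remote.append((name, mapping[src], mapping[dst]))
    return local, remote


# Hypothetical mapping and connectivity, for illustration only.
mapping = {"B1": "CPU", "B2": "CPU", "B3": "HW", "B4": "DSP", "B5": "DSP"}
channels = {
    "C1": ("B1", "B2"),
    "C2": ("B2", "B3"),
    "C3": ("B3", "B4"),
    "C4": ("B4", "B5"),
}
local, remote = classify_channels(mapping, channels)
```

Only the channels in `remote` would need generated driver and bus communication code; the ones in `local` can stay within the processor's OS or task model.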
[Figure: MPSoC TLM. A multi-core CPU (Core 1 to n, OS + Drv) and a DSP (OS + Drv) run behaviors B1 to B5 with channels C1 to C4 and variables v1, v2. Mem, HW, IP and a Bridge connect over the CPU Bus and the DSP Bus.]
System Implementation
• Synthesize hardware and software for each processor
• High-level/behavioral RTL and interface synthesis
– Allocation, scheduling, binding
• Software synthesis and RTOS targeting
– Code generation, firmware synthesis, cross-compile & link
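To make the three high-level synthesis terms concrete, here is a minimal resource-constrained list scheduler. It is a generic textbook-style sketch, not the algorithm used by SCE or any named tool; the operation names and unit counts are hypothetical, and every operation is assumed to take one control step.

```python
def list_schedule(ops, deps, units):
    """Schedule and bind single-cycle operations.

    ops:   {op name: unit kind}
    deps:  {op name: [predecessor op names]} (assumed acyclic)
    units: {unit kind: number of instances}  (the allocation)
    Returns {op name: (control step, unit index)}.
    """
    done = {}
    remaining = dict(ops)
    step = 0
    while remaining:
        free = dict(units)
        chosen = []
        for name in sorted(remaining):
            kind = remaining[name]
            # Ready only if all predecessors finished in an earlier step.
            ready = all(p in done for p in deps.get(name, []))
            if ready and free.get(kind, 0) > 0:
                free[kind] -= 1
                chosen.append((name, units[kind] - free[kind] - 1))  # binding
        if not chosen:
            raise ValueError("no schedulable operation: check deps and units")
        for name, unit in chosen:
            done[name] = (step, unit)  # scheduling
            del remaining[name]
        step += 1
    return done


# Hypothetical dataflow: a2 = (m1 + m2) followed by one more add.
ops = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
deps = {"a1": ["m1", "m2"], "a2": ["a1"]}
one_mul = list_schedule(ops, deps, {"mul": 1, "add": 1})
two_mul = list_schedule(ops, deps, {"mul": 2, "add": 1})
```

Allocating a second multiplier shortens the schedule from four control steps to three, which is the kind of area/latency trade-off that allocation decides.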
[Figure: implemented system with CPU, DSP, HW, IP, Bridge and Arbiter.]
Lecture 8: Outline
✓ System-on-Chip Environment (SCE)
✓ Abstract programming model
✓ Transaction-level system models
✓ Hardware/software implementation
• Design examples
• Baseband platform
• MP3 decoder
• Cellphone
• Commercialization
• Specify-Explore-Refine (SER) tools
Design Example: Platform Architecture
[Figure: platform architecture. CF (ColdFire2), M1 (SRAM) with M1Ctrl, DMA (2753A), TX and an IP Bridge to the DCT (IP) over DCTBus sit on BUS1 (MBus) with Arbiter1. DSP (DSP56k), CoPro (Custom) and I/O1 to I/O4 (HW) sit on BUS2 (DSP). Ports are marked as master (M) or slave (S).]
• Components
• Processors
• Memories
• IPs
• Custom HW
• Connectivity
• Buses
• Bridges
• Ports
Design Example: Application, Services
[Figure: application mapped onto the platform (CF, DSP, HW, DCT, DMA, M1, M1Ctrl, TX, IP Bridge, Arbiter1, I/O1 to I/O4 on BUS1 (MBus), BUS2 (DSP) and DCTBus). Process labels: Jpeg, Enc, Dec, Codebk, DCT, SI, BO, BI, SO. Variables: v1, v2, v3. Further labels: RTOS, 15, 10. Legend: Application, Test code.]
• Computation
• Processes (hierarchical C)
• Scheduling (static/OS)
Design Example: Application, Services
[Figure: the same platform and processes (Jpeg, Enc, Dec, Codebk, DCT, SI, BO, BI, SO; variables v1, v2, v3) with channels C0 to C8 added. Address and interrupt labels: 0x40, 0x48, int0, 0x14, 0x80; 0x10, int1; 0xA000, intD; 0x0C00, intC; 0x0C50, intC; 0x0800, intA; 0x0900, intB; 0x0950; 0x0850.]
• Computation
• Processes (hierarchical C)
• Scheduling (static/OS)
• Communication
• High-level IPC (channels)
• Storage (variables)
Baseband Example: Specification Model
[Figure: specification model hierarchy.
JPEG: JPEGInit, JPEGStart, JPEGHeader, DefaultHuffman, JPEGEncode (ReceiveData, HandleData, EncodeStripe with DCT, Quantization and Huffman, looping over mduHigh = 1..Height and mduWide = 1..Width), JPEGEnd; data stripe[]; further labels Control, Pixel, Storage, HfmDc, DCEHuff, ACEHuff.
Vocoder: Coder (Pre_Process, Lp_Analysis, Open_Loop, Closed_Loop, Codebook, Update, Post_Process, over Subframes) and Decoder (Pre_Decoder, Decode_Lsp, Decoder_Subframe, Post_Decoder, Post_Filter).
I/O: Voice (SpchIn, SpchOut), Radio (SerIn, SerOut), Ctrl.
Data: inframe[160], outparm[244], inparm[244], outframe[160], txdtx_ctrl.]
Baseband Example: Architecture Model
[Figure: architecture model. CF runs JPEG with RcvData under CF_OS (OSModel); DSP runs the Vocoder (Coder, Decoder) under DSP_OS; HW holds the Co-process; DCT_IP is reached through a DCTAdapter; Mem, DMA, Ctrl and the I/O blocks SI, BI, BO, SO (SpchIn, SpchOut, SerIn, SerOut) complete the system. Data: int stripe[], imgParm, inframe, outframe, inparm, outparm, indata, outdata.]
➢ Partitioning, synchronization, message-passing, RPC
➢ Scheduling, task refinement, OS model insertion
Baseband Example: Network Model
➢ Data conversion, channel merging
➢ CE insertion, packeting, routing
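Data conversion and packeting can be pictured with a small sketch: typed values are marshalled into an untyped byte stream, cut into packets of bounded size, and reassembled at the receiver. The big-endian 32-bit layout, the packet size and the function names are assumptions made for this illustration, not details taken from SCE.

```python
import struct


def to_bytes(values):
    """Data conversion: typed ints -> untyped big-endian byte stream."""
    return struct.pack(">%di" % len(values), *values)


def packetize(payload, max_len):
    """Packeting: cut the byte stream into packets of at most max_len bytes."""
    return [payload[i:i + max_len] for i in range(0, len(payload), max_len)]


def from_packets(packets):
    """Receiver side: reassemble the packets and convert back to ints."""
    data = b"".join(packets)
    return list(struct.unpack(">%di" % (len(data) // 4), data))


stripe = [1, -2, 300]                 # hypothetical application data
packets = packetize(to_bytes(stripe), 5)
restored = from_packets(packets)
```

Packets need not align with value boundaries, which is why the receiver reassembles the full stream before converting back.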
[Figure: network model. ColdFire (CF_OS), DSP (DSP_OS, OSModel), DMA (DMA_HW), DCT (DCT_IP) behind the DCTAdapter, HW, SI (SI_HW), BI (BI_HW), SO (SO_HW), BO (BO_HW), Tx and Mem (char[]) are connected by the links linkTx1, linkTx2, linkDMA, linkBI, linkHW, linkSI, linkBO and linkSO, with master (M) and slave (S) sides marked.]
Baseband Example: TLM
➢ Synchronization, addressing, media access
➢ Arbitration, data slicing, interrupt handling
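Two of these steps, arbitration and data slicing, can be sketched as a tiny transaction-level bus model. The fixed-priority policy, the message-granularity grants and the cycle counts are simplifying assumptions for this example; the slides do not specify them.

```python
def slice_words(data, width):
    """Data slicing: split a message into bus words, zero-padding the last."""
    padded = data + bytes(-len(data) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def arbitrate(requests, priority):
    """Fixed-priority arbitration: the smallest priority value wins."""
    return min(requests, key=lambda master: priority[master])


def simulate(pending, priority, width, cycles_per_word):
    """Serve one message per master; return (master, words, finish cycle)."""
    pending = dict(pending)
    time, log = 0, []
    while pending:
        master = arbitrate(pending, priority)
        words = slice_words(pending.pop(master), width)
        time += len(words) * cycles_per_word  # timing per word transfer
        log.append((master, len(words), time))
    return log


# Hypothetical traffic: two masters request a 4-byte-wide bus at once.
log = simulate({"DMA": b"x" * 10, "CF": b"y" * 4},
               priority={"CF": 0, "DMA": 1}, width=4, cycles_per_word=2)
```

A model at this level accounts for bus contention and transfer time without simulating individual pins.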
[Figure: TLM. ColdFire (CF_OS, CF_HAL) and DSP (DSP_OS, DSP_HAL, OSModel) communicate with DMA (DMA_HW), Mem (Mem_HW), DCT (DCT_IP), HW (HW_HW), SI (SI_HW), BI (BI_HW), SO (SO_HW), BO (BO_HW) and Tx (T_HW) through the bus protocol channels MBProtocol and dspProtocol.]
Baseband Example: Pin-Accurate Model
[Figure: pin-accurate model. The processors (CF_OS, CF_HAL, CF_HW with CF_BF, ISR and PIC; DSP_OS, DSP_HAL, DSP_HW with DSP_BF, ISR and PIC, OSModel) and the components DMA (DMA_BF), Mem (Mem_BF), DCT (DCT_IP), HW (HW_BF), SI (SI_BF), BI (BI_BF), SO (SO_BF), BO (BO_BF), Bridge (T_BF) and Arbiter are connected at pin level. Components are labeled with ADDR, and some with POLL_ADDR. The figure also carries an ARM7 label.]
➢ Implementation synthesis in backend tools
➢ Interface and high-level synthesis on the hardware side
➢ Firmware, RTOS and C synthesis on the software side
MP3 Decoder Example
[Figure: from a single Specification (Mad_Decoder with Mad_Stimulus, Mad_Monitor and Mad_Error, connected through stream_in and pcm_out), SCE generates the architecture variants. Each variant runs Mad_Decoder on a MicroBlaze (MB, master0) with a round-robin OS, attached to the MainBus (OPB) with an Arbiter and the MP3_IN and PCM_OUT slaves (channels Arm/In, Arm/Out).]
• SW: decoder entirely in software on the MicroBlaze
• HWSW2: adds the LIMDCT and RIMDCT IPs (Imdct) on an IPBus behind a TX bridge (channels Arm/LImdct, Arm/RImdct)
• HWSW: additionally adds the LDCT and RDCT IPs (Dct32, with Synth_frame) on IPBus1 and IPBus2 behind TX1 and TX2 (channels Out/LDct, Out/RDct)
MP3 Decoder Exploration Results
• Single specification of application algorithm
• Fixed-point MP3 decoding algorithm [libmad]
• Explored several architectural alternatives
• Pure SW solution
• Custom HW or IPs for DCT/IMDCT acceleration
• Parallelizing and pipelining of SW and HW/IP
➢ Automatically generate models and simulate each
➢ Explore and validate all alternatives in < 1 hour
➢ Ready to synthesize to FPGA prototyping board
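The value of the frame-delay constraint is not recoverable from the extracted slide text. As a hedged sketch of where a real-time bound of this kind can come from: an MPEG-1 Layer III frame carries 1152 PCM samples per channel, so at a 44.1 kHz sample rate a decoder has about 26.1 ms per frame. The sample rate and the delays passed in below are hypothetical and are not read from the chart.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III, samples per channel per frame


def frame_deadline_ms(sample_rate_hz):
    """Playback time of one frame, i.e. the real-time decode budget."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def meets_constraint(delays_ms, sample_rate_hz):
    """Mark each architecture whose frame delay fits the real-time budget."""
    deadline = frame_deadline_ms(sample_rate_hz)
    return {name: delay <= deadline for name, delay in delays_ms.items()}


# Hypothetical frame delays in milliseconds, for illustration only.
verdict = meets_constraint({"SW": 40.0, "HWSW": 20.0}, 44100)
```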
[Figure: bar chart of frame delay [ms] (axis from 0 to 45) for the architectures labeled SW, HWSW1, HWSW2, HWSW3, HWSW4 and HWSW, with series IP and HW and a Constraint line.]
Cellphone Example
• 2 Subsystems
• ARM7TDMI
• MP3 Decoding
• JPEG Encoding
• Motorola DSP 56600k
• GSM Transcoding
• 4 Accelerator HW blocks
• 10 I/O HW blocks
• 5 Busses
• AMBA AHB
• DSP Port A bus
• 3 Custom busses
[Figure: cellphone platform. The ARM7TDMI (MP3, JPEG, Control; IRQ/FIQ from a PIC with INT0 ... INT31, Timer) sits on the AMBA AHB with custom HW blocks MP3 IN, BMP IN, JPEG Output, Transducer, Keyboard, Display, Left SYNTH, Right SYNTH, MAD Output and PCM OUT. The DSP (Encoder, Decoder; interrupts INTA to INTD) sits on DSP Port A with custom HW blocks Codebook search, Enc. Input, Enc. Output, Dec. Input and Dec. Output. Further labels: DHS.]
Cellphone Example Results
• Experimental Results
• 1.5 second MP3
• 640x480 picture
• 1.5 second GSM speech
[Figure: average error [%] (linear axis, 0 to 35) and simulation time [s] (logarithmic axis, 1 to 100000) for the models labeled Appl., Task, FW, TLM, BFM and ISS.]
• Speedup (vs. multi-ISS)
• 150x (HW/SW system)
• 600x (pure SW)
• Excellent accuracy
• <3% error
➢ Heterogeneous multi-processor platform TLM
[Figure axis labels: Spec, Arch, Net, TLM, PAM, ISS]
SCE Exploration Results
• Suite of industrial-size examples
Model size is given in lines of code (LOC) for the Spec, Arch, Net and PAM models.

Example   | System architecture (masters → slaves)                      | Spec  | Arch  | Net   | PAM   | Generation time
JPEG      | A1: CF→HW                                                   | 1806  | 2732  | 2780  | 4642  | 1.01 s
Vocoder   | A1: DSP→HW                                                  | 7385  | 9594  | 9775  | 10679 | 4.77 s
Vocoder   | A2: DSP→HW1, HW2                                            | 7385  | 9632  | 9913  | 10989 | 5.21 s
Vocoder   | A3: DSP→HW1, HW2, HW3                                       | 7385  | 9659  | 9949  | 11041 | 5.76 s
Mp3float  | A1: CF→HW1                                                  | 6900  | 28190 | 28204 | 29807 | 5.86 s
Mp3float  | A2: CF→HW1, HW2, HW3; HW1↔HW3; HW2↔HW3                      | 6900  | 28275 | 28633 | 31172 | 6.18 s
Mp3float  | A3: CF→HW1, HW2, HW3, HW4; HW1↔HW3↔HW5; HW2↔HW4↔HW5         | 6900  | 28736 | 30202 | 32795 | 17.69 s
Mp3fix    | A1: ARM→2 I/O                                               | 13363 | 17131 | 17270 | 21593 | 3.85 s
Mp3fix    | A2: ARM→2 I/O, LDCT, RDCT                                   | 13363 | 18300 | 18564 | 23228 | 7.01 s
Mp3fix    | A3: ARM→2 I/O, LDCT, RDCT; LDCT↔I/O; RDCT↔I/O               | 13363 | 18748 | 19079 | 24471 | 7.40 s
Baseband  | A1: DSP→HW, 4 I/O, T; CF, DMA→Mem, BR, T, DMA; BR→DCT_IP    | 11481 | 17020 | 17685 | 21711 | 8.99 s
Cellphone | A1: ARM→4 I/O, 2 DCT, T; LDCT, RDCT→I/O; DSP→HW, 4 I/O, T   | 16441 | 21936 | 22570 | 30072 | 9.49 s
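One way to read the model sizes listed above is as a code expansion factor, the PAM size divided by the Spec size. The short sketch below computes it from the Spec and PAM figures of the slide; the metric itself and the helper names are our own choice for illustration, not something the slide reports.

```python
# (example, architecture, Spec LOC, PAM LOC, generation time in seconds)
ROWS = [
    ("JPEG", "A1", 1806, 4642, 1.01),
    ("Vocoder", "A1", 7385, 10679, 4.77),
    ("Vocoder", "A2", 7385, 10989, 5.21),
    ("Vocoder", "A3", 7385, 11041, 5.76),
    ("Mp3float", "A1", 6900, 29807, 5.86),
    ("Mp3float", "A2", 6900, 31172, 6.18),
    ("Mp3float", "A3", 6900, 32795, 17.69),
    ("Mp3fix", "A1", 13363, 21593, 3.85),
    ("Mp3fix", "A2", 13363, 23228, 7.01),
    ("Mp3fix", "A3", 13363, 24471, 7.40),
    ("Baseband", "A1", 11481, 21711, 8.99),
    ("Cellphone", "A1", 16441, 30072, 9.49),
]


def expansion(spec_loc, pam_loc):
    """How many PAM lines exist per line of specification."""
    return pam_loc / spec_loc


factors = {(ex, arch): expansion(spec, pam) for ex, arch, spec, pam, _ in ROWS}
largest = max(factors, key=factors.get)
slowest = max(ROWS, key=lambda row: row[4])
```

Every generation time in the list is under 20 seconds, so regenerating all models for a new architecture is cheap compared with rewriting them by hand.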
Lecture 8: Outline
✓ System-on-Chip Environment (SCE)
✓ Abstract programming model
✓ Transaction-level system models
✓ Hardware/software implementation
✓ Design examples
✓ Baseband platform
✓ MP3 decoder
✓ Cellphone
• Commercialization
• Specify-Explore-Refine (SER) tools
Source: InterDesign Technologies, Inc. / Japanese Aerospace Exploration Agency (JAXA)
ELEGANT Environment
ELEGANT: Electronic Design Guidance Tool for Space Use
[Figure: ELEGANT flow. A specification model (specification model simulation, formal verification, assertion/property checking) goes through exploration and refinement, using a PE / CE / Bus database, to a TLM (TLM co-simulation). Software synthesis produces software source code and high-level synthesis produces hardware RTL, checked by cycle-accurate co-simulation.]
• Tools named in the figure: SER (UCI), CyberWorkBench (NEC), Venus (Fujitsu), VisualSpec (InterDesign Technologies)
ELEGANT SpaceWire Evaluation
• SpaceWire: aerospace communication protocol standard
• High-speed and high-reliability interconnection network
– Asynchronous, fault-tolerant, topology-agnostic
• Automated SpaceWire design with ELEGANT tool set
• From top-level specification model down to HW/SW
– HW/SW partitioning and exploration of the architecture with SER
– Synthesis down to SpaceCube prototyping platform
[Figure: ELEGANT takes the SpW specification model through HW/SW partitioning and synthesis to SpW implemented in HW and SW, or SpW implemented in HW only, on the SpaceCube microcomputer.]
• Share/reuse SpW design with high-level descriptions
• Evaluate HW/SW tradeoffs before implementation
Source: Japanese Aerospace Exploration Agency (JAXA)
Lecture 8: Summary
• System-on-Chip Environment (SCE)
• From specification to implementation
• Design examples
• Industrial-size examples
• SER commercialization
• Derivative of SCE
• Integrated with tools for simulation, HW synthesis
• Deployed at, e.g., NEC Toshiba Space Systems