Upload
vukhanh
View
215
Download
1
Embed Size (px)
Citation preview
1
September 4, 2002. Course 290A
MPEG decoder Case
K.A. VissersUC Berkeley
Chamleon Systems Inc.and
Pieter van der WolfPhilips Research
Eindhoven, The Netherlands
2
September 4, 2002. Course 290A
Outline
• Introduction• Consumer Electronics• Kahn Process Networks Revisited• MPEG application• Mapping• Design space exploration• Conclusions
3
September 4, 2002. Course 290A
Consumer Electronics
4
September 4, 2002. Course 290A
Consumer Electronics
• Cost (< $100 for electronics of TV)• Low power consumption (no fan / mobile)• Large series (> 0.5 million pieces)• Robustness (no hazards, no reboots)• Real-time constraints (no loss of data, guaranteed
response)• Both control and signal processing• Increasing requirements for computation &
communication (more functions, higher resolutions)• Wide range of information appliances• Large portfolio of semiconductor products
5
September 4, 2002. Course 290A
Consumer Electronics
Trend towards multi-functional multi-standard products– web browsing– e-mail– high-speed modem (xDSL)– teletext– EPG– graphics– MPEG-2– picture rate conversion (50-100 Hz)– video scaling– picture enhancement– etc.
6
September 4, 2002. Course 290A
Video Processing
Requirements for computation & communicationStandard resolution TV:• PAL: 720 x 576 @ 25 frames/sec• NTSC: 720 x 480 @ 30 frames/sec10.4 million samples/secluminance (y) and chrominance (u and v, subsampled)typically 8-10 bits for luminance, 8 bits for chrominance
Say in TV system 100 - 1000 operations / pixel:• 1 - 10 billion operations / sec• 1 - 10 Gbyte / sec internal bandwidthHigh definition TV (1920 X 1080 @ 30): factor 6 more
7
September 4, 2002. Course 290A
TM-xxxxD$
I$
TriMedia CPU
DEVICE I/P BLOCK
DEVICE I/P BLOCK
DEVICE I/P BLOCK
.
.
.
DVP System Silicon
VLIW Media Processor:• 100 to 300+ MHz• 32-bit or 64-bit
NexperiaSystem Busses• PI bus• Memory bus• 32-128 bit
PI B
US
SDRAM
MMI
DV
P M
EM
OR
Y B
US
DEVICE I/P BLOCK
PRxxxxD$
I$
MIPS CPU
DEVICE I/P BLOCK.
.
.DEVICE I/P BLOCK
PI B
US
General Purpose RISC Processor• 50 to 300+ MHz• 32-bit or 64-bitLibrary of Device Blocks• Image
coprocessors• DSPs• UART• 1394• USB
•…and more
TriMediaTMMIPSTM
Philips NexperiaTM
Flexible architecture for digital video applications
8
September 4, 2002. Course 290A
NexperiaTM
ScalabilityVLIW
SDRAM
MMI
TriMedia CPU + Device blocks whencontrol functions are
minimal
MIPS CPU +Trimedia CPU replacing some Device blocks
RISC
SDRAM
VLIWMMI
• Single architecture, multiple product configurations– Processor core options - TM32, TM64, MIPS32, MIPS64 ...– Device block options
• Highly programmable to weakly programmable
MIPS CPU +Device blocks +
Software
RISC
SDRAM
MMI
9
September 4, 2002. Course 290A
Design Problem
VLIWcpu
I$
video-in
video-out
audio-in
SDRAM
audio-out
PCI bridge
Serial I/O
timers MPEG
D$
10
September 4, 2002. Course 290A
MPEG2 decoding
Audio and Video,• Audio completely done in software on VLIW• Video:
– describe in Kahn Process Networks– Test for compliance with the standards:
• ranging from H263 to HDTV!
– Map on software + dedicated hardware
11
September 4, 2002. Course 290A
MPEG2 decoding status
IP content in Philips Semiconductors:NX8500 family: single chip HDTV decoder
(audio + Video):– dedicated MPEG2 video decoder– specialized Video Output– many other features
12
September 4, 2002. Course 290A
Methodology
In Research:• Describe the full functionality of the MPEG2
video decoding.• Mapping and performance analysis
Compare with the actual Semiconductors results in ICs
13
September 4, 2002. Course 290A
Steps
• Download the Berkeley MPEG decoder• Remove errors• Remove global variables• Split in separate processes
– natural partitioning– insert read, write and execute statements– profile and run
14
September 4, 2002. Course 290A
Application Modeling: Yapi
Y-api: An C++ api for Kahn Process Networks:Expose parallelism and communication
ProcessFIFO Read
Write
A C
B Execute
15
September 4, 2002. Course 290A
Workload analysis
Computation and Communication workload
FIFO
A C
B
Operation
A-up 56952
Count
A-down 1834
Operation
B-filter 58786
Count
Operation
C-join 34815
Count
C-pass 9834
C-skip 7274
58786 58786
34815
16
September 4, 2002. Course 290A
MPEG decoding
• Take a stream of encoded bits• Separate the headers and parse for parameters• Variable length decoding• On blocks of 16x 16 pixels do:
– Inverse scan quantization– Inverse Discrete Cosine Transform– Add to the motion compensated block
• Reorder images IPBB -> IBBP
17
September 4, 2002. Course 290A
Motion-compensation
n
n-1
n-1/2
picture number
-D/2
D/2
18
September 4, 2002. Course 290A
Mpeg2 video decoder
Model of MPEG2 video decoder in Yapi
19
September 4, 2002. Course 290A
Design Problem
VLIWcpu
I$
video-in
video-out
audio-in
SDRAM
audio-out
PCI bridge
Serial I/O
timers MPEG
D$
20
September 4, 2002. Course 290A
Mpeg2 mapping
Dedicated Mpeg coprocessor: pipe with fifo buffers and blocking semantics
Software on the VLIW
Video In
Video Out
Memory
Memory
Dedicated Mpeg coprocessor
Memory
21
September 4, 2002. Course 290A
Mpeg2 mapping for SD!
Dedicated Mpeg coprocessor: pipe with fifo buffers and blocking semantics
Software on the VLIW
Video In
Video Out
Memory
Memory
Software on the VLIW
Memory
22
September 4, 2002. Course 290A
Mapping and performance analysis
Process
FIFO
Read
Write
A C
BExecuteApplication
model (YAPI)
R(1)
W(3)E(C-join)
R(2)W(2)E(B-filter)
R(1)
X YAbstract
MemAbstract bus model
Architecturemodel processor
model
23
September 4, 2002. Course 290A
MPEG2 mapping
Data dependant: different video streams:• Computation on the VLIW and coprocessors• Communication on the bus and internally in
the Mpeg2 decoderExploration Questions:Bus load, burst size of the communication,latency of the memory interface
24
September 4, 2002. Course 290A
Exploration results: bus waits
0
2000
4000
6000
8000
10000
12000
14000
16000
18000
20000
0 10 20 30 40 50 60 70cycle
SU stalls for bus
"int_SU_o2.dat"
Storage Unit
25
September 4, 2002. Course 290A
0
500
1000
1500
2000
2500
0 10 20 30 40 50 60 70cycle
VInput stalls for bus
"int_VIn_o1.dat"
Exploration results: bus waits
Video In
26
September 4, 2002. Course 290A
Design Space Exploration
27
September 4, 2002. Course 290A
Lessons learned
• Models of the concurrency matter• Quantifying design choices is key• Match of Model of Computation and
Model of Architecture• Tools required: Mapping, Simulation and
Design Space Exploration support• Raise the level of abstraction for
exploration
28
September 4, 2002. Course 290A
References
• Pieter van der Wolf, Paul Lieverse, Mudit Goel, David La Hei, and Kees Vissers, "An MPEG-2 Decoder Case Study as a Driver for a System Level Design Methodology" In: Proc. 7th Int. Workshop on Hardware/Software Codesign (CODES'99), Rome, Italy, May 3-5 1999.
• E.A. de Kock, G. Essink, W.J.M. Smits, P. van der Wolf, J.-Y Brunel, W.M. Kruijtzer, P. Lieverse, K.A. Vissers. “YAPI: Application Modeling for Signal Processing Systems” In: Proc. 37th DAC, Los Angeles, June 2000.
• Http://ptolemy.eecs.berkeley.edu/~vissers