
ASIC Multimedia Chips and a Short Review of the Low Power Multimedia Section in ISSCC 2006

Mentor: Dr. Fakhraii

By: Masoud Rostami



Agenda

1. PRAMs

2. Multimedia ASIC Chips

3. Multimedia Processors in ISSCC06

4. Summary

5. References

Phase-Change Memory

PRAM (Phase-Change Random Access Memory) is attracting great interest as a candidate for the next generation of non-volatile memory devices.

The cell material used in PRAM is a chalcogenide alloy (Ge2Sb2Te5, or GST), which takes either a low-resistivity polycrystalline phase (SET state, '0') or a high-resistivity amorphous phase (RESET state, '1').

Conversion between the two phases is realized by resistive heating.

To write a GST cell to the RESET state, the GST compound is heated above its melting point and quenched rapidly. To write a GST cell to the SET state, the GST is heated to a temperature between the crystallization and melting points for a period long enough to crystallize it.
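The RESET/SET write rules above can be captured in a toy state model. This is a minimal sketch, not a device model from the cited papers: the temperature and time thresholds (T_CRYSTALLIZE_C, T_MELT_C, MIN_CRYSTALLIZE_NS) are assumed illustrative values, and every melting pulse is assumed to end with a rapid quench.

```python
# Toy model of a phase-change (GST) cell. The threshold values are
# illustrative assumptions, not figures from the cited PRAM papers.
T_CRYSTALLIZE_C = 150.0     # assumed crystallization temperature, deg C
T_MELT_C = 620.0            # assumed melting temperature, deg C
MIN_CRYSTALLIZE_NS = 100.0  # assumed minimum anneal time to crystallize, ns


def apply_pulse(phase: str, peak_temp_c: float, duration_ns: float) -> str:
    """Return the cell phase ('amorphous' or 'crystalline') after one write pulse."""
    if peak_temp_c >= T_MELT_C:
        # RESET: melt the GST. This model assumes every melting pulse ends
        # with a rapid quench, which freezes the amorphous phase.
        return "amorphous"
    if peak_temp_c >= T_CRYSTALLIZE_C and duration_ns >= MIN_CRYSTALLIZE_NS:
        # SET: held between crystallization and melting long enough to crystallize.
        return "crystalline"
    # Too cold or too short: the phase is unchanged.
    return phase


def read_bit(phase: str) -> int:
    """SET (crystalline, low resistance) reads 0; RESET (amorphous, high resistance) reads 1."""
    return 0 if phase == "crystalline" else 1


phase = apply_pulse("crystalline", peak_temp_c=700, duration_ns=20)  # RESET pulse
print(phase, read_bit(phase))  # amorphous 1
phase = apply_pulse(phase, peak_temp_c=400, duration_ns=150)         # SET pulse
print(phase, read_bit(phase))  # crystalline 0
```

The point of the sketch is the asymmetry: RESET needs a short, very hot pulse, while SET needs a cooler but longer one.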

Note: Chalcogenide is the same class of material used in rewritable optical media (such as CD-RW and DVD-RW).

A 0.1µm 1.8V 256Mb 66MHz Synchronous Burst PRAM

[2]

[1]

Multimedia ASIC Chips

Standards and technologies are changing rapidly, so adapting to new standards is the key factor of success:
– The life span of each hardware product keeps getting shorter.
– It might be a feast this year but a famine next year. When a new processor is released for a new standard, it makes huge profits; then the product (the old processor) is out of date while a new processor for the latest standards is still under development.

Therefore, flexible and versatile hardware is required.

Example

SAA7215, SAA7216, SAA7221, SAA7214 by Philips Semiconductors (QFP208): announced in January 2001 and discontinued in March 2002.

Solution: Configurable Video Processors

Since the standards keep changing:
– The solution might be a powerful DSP core together with flexible parts.
– Look for a stable core and flexible components so that the chip can survive both minor and major revisions of the standards.
– The ideal goal is to survive Digital Video revisions (DVB-S, DVB-T, DVB-S2, HDTV, …).

Consumers should get access to every kind of peripheral that is feasible: Ethernet, USB, IDE, UART, IrDA, …

The chip should support as many standards and streams as possible.
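As a software analogy for the "stable core plus flexible components" idea, the sketch below keeps a fixed core that dispatches streams to pluggable, per-standard decoders. It is hypothetical: MediaCore, register and the lambda decoders are illustrative names, not any Philips API, and real configurable video processors do this in hardware and firmware.

```python
from typing import Callable, Dict

# Hypothetical sketch: the "stable core" owns dispatch, while
# standard-specific decoders are pluggable components registered later.
Decoder = Callable[[bytes], str]


class MediaCore:
    def __init__(self) -> None:
        self._decoders: Dict[str, Decoder] = {}

    def register(self, standard: str, decoder: Decoder) -> None:
        # Adding or replacing a decoder never changes the core itself.
        self._decoders[standard] = decoder

    def supports(self, standard: str) -> bool:
        return standard in self._decoders

    def process(self, standard: str, payload: bytes) -> str:
        if standard not in self._decoders:
            raise ValueError(f"no decoder registered for {standard}")
        return self._decoders[standard](payload)


core = MediaCore()
core.register("DVB-S", lambda p: f"DVB-S frame, {len(p)} bytes")
# A later revision of the standard: plug in a new component, keep the core.
core.register("DVB-S2", lambda p: f"DVB-S2 frame, {len(p)} bytes")
print(core.process("DVB-S2", b"\x00" * 188))  # DVB-S2 frame, 188 bytes
```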

Philips: PNX8526

[3]

Continued

It was designed in 0.12µm technology.

TriMedia is a Philips-internal microprocessor core with a proprietary architecture. It is a VLIW with an instruction set optimized for digital media processing. One implementation is the Philips-internal TM3270 synthesizable RTL core.

Nexperia is a product line of chips based on a TriMedia processor with application-specific peripherals. Nexperia chips have part numbers beginning with PNX.
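To illustrate what a VLIW core such as TriMedia relies on, the sketch below greedily packs independent operations into fixed-width instruction bundles, which is the compiler's job on a VLIW machine. It is a generic list scheduler, not the TriMedia toolchain: the issue width of 5, the one-bundle latency for every result, and the operation names are assumptions for illustration.

```python
from typing import Dict, List, Set


def pack_bundles(ops: List[str], deps: Dict[str, Set[str]], width: int = 5) -> List[List[str]]:
    """Greedy list scheduling of operations into VLIW instruction bundles.

    ops   -- operations in program order
    deps  -- op -> set of ops whose results it needs
    width -- issue slots per bundle (5 is an assumed value)
    Assumes every result becomes available one bundle after it issues.
    """
    done: Set[str] = set()
    remaining = list(ops)
    bundles: List[List[str]] = []
    while remaining:
        bundle: List[str] = []
        for op in remaining:
            if len(bundle) == width:
                break
            # Issue only if all producers issued in earlier bundles.
            if deps.get(op, set()) <= done:
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic or unsatisfiable dependencies")
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


ops = ["ld_a", "ld_b", "ld_c", "mul", "add", "st"]
deps = {"mul": {"ld_a", "ld_b"}, "add": {"mul", "ld_c"}, "st": {"add"}}
print(pack_bundles(ops, deps))  # [['ld_a', 'ld_b', 'ld_c'], ['mul'], ['add'], ['st']]
```

The three independent loads share one bundle, while the dependent chain mul, add, st serializes; media kernels with more independent work fill the slots better.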

[3]

Philips Chips

PNX8526: analog/digital television chip including a 266 MHz MIPS CPU core and a 240 MHz TriMedia processor core, supporting demultiplexing and decoding of SDTV MPEG-2 (Main Profile at Main Level) and HDTV MPEG-2 (Main Profile at High Level), with scaling and de-interlacing up to 1920x1080 resolution at 60 interlaced fields per second or 1368x720 resolution at 60 progressive-scan frames per second.

PNX010x: portable audio and multimedia player chip based on ARM7 or ARM9 processors, with NAND flash memory and hard drive interfaces.

PNX1500: media processor based on the TriMedia TM3260 VLIW processor core running at 300 MHz, with an LCD display controller and an Ethernet interface.

Continued

PNX1700: similar in features to the PNX1500 but based on the TriMedia TM5250 CPU core, with software support for H.264, MPEG-4 (SP, MVP, ASP), WMV9, DivX, and MPEG-2, and HDTV-resolution decoding of MPEG-2, WMV9, and DivX (but not H.264).

PNX4103: software-programmable mobile multimedia processor, capable of H.264 decoding (profile unspecified) at D1 (SDTV) resolution, with stacked DRAM and support for direct and RAM-buffered display interfaces.

PNX7100: DVD recorder chip with MPEG-2 encoding and decoding for interlaced video, additional support for progressive-scan video, and a 133 MHz MIPS32 system-controller processor core from MIPS Technologies; fabricated in a Philips 0.12µm process.

Others

NEC:
– uPD61126: MPEG-2 decoder supporting multiple streams at standard television resolution, with noise filters and a range of standard video interfaces, based on two MIPS Technologies 4Kc cores with enhanced security features for set-top boxes.

Broadcom:
– BCM2722: VideoCore II multimedia processor, used in the Apple video iPod, capable of MPEG-4 video encoding and decoding and designed for low power consumption in battery-powered devices. The package contains a stacked 32-megabit SDRAM, a USB 1.1 slave interface, a camera interface for up to 5 megapixels, and an LCD controller interface, among other interfaces. The BCM2722 is manufactured in a 0.13µm process technology.
– BCM3560:

Low Power Multimedia Section of ISSCC06

As available data bandwidth increases, there is a greater demand for much more advanced multimedia processing capabilities, which in turn translates to higher computational and storage requirements on these devices. Compounding this challenge is the ever-increasing demand for mobility, which dictates that these multimedia functions be performed at the lowest possible power consumption.

The seven papers in this session focus on recent advances in low power multimedia processing integrated circuits that deliver advanced functionality, such as 3D graphics, high-resolution still and video encoding/decoding, and high-fidelity audio playback. Results from these papers demonstrate that smart architecture design and implementation techniques, in conjunction with advanced process technology, can deliver very high performance multimedia functionality at very low power consumption levels.

6.33mW MPEG Audio Decoding on a Multimedia Processor in 0.18µm Technology

Techniques to realize a low power multimedia processor:
– A parallel processing DSP for low voltage operation
– Multiple power domains
– A conditional pre-charge FF

Pipelining => Low frequency => Low voltage

By making use of hardwired functional blocks and parallel and pipelined processing, the required operating frequency for MPEG decoding can be lowered to 30MHz. As a result, the supply voltage for MPEG decoding can be reduced to 1.1V from 1.8V and 1.3V, which is especially effective at reducing the dynamic power dissipation. The dynamic power is reduced by 62.7%.
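The 62.7% figure can be checked against the standard CMOS switching-power estimate P = αCV²f. With C, f and α held fixed, dropping Vdd from 1.8V to 1.1V gives 1 − (1.1/1.8)² ≈ 62.7%, so the quoted number appears to correspond to the voltage reduction alone. That is our reading; the paper may account for it differently. The sketch below just performs that arithmetic.

```python
def dynamic_power(c_eff: float, v_dd: float, f_hz: float, activity: float = 1.0) -> float:
    """CMOS switching-power estimate P = activity * C * Vdd^2 * f, in watts."""
    return activity * c_eff * v_dd ** 2 * f_hz


def reduction_from_voltage(v_old: float, v_new: float) -> float:
    """Fractional saving when only Vdd changes (C, f and activity held fixed)."""
    return 1.0 - (v_new / v_old) ** 2


print(f"{reduction_from_voltage(1.8, 1.1):.1%}")  # 62.7%
print(f"{reduction_from_voltage(1.3, 1.1):.1%}")  # 28.4%
```

Because power scales with V² but only linearly with f, lowering the frequency mainly pays off by making the lower supply voltage possible.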

[4]

Multi-Bus Architecture

To obtain the high-bandwidth data flow necessary for multimedia signal processing, a multiple-bus architecture is applied. It consists of one high-speed bus and three peripheral buses. The main bus (288 MB/s) connects data-transfer-intensive blocks, such as the hardwired dedicated DSP, the memory card interface, and the USB2 PHY. The peripheral buses (72 MB/s) connect serial ports, timers, the ADC, etc. With this multi-bus architecture, high-capacity data can be transferred efficiently without conflicting with slow data. External memories are connected via an external memory controller.
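The bus split can be sketched as a simple lookup: fast blocks share the 288 MB/s main bus, and slow peripherals sit on separate 72 MB/s buses, so they never contend with the bulk transfers. The block names and the exact block-to-bus assignment below are hypothetical, and transfer times ignore arbitration and protocol overhead.

```python
# Hypothetical model of the multi-bus split described above.
MAIN_BUS_MBPS = 288.0
PERIPHERAL_BUS_MBPS = 72.0

# Illustrative block-to-bus assignment (not taken from the paper).
BUS_OF = {
    "dsp": "main",
    "memory_card_if": "main",
    "usb2_phy": "main",
    "serial_port": "periph0",
    "timer": "periph1",
    "adc": "periph2",
}


def bus_bandwidth(bus: str) -> float:
    return MAIN_BUS_MBPS if bus == "main" else PERIPHERAL_BUS_MBPS


def transfer_time_ms(block: str, megabytes: float) -> float:
    """Ideal transfer time, assuming the block has its bus to itself."""
    return megabytes / bus_bandwidth(BUS_OF[block]) * 1000.0


def share_a_bus(a: str, b: str) -> bool:
    """True if two blocks would contend for the same bus."""
    return BUS_OF[a] == BUS_OF[b]


print(transfer_time_ms("dsp", 144))    # 500.0
print(share_a_bus("adc", "usb2_phy"))  # False
```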

A conditional pre-charge FF

In this circuit structure, the clock signal (CLK) is gated by the input signal (D, Db) so that only a minimum number of nodes change, even when the data changes, as shown in Fig. 22.7.3. In a conventional flip-flop, many nodes change every time the clock toggles, dissipating a large amount of power. Therefore, the proposed conditional pre-charge flip-flop can reduce the dynamic power dissipation associated with the clock signal compared with a conventional flip-flop.

[4]

A conditional pre-charge FF

The power dissipation of the flip-flop consists of two parts:

1. Power dissipated by transitions of the clock signal (CK)

2. Power dissipated by transitions of the data signal (D)
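A rough way to see why gating the clock by the data helps is to count internal node transitions for both parts of the power. In the sketch below, a conventional flip-flop toggles its clock-driven nodes every cycle, while the conditional version toggles them only when D differs from the stored value. The node counts (CLOCK_NODES, DATA_NODES) are hypothetical placeholders, not circuit data from the paper.

```python
from typing import List

# Hypothetical node counts; they only illustrate the two power components
# (clock-induced and data-induced) and are not taken from the paper.
CLOCK_NODES = 6  # internal nodes driven by the clock
DATA_NODES = 2   # nodes that must change when the stored value flips


def node_transitions(data: List[int], conditional: bool) -> int:
    """Count internal node transitions for one D sample per clock cycle."""
    total = 0
    q = 0  # flip-flop output, assumed to start at 0
    for d in data:
        changed = d != q
        if conditional:
            # Clock gated by D/Db: clock-driven nodes move only when data changes.
            total += CLOCK_NODES if changed else 0
        else:
            # Conventional FF: clock-driven nodes toggle on every clock cycle.
            total += CLOCK_NODES
        if changed:
            total += DATA_NODES  # data-induced component
            q = d
    return total


samples = [0, 0, 0, 1, 1, 1, 1, 0]
print(node_transitions(samples, conditional=False))  # 52
print(node_transitions(samples, conditional=True))   # 16
```

The saving grows with low data activity, which is common in registers that hold the same value for many cycles.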

[4]

A conditional pre-charge FF

[4]

Multi-Power Domain

This processor has a multi-power-domain structure divided into six parts. Each part is connected to an individual 1.1V power supply that can be turned off. For example, in the case of AAC decoding, three power domains are turned off.
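The domain-switching policy can be sketched as set arithmetic: each operating mode lists the domains it needs, and every other domain's supply is switched off. The six domain names and the AAC requirement set below are hypothetical. Only the counts (six domains, three off during AAC decoding) come from the slide.

```python
from typing import Dict, Set

# Hypothetical names for the six independently switchable 1.1V domains;
# the real partitioning is not given on the slide.
ALL_DOMAINS = {"cpu", "dsp_audio", "dsp_video", "usb", "memcard", "periph"}

# Hypothetical per-mode requirements. Only the count (three domains off
# for AAC decoding) comes from the slide; which three is an assumption.
MODE_NEEDS: Dict[str, Set[str]] = {
    "aac_decode": {"cpu", "dsp_audio", "periph"},
    "all_on": set(ALL_DOMAINS),
}


def domains_to_power_off(mode: str) -> Set[str]:
    """Return the domains whose supplies can be switched off in this mode."""
    needed = MODE_NEEDS[mode]
    unknown = needed - ALL_DOMAINS
    if unknown:
        raise ValueError(f"unknown domains: {unknown}")
    return ALL_DOMAINS - needed


print(sorted(domains_to_power_off("aac_decode")))  # ['dsp_video', 'memcard', 'usb']
```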

[4]

Chip Micrograph

[4]

A 5mW MPEG4 SP Encoder with 2D Bandwidth-Sharing Motion Estimation for Mobile Applications

MPEG-4 codec designs [1-2] have been reported that address the low power requirements of mobile devices. Three sources consume most of the power in an MPEG-4 encoder:
– Motion estimation (ME) generally consumes more than half of the total power because of its high memory access requirements.
– The discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) consumes power because of its complex computations.
– Data buffering between motion estimation/motion compensation (ME/MC) and quantization/variable length coding (Q/VLC) consumes power because of the SRAM accesses.
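To see why motion estimation dominates memory traffic, the sketch below runs a plain full-search block matcher using the sum of absolute differences (SAD) and counts reference-pixel reads. This is a generic textbook full search, not the paper's 2D bandwidth-sharing scheme, which is designed precisely to reduce these reads.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                r: int) -> Tuple[Tuple[int, int], Optional[int], int]:
    """Full-search block matching with the sum of absolute differences (SAD).

    cur, ref -- current and reference frames (rows of luma samples)
    (bx, by) -- top-left corner of the n x n block in the current frame
    r        -- search range in pixels (candidates from -r to +r)
    Returns (best motion vector, its SAD, number of reference-pixel reads).
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_sad, reads = (0, 0), None, 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue  # candidate block falls outside the reference frame
            sad = 0
            for j in range(n):
                for i in range(n):
                    sad += abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                    reads += 1  # one reference-memory read per pixel compared
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad, reads


ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][x + 1] if x < 7 else 0 for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=2, r=1))  # ((1, 0), 0, 36)
```

Reads grow as (2r+1)²·n² per block. For a 16x16 macroblock with r = 16, that is 33² × 256 = 278,784 reads when every candidate lies inside the frame, which is why reusing fetched reference data saves so much power.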

System Architecture

At the module level, the design focuses on the ME and DCT to reduce power consumption. At the system level, the design reduces the amount of data buffering between the Q and VLC stages.

[5]

DCT Architecture

Most DCT coefficients become zero after quantization, so their precision is less important. They can be calculated with less precision to save power, ideally with little drop in quality. The adopted DCT design decides the required precision based on the content, and it consumes less power for lower-precision calculations, reducing the total power consumption.

[5]

DCT & Zero Marker Scheme

A classifier circuit decides the allocation of calculation resources based on the pixel-to-pixel amplitude (PPA) and the quantization parameter (QP). After classification, the number of calculation bits is decided, and both the clock and the combinational circuits are shut down for any unused additional bits. The quality degradation due to reduced precision is less than 0.1dB compared with a normal DCT.
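The classifier's role can be sketched as a function from (PPA, QP) to an active bit width, with the remaining bits gated off. Everything numeric here is hypothetical: the PPA definition (max minus min sample), the 8 base bits plus 4 optional bits, and the thresholds 16 and 20 are placeholders chosen only to show the idea.

```python
from typing import List, Tuple

# Hypothetical classifier in the spirit of the content-adaptive DCT above.
# The PPA definition (max - min) and all thresholds are assumptions.
BASE_BITS = 8   # bits always computed
EXTRA_BITS = 4  # additional bits that can be switched off


def ppa(block: List[List[int]]) -> int:
    """Pixel-to-pixel amplitude, taken here as max minus min sample value."""
    flat = [p for row in block for p in row]
    return max(flat) - min(flat)


def allocate_bits(block: List[List[int]], qp: int) -> Tuple[int, int]:
    """Return (active_bits, gated_bits) for the DCT datapath."""
    extra = EXTRA_BITS
    if ppa(block) < 16:
        extra -= 2  # flat block: coefficients are small
    if qp > 20:
        extra -= 2  # coarse quantizer discards fine precision anyway
    extra = max(extra, 0)
    # Clock and combinational logic for the unused bits would be shut down.
    return BASE_BITS + extra, EXTRA_BITS - extra


flat_block = [[100, 101], [102, 100]]
busy_block = [[10, 200], [30, 180]]
print(allocate_bits(flat_block, qp=28))  # (8, 4)
print(allocate_bits(busy_block, qp=8))   # (12, 0)
```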

A zero marker scheme is adopted to reduce data accesses to the SRAM buffer between stages. The data buffered for VLC is quantized and mostly zero. For every four entries stored in SRAM, a one-bit register records whether they are all zeros; if they are, no reading or writing is required. This mechanism avoids most buffer accesses between the Q stage and the VLC stage, saving 86% of data buffering in low bit-rate mode and 62% in high bit-rate mode, depending on the sequence.
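The zero marker mechanism can be modeled with one flag per group of four entries: all-zero groups set their flag and skip the SRAM entirely, and the reader regenerates zeros from the flag alone. This is a minimal sketch. The group size of four follows the slide; counting one SRAM access per stored group is a simplifying assumption.

```python
from typing import List


class ZeroMarkerBuffer:
    """Sketch of the zero marker idea: one flag bit per group of four entries.

    All-zero groups are never written to or read from the SRAM model; only
    their marker bit is set. One SRAM access per group is an assumption.
    """

    GROUP = 4

    def __init__(self) -> None:
        self.sram: List[List[int]] = []  # stored non-zero groups, in order
        self.markers: List[bool] = []    # True means the group is all zeros
        self.sram_accesses = 0

    def write(self, coeffs: List[int]) -> None:
        if len(coeffs) % self.GROUP:
            raise ValueError("length must be a multiple of 4")
        for k in range(0, len(coeffs), self.GROUP):
            group = coeffs[k:k + self.GROUP]
            all_zero = not any(group)
            self.markers.append(all_zero)
            if not all_zero:
                self.sram.append(group)
                self.sram_accesses += 1

    def read(self) -> List[int]:
        out: List[int] = []
        stored = iter(self.sram)
        for all_zero in self.markers:
            if all_zero:
                out.extend([0] * self.GROUP)  # rebuilt from the marker alone
            else:
                out.extend(next(stored))
                self.sram_accesses += 1
        return out


coeffs = [12, -3, 0, 1] + [0] * 12
buf = ZeroMarkerBuffer()
buf.write(coeffs)
assert buf.read() == coeffs
print(buf.markers, buf.sram_accesses)  # [False, True, True, True] 2
```

Without the markers, the same 16 coefficients would cost 4 group writes plus 4 group reads; here only 2 accesses remain.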

Characteristics

[5]

Die Micrograph

[5]

A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications

[6]

A 120Mvertices/s Multi-threaded VLIW Vertex Processor for Mobile Multimedia Applications

[7]


Summary

PRAMs seem to be a promising field for non-volatile memories.

To survive in the multimedia ASIC industry, we must give consumers some flexibility (configurability) and some versatility.

In the Multimedia Section of ISSCC06, the following techniques were used to lower power consumption without sacrificing performance:
– Multi-power domains
– Parallel processing for low voltage operation
– Conditional pre-charge DFFs
– Multi-threading
– Zero-marker scheme
– Precision-aware DCT/IDCT block
– …

References

1. S. Kang, et al., "A 0.1µm 1.8V 256Mb 66MHz Synchronous Burst PRAM," ISSCC 2006.

2. H. R. Oh, et al., "Enhanced Write Performance of a 64Mb Phase-Change Random Access Memory," ISSCC 2005.

3. "PNX8526 Datasheet," Philips Semiconductors.

4. Y. Ueda, et al., "6.33mW MPEG Audio Decoding on a Multimedia Processor," ISSCC 2006.

5. C. P. Lin, et al., "A 5mW MPEG4 SP Encoder with 2D Bandwidth-Sharing Motion Estimation for Mobile Applications," ISSCC 2006.

6. T. M. Liu, et al., "A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications," ISSCC 2006.

7. C. H. Yu, et al., "A 120Mvertices/s Multi-threaded VLIW Vertex Processor for Mobile Multimedia Applications," ISSCC 2006.

Thank You for Your Attention