Design & Co-design of Embedded Systems
Lecture 16:
Target Architectures
Sharif University of Technology
Computer Engineering Dept.
Winter-Spring 2008
Mehdi Modarressi
Introduction
Up to now, we have investigated Co-design methodology for embedded systems
  Design specification
  SystemC as a specification language
  Co-synthesis algorithms
Design methodology: a set of tasks that transform a model into an architecture
Introduction
In this session:
  Processing and memory architectures
Next sessions:
  Models of communication
  Communication infrastructures
  System-on-Chip (SoC) architectures
  Network-on-Chips (NoC)
  System-on-Programmable-Chips (SoPC)
Acknowledgement
Some slides are taken from the lecture notes of the following courses:
  SoC design, Sharif Tech., 2007-8.
  Hw/Sw co-design, ETHZ, 2006-7.
  SoC design, KTH, 2007-8.
General-Purpose Embedded Processors
Programmable via the instruction set
Power consumption: far lower than that of general-purpose processors
  Pentium 4: ~100 W peak power, 30-60 W under an ordinary workload.
General-Purpose Embedded Processors
Embedded Processors:
  ARM
  PowerPC (IBM, Motorola, Apple)
  LEON (based on Sun SPARC)
  Nios (Altera)
  Crusoe (Transmeta)
  …
Embedded Processors - Power
Power consumption is a primary consideration in embedded processors
Power reduction can be achieved by a variety of methods:
  Circuit-level methods
  Architecture-level methods
  System-level methods
  Algorithm-level (software) methods
System-Level Power Reduction
Dynamic power management
  Shutting down the unused system components
Dynamic frequency/voltage scaling
  Tuning the working frequency and voltage based on the system utilization
System-level Power Reduction Methods: Dynamic Power Management
Shutting down the processor when unused (or unused components)
Example: StrongARM 1100
Three modes of operation:
  RUN
  STDBY (just monitoring interrupts)
  SLEEP
An external manager controls the inter-mode transitions: it observes the workload and makes transition decisions according to a policy
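One simple kind of policy the external manager could follow is a timeout policy: the longer the processor has been idle, the deeper the mode it is moved into. The sketch below is only an illustration of that idea. The thresholds `STDBY_AFTER` and `SLEEP_AFTER`, the tick-based workload model, and the function names are assumptions made for the example, not StrongARM parameters.

```python
from enum import Enum


class Mode(Enum):
    RUN = "RUN"
    STDBY = "STDBY"   # only monitoring interrupts
    SLEEP = "SLEEP"


# Hypothetical thresholds, counted in consecutive idle ticks.
STDBY_AFTER = 2
SLEEP_AFTER = 5


def next_mode(idle_ticks):
    """Timeout policy: pick a deeper mode the longer the system stays idle."""
    if idle_ticks >= SLEEP_AFTER:
        return Mode.SLEEP
    if idle_ticks >= STDBY_AFTER:
        return Mode.STDBY
    return Mode.RUN


def simulate(workload):
    """workload: iterable of booleans, True if there is work in that tick.

    Returns the mode chosen by the manager at every tick.
    """
    idle_ticks = 0
    trace = []
    for busy in workload:
        # Any work resets the idle counter and brings the processor back to RUN.
        idle_ticks = 0 if busy else idle_ticks + 1
        trace.append(next_mode(idle_ticks))
    return trace
```

A real policy would also have to account for the time and energy cost of waking up from SLEEP, which this sketch ignores.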
System-level Power Reduction Methods: Dynamic Voltage/Frequency Scaling
$P \propto V_{dd}^{2} \times f$
Scaling operating voltage and frequency based on the current workload and power supply status.
Transmeta’s Crusoe
  Frequency can change in steps of 33 MHz and the voltage in steps of 25 mV.
  Automatic change by monitoring the processor utilization.
AMD’s PowerNow!
  Supports 32 different voltages.
Intel’s SpeedStep
  The earliest solution.
  Supports 2 voltages.
Intel’s XScale power management
  An ARM-based architecture.
  The frequency can be changed by writing values to a register.
  Allows 16 different clock settings.
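To make the effect of the relation P ∝ Vdd² × f concrete, the following sketch picks the slowest operating point that still covers the current utilization and compares relative dynamic power. The table `OPERATING_POINTS` and its frequency/voltage pairs are invented for illustration and do not describe any of the processors listed above.

```python
# Hypothetical operating points, sorted by frequency: (MHz, supply voltage in V).
OPERATING_POINTS = [(300, 1.2), (400, 1.3), (500, 1.4), (600, 1.6)]


def relative_power(freq_mhz, vdd):
    """Dynamic power up to a constant factor: P is proportional to Vdd^2 * f."""
    return vdd ** 2 * freq_mhz


def pick_operating_point(utilization, current_freq_mhz):
    """Return the slowest (freq, vdd) pair that can still serve the load.

    utilization is the busy fraction (0.0 to 1.0) measured at current_freq_mhz.
    """
    required_mhz = utilization * current_freq_mhz
    for freq, vdd in OPERATING_POINTS:
        if freq >= required_mhz:
            return freq, vdd
    # Demand exceeds every setting: fall back to the fastest one.
    return OPERATING_POINTS[-1]


# A processor that is 50% utilized at 600 MHz can drop to 300 MHz.
freq, vdd = pick_operating_point(0.5, 600)
saving = 1 - relative_power(freq, vdd) / relative_power(600, 1.6)
```

Because the voltage enters quadratically, halving the frequency in this made-up table cuts dynamic power by far more than half.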
ASIPs (Application-specific Instruction-Set Processors)
Like general-purpose processors, programmable via the instruction set
The instruction set is specialized for a family of applications
  DSP (Digital Signal Processors)
  NP (Network Processors)
Microcontrollers (µC) can also be considered ASIPs.
ASIPs (Application-specific Instruction-Set Processors)
Instruction sets can be adapted to the application.
  New instructions can provide compound sets of existing operations, such as multiply–accumulate.
  Instructions can supply new operations, such as primitives for signal coding or block motion estimation.
  Instructions that operate on nonstandard operand sizes
ASIPs: Digital Signal Processors
Hardware implementation of the multiply–accumulate instruction: dest = src1*src2 + src3
  A common operation in digital signal processing
Fast floating-point and trigonometric function evaluation
The AT&T DSP-16 was the first DSP.
Modern DSP example: Texas Instruments (TI) DSPs
  C55x family
  C62x family
The C55x provides three co-processors for use in image processing and video compression:
  Pixel interpolation
  Motion estimation
  DCT/IDCT computation
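The reason multiply–accumulate is so common in signal processing is that one output sample of an FIR filter (or any dot product) is just a chain of such operations. The sketch below models this in software. The names `mac` and `fir_output` are chosen for the example and are not DSP intrinsics, and a real DSP would execute each step as a single instruction.

```python
def mac(src1, src2, src3):
    """Software model of the instruction: dest = src1*src2 + src3."""
    return src1 * src2 + src3


def fir_output(coeffs, samples):
    """One FIR output sample: sum of coeffs[i] * samples[i], built from MACs."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc = mac(c, x, acc)  # one multiply-accumulate per filter tap
    return acc
```

With N taps the loop issues N multiply–accumulate operations, which is why a single-cycle hardware MAC pays off directly in filter throughput.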
ASIPs: Digital Signal Processors
Can be used for applications involving digital signal processing
  Multimedia
  Telecommunication
  Image and speech processing
  …
The code should be optimized to exploit the processor features
My experience: a modem program (V.22 + V.42 + V.44) on a TMS320C6x ran 4 times faster than on a Pentium III 800
  Less power
Microcontrollers
The most prevalent processing element in embedded systems
Optimized for control-based applications
  Interrupt-based
  Not computation intensive
The first actual SoC
  Lower cost: one part replaces many parts
  More reliable: fewer on-board interconnects
  Faster: signals stay on chip
Microcontrollers
Microcontrollers are available in 4- to 32-bit word sizes
The components of the 8051, a typical microcontroller:
• 8-bit CPU with registers A (accumulator) and B
• 16-bit program counter (PC) and data pointer (DPTR)
• 8-bit program status word (PSW) and stack pointer (SP)
• Internal ROM: 4KB EPROM
• Internal RAM of 128 bytes
• 32 I/O pins organized as four 8-bit ports P0-P3
• Two 16-bit timer/counters: T0 and T1
• Full duplex serial data receiver/transmitter: SBUF
• Control Registers
• Two external and three internal interrupt sources
Microcontrollers
ATMEL’s AVR
Microchip’s PIC
8086-based family of microcontrollers
68xxx-based family of microcontrollers
…
High-performance Processors
Can be classified as ASIPs
For high-performance applications
  Graphics processing
  High-volume data processing
IBM CELL
  In PlayStation 3 (PS3)
  One PowerPC processor core (two hardware threads) + 8 vector processing units
  Up to 256 GFLOPS
ASICs: Application-Specific ICs
An ASIC (Application-Specific Integrated Circuit) is an integrated circuit for a specific application
The best power and performance results
But
  Less flexibility
  Higher NRE cost and longer time-to-market
Full-Custom ASIC
An engineer designs some or all of the logic cells, circuits, layout
  Excellent performance, small size, low power
  Long design cycle
  High NRE cost
Used when high performance and low power are needed
Standard-Cell ASICs
Use pre-designed logic cells (known as standard cells): simple logic, e.g. FFs
Larger cells, called megacells or cores: e.g. FFT
The standard cell library defines logic elements of varying complexity:
  SSI, MSI logic, data blocks, memories and data-path.
  Used by synthesis tools
Standard cells are built by someone else using full-custom design techniques
Designers save time, money, and reduce risk by using a predesigned, pretested cell library
Custom blocks can be embedded.
Gate-Array ASICs
A gate array chip contains prefabricated adjacent rows of PMOS and NMOS transistors
The gate array is configured by the interconnect structure
  Interconnect is defined by the designer and fabricated using a custom mask
Difficult layout
Reconfigurable Devices
None of the layers is customized
Basic logic cells and interconnect can be programmed
CPLD and FPGA
Basic cells can be SRAM-based, Flash-memory-based or fuse-based (one-time programmable)
Altera and Xilinx
Memory Systems
Not enough time for an overview of different memory technologies!
Only focus on cache and scratch pad systems
Cache
The speed of processors improves by at least 50% every year
The speed of memories improves by only 7%
Cache memory is used to close this gap
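The two growth rates compound, so the processor-memory gap widens exponentially. The small calculation below works this out under the assumption that both figures are annual rates that stay constant. The function name `speed_gap` is made up for the example.

```python
def speed_gap(years, cpu_rate=0.50, mem_rate=0.07):
    """Ratio of processor speed to memory speed after `years`.

    Assumes both start equal and improve at constant annual rates.
    """
    return ((1 + cpu_rate) / (1 + mem_rate)) ** years


gap_after_one_year = speed_gap(1)    # about 1.4
gap_after_ten_years = speed_gap(10)  # between 29 and 30
```

Under these assumptions the gap grows by roughly 40% per year, which is nearly a factor of 30 over a decade.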
Scratch Pad
Cache problems
  Power consumption of cache look-up
  Predictability problems in real-time systems
Different program execution times based on the cache behavior
Scratch pad:
  A memory module like caches
  Caches are transparent to the programmer while scratch pads are not.
Scratch Pad
It is mapped into the memory address space
Frequently used variables and selected data items should be allocated to that address space by the programmer
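Choosing which data items go into the scratch pad can be viewed as a knapsack problem: each item has a size and an access count, and the scratch pad has a fixed capacity. The sketch below is one possible way to make that choice, not a method given in the lecture. The function name, the variable list, and the access counts (which would normally come from profiling) are all hypothetical.

```python
def allocate_scratchpad(variables, capacity):
    """Pick variables that maximize accesses served from the scratch pad.

    variables: list of (name, size_bytes, access_count) tuples.
    capacity:  scratch pad size in bytes.
    Returns (total_accesses, list_of_chosen_names).
    """
    # best[c] holds the best (accesses, names) found using at most c bytes.
    best = [(0, [])] * (capacity + 1)
    for name, size, accesses in variables:
        # Go downwards so that each variable is placed at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + accesses
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]


# Hypothetical profile: (name, size in bytes, number of accesses).
profile = [("buf", 64, 1000), ("coef", 32, 900), ("tmp", 48, 300), ("log", 128, 50)]
total, chosen = allocate_scratchpad(profile, 96)
```

For this made-up profile and a 96-byte scratch pad, `buf` and `coef` fill the space exactly and are selected. The remaining variables stay in main memory.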