Design & Co-design of Embedded Systems
Lecture 16: Target Architectures
Sharif University of Technology, Computer Engineering Dept.
Winter-Spring 2008
Mehdi Modarressi



Introduction

Up to now, we have investigated Co-design methodology for embedded systems:

• Design specification
• SystemC as a specification language
• Co-synthesis algorithms

Design methodology: a set of tasks that transform a model to an architecture

Introduction

In this session:

• Processing and memory architectures

Next sessions:

• Models of communication
• Communication infrastructures
• System-on-Chip (SoC) architectures
• Network-on-Chips (NoC)
• System-on-Programmable-Chips (SoPC)

Acknowledgement

Some slides are taken from the lecture notes of the following courses:

• SoC design, Sharif Tech., 2007-8.
• Hw/Sw co-design, ETHZ, 2006-7.
• SoC design, KTH, 2007-8.

Outline

• Processing Elements
• Memory Systems

Processing Elements

Processing Elements – Power and Energy

• Power
• Energy
• Energy Efficiency

General-Purpose Embedded Processors

• Programmable via instruction set
• Power consumption: far lower than that of general-purpose processors
  • For comparison, a Pentium 4 draws ~100 W peak power and 30-60 W under an ordinary workload.

General-Purpose Embedded Processors

Embedded Processors:

• ARM
• PowerPC (IBM, Motorola, Apple)
• LEON (based on Sun SPARC)
• Nios (Altera)
• Crusoe (Transmeta)
• …

Embedded Processors - Power

• Power consumption is a primary consideration in embedded processors
• Power reduction can be achieved by a variety of methods:
  • Circuit-level methods
  • Architecture-level methods
  • System-level methods
  • Algorithm-level (software) methods

System-Level Power Reduction

• Dynamic power management
  • Shutting down the unused system components
• Dynamic frequency/voltage scaling
  • Tuning the working frequency and voltage based on the system utilization

System-level Power Reduction Methods: Dynamic Power Management

• Shutting down the processor (or its unused components) when it is not in use
• Example: StrongARM 1100
  • Three modes of operation:
    • RUN
    • STDBY (just monitoring interrupts)
    • SLEEP
  • An external manager controls the inter-mode transitions: it observes the workload and makes transition decisions according to a policy.
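
The policy-driven mode transitions described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the mode names come from the slide, but the fixed-timeout policy, the `TimeoutPowerManager` class, and the `standby_after` / `sleep_after` thresholds are hypothetical and are not taken from the StrongARM 1100 documentation.

```python
from enum import Enum


class Mode(Enum):
    RUN = "RUN"
    STDBY = "STDBY"   # halted, just monitoring interrupts
    SLEEP = "SLEEP"


class TimeoutPowerManager:
    """Hypothetical external manager using a fixed-timeout policy."""

    def __init__(self, standby_after=3, sleep_after=10):
        # Idle-tick thresholds are illustrative assumptions.
        self.standby_after = standby_after
        self.sleep_after = sleep_after
        self.idle_ticks = 0
        self.mode = Mode.RUN

    def observe(self, busy):
        """Observe one tick of workload and return the chosen mode."""
        if busy:
            # Any work (e.g. an interrupt) brings the processor back to RUN.
            self.idle_ticks = 0
            self.mode = Mode.RUN
        else:
            self.idle_ticks += 1
            if self.idle_ticks >= self.sleep_after:
                self.mode = Mode.SLEEP
            elif self.idle_ticks >= self.standby_after:
                self.mode = Mode.STDBY
        return self.mode


def run_trace(trace, **thresholds):
    """Feed a list of busy/idle ticks to a fresh manager; return mode names."""
    manager = TimeoutPowerManager(**thresholds)
    return [manager.observe(busy).value for busy in trace]
```

A real policy would also weigh the time and energy cost of each transition, which this sketch ignores.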

System-level Power Reduction Methods: Dynamic Voltage/Frequency Scaling

$P \propto V_{dd}^2 \times f$

Scaling operating voltage and frequency based on the current workload and power supply status.

• Transmeta’s Crusoe
  • Frequency can change in steps of 33 MHz and the voltage in steps of 25 mV.
  • Automatic change by monitoring the processor utilization.
• AMD’s PowerNow!
  • Supports 32 different voltages.
• Intel’s SpeedStep
  • The earliest solution.
  • Supports 2 voltages.
• Intel’s XScale power management
  • An ARM-based architecture.
  • The frequency can be changed by writing values in a register.
  • Allows 16 different clock settings.
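
To see why scaling voltage together with frequency pays off, the proportionality between power, supply voltage squared, and frequency can be evaluated directly. The sketch below is a rough model only: it assumes dynamic power dominates, that a task needs a fixed number of cycles, and that voltage and frequency are given as fractions of their nominal values. The function names are illustrative.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Power relative to nominal, from P proportional to Vdd^2 * f.

    v_scale and f_scale are fractions of the nominal voltage and frequency
    (1.0 means unchanged).
    """
    return v_scale ** 2 * f_scale


def relative_energy_per_task(v_scale, f_scale):
    """Energy for a fixed number of cycles, relative to nominal.

    Execution time scales as 1 / f_scale, so energy = power * time
    ends up depending on the voltage only.
    """
    return relative_dynamic_power(v_scale, f_scale) / f_scale
```

For example, running at 80% of both nominal voltage and nominal frequency gives about 51% of the nominal power under this model, while lowering the frequency alone reduces power but not the energy needed per task.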

ASIPs (Application-specific Instruction-Set Processors)

• Like general-purpose processors, programmable via instruction set
• The instruction set is specialized for a family of applications
  • DSP (Digital Signal Processors)
  • NP (Network Processors)
• Microcontrollers can also be considered ASIPs.

ASIPs (Application-specific Instruction-Set Processors)

Instruction sets can be adapted to the application.

• New instructions can provide compound sets of existing operations, such as multiply–accumulate.
• Instructions can supply new operations, such as primitives for signal coding or block motion estimation.
• Instructions can operate on nonstandard operand sizes.

ASIPs: Digital Signal Processors

• Hardware implementation of the multiply–accumulate instruction: dest = src1*src2 + src3
  • A common operation in digital signal processing
• Fast floating-point and trigonometric function evaluation

• The AT&T DSP-16 was one of the early DSPs.
• Modern DSP example: Texas Instruments (TI) DSPs
  • C55x family
  • C62x family
• The C55x provides three co-processors for use in image processing and video compression:
  • Pixel interpolation
  • Motion estimation
  • DCT/IDCT computation
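
The multiply–accumulate operation mentioned above is the inner step of most filtering kernels. As a minimal sketch, the following computes one output sample of an FIR filter as a chain of MAC steps; `mac` and `fir_output` are illustrative names, and a DSP would typically execute each step as a single hardware instruction rather than a function call.

```python
def mac(src1, src2, src3):
    """Multiply-accumulate as written on the slide: dest = src1*src2 + src3."""
    return src1 * src2 + src3


def fir_output(coeffs, samples):
    """One FIR output sample: the sum of coeffs[i] * samples[i]."""
    acc = 0
    for c, x in zip(coeffs, samples):
        # Each loop iteration corresponds to one MAC instruction.
        acc = mac(c, x, acc)
    return acc
```

With coefficients `[1, 2, 3]` and samples `[4, 5, 6]`, `fir_output` returns 32 after three MAC steps.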

ASIPs: Digital Signal Processors

• Can be used for applications involving digital signal processing
  • Multimedia
  • Telecommunication
  • Image and speech processing
  • …
• The code should be optimized to exploit the processor features
  • My experience: a modem program (V.22 + V.42 + V.44) on a TMS320C6x ran 4 times faster than on a Pentium III 800, with less power.

Microcontrollers

• The most prevalent processing element in embedded systems
• Optimized for control-based applications
  • Interrupt-based
  • Not computation intensive
• The first actual SoC
  • Lower cost: one part replaces many parts
  • More reliable: fewer on-board interconnects
  • Faster: signals stay on chip

Microcontrollers

• Microcontrollers are available in 4- to 32-bit word sizes
• The components of the 8051, a typical microcontroller:
  • 8-bit CPU with registers A (accumulator) and B
  • 16-bit program counter (PC) and data pointer (DPTR)
  • 8-bit program status word (PSW) and stack pointer (SP)
  • Internal ROM: 4KB EPROM
  • Internal RAM of 128 bytes
  • 32 I/O pins organized as four 8-bit ports P0-P3
  • Two 16-bit timer/counters: T0 and T1
  • Full duplex serial data receiver/transmitter: SBUF
  • Control registers
  • Two external and three internal interrupt sources

Microcontrollers

• ATMEL’s AVR
• Microchip’s PIC
• 8086-based microcontroller family
• 68xxx-based microcontroller family
• …

High-performance Processors

• Can be classified as ASIPs
• For high-performance applications
  • Graphics processing
  • High-volume data processing
• IBM CELL
  • Used in the PlayStation 3 (PS3)
  • One PowerPC core (two hardware threads) + 8 vector processing units
  • Up to 256 GFLOPs

ASICs: Application-Specific ICs

• An ASIC (Application-Specific Integrated Circuit) is an integrated circuit for a specific application
• The best power and performance results
• But:
  • Less flexibility
  • More NRE cost and time-to-market

ASICs: Application-Specific ICs

Full-Custom ASIC

• An engineer designs some or all of the logic cells, circuits, and layout
  • Excellent performance, small size, low power
  • Long design cycle
  • High NRE cost
• Used when high performance and low power are needed

Standard-Cell ASICs

• Use pre-designed logic cells (known as standard cells):
  • Simple logic, e.g. FFs
  • Larger cells, called megacells or cores, e.g. FFT
• The standard cell library defines logic elements of varying complexity:
  • SSI, MSI logic, data blocks, memories and data-path
  • Used by the synthesis tool
• Standard cells are built by someone else using full-custom design techniques
• Designers save time and money and reduce risk by using a predesigned, pretested cell library
• Custom blocks can be embedded.

Standard-Cell ASICs

Gate-Array ASICs

• A gate array chip contains prefabricated adjacent rows of PMOS and NMOS transistors
• The gate array is configured by the interconnect structure
  • Interconnect is defined by the designer and fabricated using a custom mask
• Difficult layout

Reconfigurable Devices

• None of the layers is customized
• Basic logic cells and interconnect can be programmed
• CPLD and FPGA
• Basic cells can be SRAM-based, Flash-memory-based, or fuse-based (one-time programmable)
• Vendors: Altera and Xilinx

CPLD (Complex Prog. Logic Devices)

• EPROM
• EEPROM
• FLASH

FPGA

• Antifuse
• SRAM

Outline

• Processing Elements
• Memory Systems

Memory Systems

• Not enough time for an overview of different memory technologies!
• We focus only on cache and scratch pad systems

Cache

• The speed of processors improves by at least 50% every year
• The speed of memories improves by only 7% per year
• Cache memory is used to close this gap
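
The two growth rates above compound, so the gap widens quickly. The following sketch computes how much faster the processor becomes relative to memory after a number of years, taking the 50% and 7% figures from the slide as constant annual rates (an idealization); `speed_gap` is an illustrative name.

```python
def speed_gap(years, cpu_growth=0.50, mem_growth=0.07):
    """Ratio of processor speedup to memory speedup after `years` years.

    Assumes both growth rates stay constant and compound annually.
    """
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years
```

Under these assumptions the gap grows by about 40% per year, which compounds to roughly a factor of 29 over ten years.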

Scratch Pad

Cache problems:

• Power consumption of cache look-up
• Predictability problems in real-time systems
  • Different program execution times based on the cache behavior

Scratch pad:

• A memory module like caches
• Caches are transparent to the programmer while scratch pads are not.
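
The predictability problem listed under cache problems can be made concrete with a toy simulation. The sketch below models a small direct-mapped cache and shows that two access sequences of the same length can take very different times depending on how their addresses map to cache lines. The cache geometry and the hit and miss latencies are arbitrary illustrative values, not figures from the lecture.

```python
def count_hits(addresses, num_lines=4, line_size=4):
    """Simulate a direct-mapped cache and return the number of hits."""
    stored_block = [None] * num_lines
    hits = 0
    for addr in addresses:
        block = addr // line_size      # which memory block is accessed
        index = block % num_lines      # the only line that may hold it
        if stored_block[index] == block:
            hits += 1
        else:
            stored_block[index] = block  # miss: fetch block, evict old one
    return hits


def access_time(addresses, hit_cycles=1, miss_cycles=10, **cache):
    """Total cycles for the sequence under the assumed latencies."""
    hits = count_hits(addresses, **cache)
    misses = len(addresses) - hits
    return hits * hit_cycles + misses * miss_cycles
```

Alternating between addresses 0 and 4 hits in the cache after the first two accesses, while alternating between addresses 0 and 16 maps both to the same line and misses every time. With eight accesses each, the first sequence takes 26 cycles and the second 80 in this model.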

Scratch Pad

• It is mapped into the memory address space
• Frequently used variables and selected data items should be allocated to that address space by the programmer
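
One way a programmer (or a tool) might decide which variables to place in the scratch pad is to rank them by how often they are accessed per byte and fill the scratch pad greedily. This is only a heuristic sketch, not a method given in the lecture: the function name, the variable names, and the profile numbers in the usage example are all hypothetical, and a greedy choice is not guaranteed to be optimal.

```python
def allocate_to_scratch_pad(variables, capacity):
    """Pick variables for the scratch pad with a greedy heuristic.

    variables: dict mapping name -> (size_in_bytes, access_count)
    capacity:  scratch pad size in bytes
    Returns the chosen names, most accesses per byte first.
    """
    ranked = sorted(
        variables.items(),
        key=lambda item: item[1][1] / item[1][0],  # accesses per byte
        reverse=True,
    )
    chosen, used = [], 0
    for name, (size, _accesses) in ranked:
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen
```

For instance, with a hypothetical profile `{"coeffs": (64, 6400), "frame": (1024, 2048), "state": (16, 800), "log": (256, 10)}` and a 128-byte scratch pad, the heuristic selects `coeffs` and `state` and leaves the large, rarely reused buffers in main memory.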

Scratch Pad vs. Cache

• Using scratch pads is harder than using caches
• Scratch pad is predictable
• Scratch pad consumes less power