
Lecture 12 - College of Engineering and Computer Sciencecputnam/Comp546/Adv Computer Systems - Sweden...1 Lecture 12 Architectures for Low Power: Transmeta’s Crusoe Processor Motivation


Lecture 12

Architectures for Low Power: Transmeta’s Crusoe Processor

Motivation

• Exponential performance increase at a low cost.
• However, for some application areas low power consumption is more important than performance:
    • Mobile communications
    • Mobile computing
    • Wireless Internet
    • Medical implants
    • Deep space applications
• Battery lifetime

Designing for Low Power: Approaches

• Trading area/performance for power
    • Power can be reduced by decreasing the supply voltage and allowing the performance to degrade (trading performance for power).
    • The lost performance can be recovered, for example with parallel or pipelined hardware, but these techniques incur an area penalty (trading area for power).
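The trade-offs above follow from the usual first-order model of CMOS dynamic power, P ≈ activity × C × Vdd² × f. A minimal Python sketch under that model; the capacitance, voltage and frequency values are illustrative placeholders, not figures from the lecture:

```python
def dynamic_power(c_eff, v_dd, freq, activity=1.0):
    """First-order CMOS dynamic power: activity * C * Vdd^2 * f."""
    return activity * c_eff * v_dd ** 2 * freq


# Normalised baseline: one datapath at full voltage and full clock rate.
baseline = dynamic_power(c_eff=1.0, v_dd=1.0, freq=1.0)

# Trading performance for power: lower the supply voltage and accept the
# slower clock that the reduced voltage allows (assumed values).
slow = dynamic_power(c_eff=1.0, v_dd=0.6, freq=0.5)

# Trading area for power: duplicate the datapath (about 2x capacitance) so that
# two units at half the clock rate keep the baseline throughput at low voltage.
parallel = dynamic_power(c_eff=2.0, v_dd=0.6, freq=0.5)

print(f"baseline={baseline:.2f} slow={slow:.2f} parallel={parallel:.2f}")
```

Because voltage enters quadratically, the duplicated datapath still uses less power than the baseline in this sketch, at the cost of roughly twice the area.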


Designing for Low Power: Approaches

• Avoiding waste
    • Do not clock modules when they are idle.
    • Minimize glitching (spurious signal transitions).
    • Use dedicated rather than programmable hardware.
    • Reduce control overhead by using regular algorithms and architectures.
    • Design systems to meet, rather than exceed, performance requirements.

Designing for Low Power: Approaches

• Exploiting locality
    • Global operations inherently consume a lot of power.
    • Data must be transferred from one part of the chip to another at the expense of switching large bus capacitances.
    • A design partitioned to exploit locality of reference can minimize the amount of expensive global communication in favor of much less costly local interconnect networks.

Crusoe Family of Processors from Transmeta

• Introduction
• Software: Code Morphing
• Hardware: VLIW core
• Performance
• Applications


The Idea – David Ditzel

• Champion of simple chip architecture.
• 1995:
    • Chief Technical Officer of Sun Microsystems Inc.’s SPARC business.
    • Working on emulation of x86 software on SPARC processors.
• Early 1995: left Sun and worked on his own idea.
• Was not happy with the complexity of the architectures of recent times.
• Some new ideas mixed with some old ideas to build a simple and fast architecture capable of running x86 code.
• Software/hardware hybrid.

The Company - Transmeta

• Ditzel and Colin Hunter chose the name Transmeta, and the company was formed in the summer of 1995.
• Industry contacts were used to recruit top brains for the ideas.
• Design started in the living rooms of the founders’ homes.
• Now employs many people.


Innovation

• Transmeta Crusoe chip
• x86 emulation
• Very Long Instruction Word (VLIW)
• Code Morphing
• Simple architecture
• LongRun technology
• Virtual devices
• Low power

Introducing a Software Layer

Software: Code Morphing

• Performs dynamic binary translation.
• Compiles instructions from one instruction set architecture (ISA) to another ISA.


Code Morphing

[Figure: x86 binary code -> Code Morphing Software -> VLIW binary code]

Decoding and Scheduling

Conventional x86 superscalar processors fetch binary instructions and decode them into separate micro-operations, which the hardware then reorders and executes in parallel.

Code Morphing, by contrast, translates an entire group of x86 instructions at once and stores the translation in a translation cache for future reference.


• The translation step introduces many opportunities.
• Because code repeats at a high rate, translations are frequently reused from the translation cache, which reduces translation overhead.
• Can use much more sophisticated scheduling algorithms.
• Much lower power consumption because translation is all in software.
• Can optimize generated code and, by ‘learning’ which parts are executed often, can change levels of optimization dynamically.
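The reuse and 'learning' behaviour described above can be sketched as a cache keyed by x86 block address with an execution counter. This is a minimal Python illustration, not Transmeta's implementation: the `TranslationCache` class, the `translate(addr, level)` callback, the two optimization levels and the `hot_threshold` value are all hypothetical.

```python
from collections import Counter


class TranslationCache:
    """Caches translations per x86 block address and re-optimizes hot blocks."""

    def __init__(self, translate, hot_threshold=3):
        self._translate = translate        # hypothetical callable(addr, level)
        self._hot_threshold = hot_threshold
        self._cache = {}                   # addr -> (level, translation)
        self._counts = Counter()           # addr -> number of executions

    def lookup(self, addr):
        self._counts[addr] += 1
        entry = self._cache.get(addr)
        if entry is None:
            # First execution: translate quickly with little optimization.
            entry = (0, self._translate(addr, 0))
        elif entry[0] == 0 and self._counts[addr] >= self._hot_threshold:
            # Block turned out to be hot: spend more effort on it once.
            entry = (1, self._translate(addr, 1))
        self._cache[addr] = entry
        return entry[1]


calls = []


def fake_translate(addr, level):
    """Stand-in translator that records each call and returns a label."""
    calls.append((addr, level))
    return f"vliw@{addr:#x}/opt{level}"


cache = TranslationCache(fake_translate)
results = [cache.lookup(0x1000) for _ in range(4)]
```

In this sketch four executions of the same block trigger only two translations: one cheap translation on first use and one re-optimization once the block counts as hot.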

Instruction Set Emulation

• Emulation is traditionally slow because of the way different ISAs handle condition codes and exceptions.
• Crusoe uses specific registers to emulate the setting of condition codes by the processor (a .c suffix after the instruction shows that condition codes need to be set).
• Exceptions are handled by using shadow registers and a procedure called “commit and rollback”.
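The commit-and-rollback idea can be sketched as two copies of the register file plus a buffer that holds stores until commit. This is a minimal Python model under stated assumptions: the `ShadowedState` class, the `GuestFault` exception and the `run_translation` helper are hypothetical names, and the real mechanism is implemented in hardware.

```python
class GuestFault(Exception):
    """Hypothetical stand-in for an exception raised inside a translation."""


class ShadowedState:
    def __init__(self, registers):
        self.working = dict(registers)   # registers updated by native atoms
        self.shadow = dict(registers)    # last committed (known good) copy
        self.memory = {}
        self.store_buffer = []           # stores held back until commit

    def write_reg(self, name, value):
        self.working[name] = value

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def commit(self):
        """Make the work since the last commit permanent."""
        self.shadow = dict(self.working)
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()

    def rollback(self):
        """Discard the work since the last commit."""
        self.working = dict(self.shadow)
        self.store_buffer.clear()


def run_translation(state, atoms):
    """Run atoms; commit on success, roll back if any atom faults."""
    try:
        for atom in atoms:
            atom(state)
    except GuestFault:
        state.rollback()
        return False
    state.commit()
    return True


def faulting_atom(state):
    raise GuestFault("simulated exception")


state = ShadowedState({"%eax": 1})
ok = run_translation(state, [lambda s: s.write_reg("%eax", 2),
                             lambda s: s.store(0x10, 2)])
failed = run_translation(state, [lambda s: s.write_reg("%eax", 99),
                                 lambda s: s.store(0x20, 99),
                                 faulting_atom])
```

After the rollback the state matches the last commit, so the original x86 instructions can be re-executed conservatively to locate the exception precisely.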

Translation Step 1: Translation by Code Morphing software

Original x86 code:

    Addl %eax, (%esp)
    Addl %ebx, (%esp)
    Movl %esi, (%ebp)
    Subl %ecx, 5

Native VLIW code:

    Ld    %r30, [%esp]
    Add.c %eax, %eax, %r30
    Ld    %r31, [%esp]
    Add.c %ebx, %ebx, %r31
    Ld    %esi, [%ebp]
    Sub.c %ecx, %ecx, 5


Translation Step 2: Optimisation

Elimination of redundant atoms and unnecessary condition-code operations.

Native VLIW code:

    Ld    %r30, [%esp]
    Add.c %eax, %eax, %r30
    Ld    %r31, [%esp]
    Add.c %ebx, %ebx, %r31
    Ld    %esi, [%ebp]
    Sub.c %ecx, %ecx, 5

Optimized native VLIW code:

    Ld    %r30, [%esp]
    Add   %eax, %eax, %r30
    Add   %ebx, %ebx, %r30
    Ld    %esi, [%ebp]
    Sub.c %ecx, %ecx, 5
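The two optimizations in Step 2 can be sketched as two small passes over a list of atoms. This is a minimal Python illustration, not the Code Morphing optimizer: the `Atom` class and both functions are hypothetical, mnemonics are lowercased, temporaries are assumed to be the registers whose names start with `%r`, and the block is assumed to contain no stores and to write each temporary only once.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    op: str                 # "ld", "add" or "sub"
    dst: str                # destination register
    srcs: tuple             # source operands; a load has one memory operand
    sets_cc: bool = False   # True for the ".c" forms
    reads_cc: bool = False  # True for atoms that consume condition codes

    def __str__(self):
        suffix = ".c" if self.sets_cc else ""
        return f"{self.op}{suffix} " + ", ".join((self.dst,) + self.srcs)


def eliminate_redundant_loads(atoms):
    """Drop a load into a temporary when a temporary already holds the value."""
    available = {}  # memory operand -> temporary holding its value
    renamed = {}    # eliminated temporary -> surviving temporary
    result = []
    for atom in atoms:
        srcs = tuple(renamed.get(s, s) for s in atom.srcs)
        is_temp = atom.dst.startswith("%r")
        if atom.op == "ld" and is_temp and srcs[0] in available:
            renamed[atom.dst] = available[srcs[0]]
            continue
        # Overwriting a register invalidates loads addressed through it.
        available = {m: r for m, r in available.items()
                     if atom.dst not in m and r != atom.dst}
        if atom.op == "ld" and is_temp and atom.dst not in srcs[0]:
            available[srcs[0]] = atom.dst
        result.append(replace(atom, srcs=srcs))
    return result


def drop_dead_flags(atoms):
    """Clear ".c" on atoms whose condition codes are overwritten before use."""
    live = True  # assume the flags may be read after the block ends
    result = []
    for atom in reversed(atoms):
        if atom.sets_cc and not live:
            atom = replace(atom, sets_cc=False)
        if atom.sets_cc:
            live = False
        if atom.reads_cc:
            live = True
        result.append(atom)
    return result[::-1]


native = [
    Atom("ld", "%r30", ("[%esp]",)),
    Atom("add", "%eax", ("%eax", "%r30"), sets_cc=True),
    Atom("ld", "%r31", ("[%esp]",)),
    Atom("add", "%ebx", ("%ebx", "%r31"), sets_cc=True),
    Atom("ld", "%esi", ("[%ebp]",)),
    Atom("sub", "%ecx", ("%ecx", "5"), sets_cc=True),
]
optimized = [str(a) for a in drop_dead_flags(eliminate_redundant_loads(native))]
```

Only the last flag-setting atom keeps its `.c` suffix here because the sketch assumes nothing inside the block reads the condition codes.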

Translation Step 3: Scheduling

Scheduling the remaining atoms into molecules using a large window.

Optimized native VLIW code:

    Ld    %r30, [%esp]
    Add   %eax, %eax, %r30
    Add   %ebx, %ebx, %r30
    Ld    %esi, [%ebp]
    Sub.c %ecx, %ecx, 5

Scheduled native VLIW code:

    1. Ld %r30, [%esp]; Sub.c %ecx, %ecx, 5
    2. Ld %esi, [%ebp]; Add %eax, %eax, %r30; Add %ebx, %ebx, %r30
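Packing atoms into molecules can be sketched as greedy list scheduling with per-unit limits. This is a minimal Python illustration, not Transmeta's scheduler: the function names are hypothetical, mnemonics are lowercased, every atom is assumed to take one cycle, the limits of two ALUs and one memory unit come from the hardware specification slide, and the width of four atoms per molecule is an assumption.

```python
import re
from collections import Counter

# (atom text, functional unit) pairs for the optimized code above.
ATOMS = [
    ("ld %r30, [%esp]", "mem"),
    ("add %eax, %eax, %r30", "alu"),
    ("add %ebx, %ebx, %r30", "alu"),
    ("ld %esi, [%ebp]", "mem"),
    ("sub.c %ecx, %ecx, 5", "alu"),
]
LIMITS = {"alu": 2, "mem": 1}
MAX_ATOMS = 4  # assumed molecule width


def reads_writes(text):
    """Return (registers read, registers written) for one atom."""
    mnemonic, operands = text.split(" ", 1)
    dst, *srcs = [part.strip() for part in operands.split(",")]
    reads = set(re.findall(r"%\w+", " ".join(srcs)))
    writes = {dst}
    if mnemonic.endswith(".c"):
        writes.add("cc")  # condition codes modelled as one extra register
    return reads, writes


def can_follow(earlier, later, earlier_cycle, cycle):
    """May `later` go into molecule `cycle`, given one earlier atom?"""
    e_reads, e_writes = earlier
    l_reads, l_writes = later
    if earlier_cycle is None:
        # Earlier atom not scheduled yet: any dependence blocks hoisting.
        return not (e_writes & (l_reads | l_writes) or e_reads & l_writes)
    if e_writes & (l_reads | l_writes):
        return earlier_cycle < cycle  # RAW/WAW need a strictly earlier molecule
    return True                       # WAR may share a molecule


def schedule(atoms, limits=LIMITS, max_atoms=MAX_ATOMS):
    """Fill each molecule greedily, scanning the whole block in program order."""
    info = [reads_writes(text) for text, _ in atoms]
    placed, molecules = {}, []
    remaining = list(range(len(atoms)))
    while remaining:
        cycle, used, chosen = len(molecules), Counter(), []
        for i in remaining:
            unit = atoms[i][1]
            if len(chosen) == max_atoms or used[unit] >= limits[unit]:
                continue
            if all(can_follow(info[j], info[i], placed.get(j), cycle)
                   for j in range(i)):
                placed[i] = cycle
                used[unit] += 1
                chosen.append(i)
        molecules.append([atoms[i][0] for i in chosen])
        remaining = [i for i in remaining if i not in placed]
    return molecules


molecules = schedule(ATOMS)
```

Under these assumptions the sketch groups the atoms into the same two molecules as the slide; the order of atoms inside a molecule does not matter because they issue together.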

Software’s Edge

• Molecules explicitly encode the instruction-level parallelism, hence they can be executed by a simple VLIW engine.
    • The hardware doesn’t need to perform complex instruction reordering.
    • Simplicity means a fast and low-power design.
• Processor upgrades are simplified.
    • The software layer means that software developers don’t have to recompile programs.
    • A new hardware architecture only needs new Code Morphing software from Transmeta.


• Code Morphing software can be upgraded independently in flash ROM.
• The software layer helps the debugging process.
    • There are different ways to perform the same function, so the software can be changed during debugging.
• The software layer increases performance.
    • Timing of critical paths is improved.
    • Optimization is applied to remove unnecessary instructions.
    • Reordering can be done much better in software than in hardware by looking at a bigger window of instructions and applying more complicated algorithms.

Several ISAs

• Allows you to mix instruction sets with ease because they are all emulated by the software.

Hardware


Chip Simplifications

• No superscalar decode, grouping or issue logic.
• No register renaming or segmentation hardware.
• No floating point stack hardware.
• No front end memory management.
• Less interlock and bypassing logic.


Hardware Specifications

• 128-bit high-performance VLIW engine
• 2 integer units (ALUs)
• Floating point unit
• Memory unit
• Branch unit

Code Morphing Hardware Support

• Handling exceptions by shadowing.
• Commit and rollback.
• Gated store buffer.
• Aliasing hardware.
• Protection for self-modifying code.
• LongRun technology.


TM5400 & TM3120

[Figures: TM3120 and TM5400]


Performance

• Originally designed to translate 32-bit code, i.e. Unix (TM3120).
• However, 16-bit Windows instructions translated poorly, so the chip was redesigned (TM5400).
    • The redesign gives better support to Windows 95 16-bit applications.
    • Larger caches were also included for improved Windows performance.
• Two chips, two different applications.

Low Power Consumption

• Fewer transistors (simpler hardware)
• Virtual devices
• TM3120: chip voltage of 1.5 V
• TM5400: LongRun power management (dynamically adjustable frequency and voltage)
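Dynamically adjustable frequency and voltage can be sketched as picking the slowest operating point that still covers the measured load. This is a minimal Python illustration of the general idea only: the operating points, the utilization input and both function names are hypothetical and are not TM5400 data or the LongRun algorithm.

```python
# Hypothetical (MHz, volts) operating points, slowest first; not TM5400 data.
OPERATING_POINTS = [(300, 1.2), (400, 1.35), (500, 1.5), (600, 1.6)]


def pick_operating_point(utilization, current_mhz, points=OPERATING_POINTS):
    """Return the slowest point that still covers the measured demand."""
    demand_mhz = utilization * current_mhz
    for mhz, volts in points:
        if mhz >= demand_mhz:
            return mhz, volts
    return points[-1]


def relative_power(point, reference=OPERATING_POINTS[-1]):
    """Dynamic power relative to the reference point, using P ~ V^2 * f."""
    mhz, volts = point
    ref_mhz, ref_volts = reference
    return (volts / ref_volts) ** 2 * (mhz / ref_mhz)


# A processor that is half idle at 600 MHz can drop to the 300 MHz point.
chosen = pick_operating_point(utilization=0.5, current_mhz=600)
saving = 1.0 - relative_power(chosen)
```

Lowering the voltage together with the frequency is what makes the saving larger than the frequency reduction alone, since power scales with the square of the voltage.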

Implications

• Low power consumption means low heat.
• Low heat means no need for noisy, power-hungry fans or heat sinks.
• Smaller, lightweight computers are possible.
• Economy and extended battery life for mobile computers.


Processor Thermal Comparison

[Figure: Intel Pentium III and Crusoe TM5400 processor]

[Chart: "Comparison of Watts per Hour". X-axis: Applications (Office 2000, Web Browser, MP3, DVD). Y-axis: Watts per Hour, 0 to 1.2. Series: Mobile Pentium III 500 MHz, TM5400, TM3120.]

Applications: Application of the 3120

• Suitable for portable and embedded systems.
• Runs a mobile Linux kernel.
• Capable of running Internet applications:
    • Web browsers.
    • E-mail applications.
    • Streaming video.


Application of the 5400

• Ultralight laptops.
• Microsoft Windows compatible.
• Computer makers backing Transmeta include:
    • IBM, Fujitsu, FIC, NEC, and Hitachi.

Mobile Internet Appliance

• Web pads, notebooks, smart phones, PDAs.

Super Parallel Computing

• Code Morphing allows big jobs to be processed with a mixture of CPUs easily.
• Crusoe CPUs can receive a block of code and dynamically recompile it into their own ISA.


Future Plans

• Targeting the desktop market.
• Faster Crusoe chips:
    • Speeds in excess of 1.4 GHz
    • Cache sizes as high as 2 MB