Codesigned Virtual Machines Part <II> 2006. 10. 18 Yu, Young Jin DCSLAB


Page 1: Codesigned Virtual Machines Part

Codesigned Virtual Machines Part <II>

2006. 10. 18
Yu, Young Jin
DCSLAB

Page 2: Codesigned Virtual Machines Part

Contents
• Introduction
• Case Study (1)
  – Transmeta Crusoe
• Case Study (2)
  – IBM AS/400

Page 3: Codesigned Virtual Machines Part

Applying Codesigned VMs

• Advantages (performance, power efficiency, flexibility) can be achieved
  – At the macro level: entirely new ISAs
    • VLIW: Transmeta Crusoe, IBM Daisy/BOA
    • OO source ISA: IBM AS/400
  – At the micro level
    • The implementation of specific performance enhancements
    • Instruction reordering, …

Page 4: Codesigned Virtual Machines Part

Case Study (1):

Transmeta Crusoe

Page 5: Codesigned Virtual Machines Part

Introduction
• In January 2000, Transmeta Corp. introduced the Crusoe processors.
  – Remarkably low power consumption
• Contrary to what one might expect, the new technology is fundamentally software-based.
  – The power savings come from replacing large numbers of transistors with software.

Page 6: Codesigned Virtual Machines Part

The Crusoe Processor
• Consists of a hardware engine logically surrounded by a software layer.
  – H/W: the engine
    • Is a VLIW CPU capable of executing up to four operations in each clock cycle.
    • Bears no resemblance to the x86 instruction set.
  – S/W: Code Morphing Software (CMS)
    • Dynamically “morphs” x86 instructions into VLIW instructions

Page 7: Codesigned Virtual Machines Part

The Crusoe Processor

Page 8: Codesigned Virtual Machines Part

The Crusoe Processor
• CMS technology changes the entire approach to designing microprocessors.
  – Demonstrates that practical microprocessors can be implemented as HW-SW hybrids.
  – Expands the design space.
  – Development teams may enlist software experts, working in parallel with hardware engineers, to bring products to market faster.

Page 9: Codesigned Virtual Machines Part

Technology Perspective
• Decouples the x86 ISA from the underlying processor hardware.
  – Each new CPU design only requires a new version of the Code Morphing software to translate x86 instructions to the new CPU’s native instruction set.
• Because the CMS would typically reside in standard Flash ROM on the motherboard, improved versions can even be downloaded into processors in the field.

Page 10: Codesigned Virtual Machines Part

x86 vs. Crusoe

Page 11: Codesigned Virtual Machines Part

Crusoe Processor Fundamentals

• VLIW engine
  – Two integer units, a floating-point unit, a memory (load/store) unit, and a branch unit
  – Molecule: a long (64- or 128-bit) instruction word containing up to four RISC-like instructions, called atoms.
  – All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units.
    • This greatly simplifies the decode and dispatch hardware.

Page 12: Codesigned Virtual Machines Part

Crusoe Processor Fundamentals

• The integer register file
  – Has 64 registers, %r0 through %r63
  – CMS allocates some registers to hold x86 state, while others contain state internal to the system or can be used as temporary registers.

Page 13: Codesigned Virtual Machines Part

Crusoe Processor Fundamentals

• To keep the processor running at full speed, molecules are packed as fully as possible with atoms.
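The packing idea can be sketched in a few lines. This is only an illustration, not Transmeta's scheduler: the `Atom` class, the `pack` function, and the simple in-order policy are assumptions made for the example. The per-molecule limits (two integer units, one floating-point unit, one memory unit, one branch unit, at most four atoms) are the ones listed on the earlier slide.

```python
from dataclasses import dataclass
from typing import Optional

# Functional-unit slots per molecule, as listed on the "VLIW engine" slide.
UNIT_SLOTS = {"int": 2, "fp": 1, "mem": 1, "branch": 1}
MAX_ATOMS = 4  # a molecule holds up to four atoms


@dataclass(frozen=True)
class Atom:
    """Hypothetical atom: a name, the unit it needs, and its registers."""
    name: str
    unit: str
    dest: Optional[str] = None
    srcs: tuple = ()


def pack(atoms):
    """Greedy in-order packing (an assumed policy, chosen for brevity).

    A new molecule is started when the current one is full, the needed
    functional unit is taken, or the atom touches a register written by
    an atom already in the current molecule.
    """
    molecules = []
    current, used, written = [], {}, set()
    for atom in atoms:
        conflict = any(s in written for s in atom.srcs) or atom.dest in written
        unit_full = used.get(atom.unit, 0) >= UNIT_SLOTS[atom.unit]
        if current and (conflict or unit_full or len(current) >= MAX_ATOMS):
            molecules.append(current)
            current, used, written = [], {}, set()
        current.append(atom)
        used[atom.unit] = used.get(atom.unit, 0) + 1
        if atom.dest is not None:
            written.add(atom.dest)
    if current:
        molecules.append(current)
    return molecules
```

A real translator would also reorder atoms to fill empty slots; this sketch keeps program order so the packing rules stay visible.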

Page 14: Codesigned Virtual Machines Part

Conventional superscalar…

• This type of processor hardware is much more complex than the Crusoe processor’s simple VLIW engine.

Page 15: Codesigned Virtual Machines Part

Code Morphing Software
• CMS
  – Is fundamentally a dynamic translation system
  – In this case, x86 ISA -> VLIW ISA
  – The “x86 ISA” is the only thing x86 code sees.
    • The only program written directly for the VLIW engine is the Code Morphing Software itself.

Page 16: Codesigned Virtual Machines Part

Hierarchy

Page 17: Codesigned Virtual Machines Part

Hierarchy

Page 18: Codesigned Virtual Machines Part

Crusoe’s VLIW instr. Scheduling

Page 19: Codesigned Virtual Machines Part

Code Morphing Software

Page 20: Codesigned Virtual Machines Part

CMS Memory Layout

Page 21: Codesigned Virtual Machines Part

CMS: Drawing the HW-SW line
• Choosing which functions to implement in HW and which in SW is a major engineering challenge
  – Involving issues such as cost and complexity, overall performance, and power consumption
  – For example, the HW-SW line might be drawn differently for a high-end server processor.

Page 22: Codesigned Virtual Machines Part

CMS: Decoding and Scheduling

• Code Morphing can translate an entire group of x86 instructions at once,
  – Whereas a superscalar x86 translates single instructions in isolation.
• The Code Morphing approach can amortize the cost of translation over many executions,
  – Allowing it to use much more sophisticated translation and scheduling algorithms.
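The amortization argument can be made concrete with a small calculation. The function name and all cost figures below are illustrative assumptions, not measurements of CMS: the point is only that a one-time translation cost is spread over every later execution of the translated code.

```python
def amortized_cost(translate_cost, run_cost, executions):
    """Average cost per execution when a one-time translation cost
    is spread over `executions` runs of the translated code.

    All costs are in the same arbitrary unit (for example, cycles).
    """
    if executions <= 0:
        raise ValueError("executions must be positive")
    return translate_cost / executions + run_cost


# Assumed numbers: a cheap translator producing slower code versus an
# expensive optimizer producing faster code.
cheap = dict(translate_cost=100, run_cost=10)
costly = dict(translate_cost=1000, run_cost=6)

for n in (10, 1000):
    print(n, amortized_cost(executions=n, **cheap),
          amortized_cost(executions=n, **costly))
```

With these assumed numbers the cheap translator wins for code that runs only ten times, while the expensive optimizer wins once the code runs a thousand times.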

Page 23: Codesigned Virtual Machines Part

CMS: Caching
• The translation cache resides in a separate memory space that is inaccessible to x86 code.
• As an application executes,
  – Code Morphing “learns” more about the program and improves it, so it executes faster and faster.
• Some benchmarks do not accurately predict the performance of Crusoe processors!

Page 24: Codesigned Virtual Machines Part

CMS: Filtering
• The translation system needs to
  – Choose carefully how much effort to spend on translating and optimizing a given piece of x86 code.
• A wide choice of execution modes
  – Interpretation only (no translation)
  – Simple-minded code generation
  – Highly optimized code generation
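One way to picture filtering is a per-block execution counter that promotes code through the three modes. This is a minimal sketch: the `Filter` class, the use of a plain execution count, and both threshold values are assumptions for illustration; the slides do not say how CMS actually decides.

```python
INTERPRET, SIMPLE, OPTIMIZED = "interpret", "simple", "optimized"

# Assumed thresholds; real values would be tuned and are not given in the slides.
SIMPLE_THRESHOLD = 3
OPTIMIZE_THRESHOLD = 10


class Filter:
    """Chooses an execution mode per x86 block from how often it has run."""

    def __init__(self):
        self.counts = {}  # block address -> number of executions so far
        self.cache = {}   # block address -> mode of the cached translation

    def execute(self, block_addr):
        """Record one execution and return the mode used for it."""
        n = self.counts.get(block_addr, 0) + 1
        self.counts[block_addr] = n
        if n >= OPTIMIZE_THRESHOLD:
            self.cache[block_addr] = OPTIMIZED
        elif n >= SIMPLE_THRESHOLD:
            self.cache[block_addr] = SIMPLE
        # Blocks with no cached translation are interpreted.
        return self.cache.get(block_addr, INTERPRET)
```

Rarely executed code never pays any translation cost, while hot code earns the most expensive treatment.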

Page 25: Codesigned Virtual Machines Part

CMS: Prediction and Path Selection

• CMS can gather feedback
  – Instrumentation profiling
    • The translator adds code to collect info.
  – This data can be used later to decide when and what to optimize and translate.
    • For example, if a given branch is highly biased,…

Page 26: Codesigned Virtual Machines Part

CMS: Making a Translation

Front end

Well-known optimizations

Scheduling

The molecules explicitly encode the instruction-level parallelism, hence they can be executed by a simple VLIW engine.

Page 27: Codesigned Virtual Machines Part

HW Support for Code Morphing
• Exceptions
  • “Precise exception” problem: trap “too soon”
* Solution: use shadow registers!

Page 28: Codesigned Virtual Machines Part

HW Support for Code Morphing
• All registers holding x86 state are shadowed (working/shadow copy).
  – Normal atoms only update the working copy of the register.
  – “commit” operation: working -> shadow regs.
  – “rollback” operation: shadow -> working regs.
• Undoing changes to memory
  – Holding store data in a “gated store buffer”
  – Commit / rollback
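The commit/rollback mechanism can be modelled in software as follows. This is a sketch under stated assumptions: the `ShadowedState` class and its method names are invented for the example, and forwarding pending stores to later loads is an assumption that the slides do not spell out.

```python
class ShadowedState:
    """Software model of shadowed registers plus a gated store buffer."""

    def __init__(self, registers, memory):
        self.working = dict(registers)  # updated by normal atoms
        self.shadow = dict(registers)   # last committed x86 state
        self.memory = dict(memory)      # committed memory contents
        self.store_buffer = []          # pending (address, value) stores

    def write_reg(self, name, value):
        # Normal atoms only touch the working copy.
        self.working[name] = value

    def store(self, addr, value):
        # Stores are held back until commit.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Assumption: the youngest pending store to the address is visible.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def commit(self):
        # working -> shadow, and release the gated stores to memory.
        self.shadow = dict(self.working)
        for a, v in self.store_buffer:
            self.memory[a] = v
        self.store_buffer.clear()

    def rollback(self):
        # shadow -> working, and drop the gated stores.
        self.working = dict(self.shadow)
        self.store_buffer.clear()
```

After a rollback the state is exactly what it was at the last commit, so the x86 code can be re-executed conservatively to find the precise faulting instruction.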

Page 29: Codesigned Virtual Machines Part

HW Support for Code Morphing
• Alias Hardware
  – When the translator moves a load operation ahead of a store operation,
  – it converts the load into a load-and-protect and the store into a store-under-alias-mask.
  – Reordering loads and stores is then always safe: if a store overwrites a protected location, the hardware raises an exception.

Page 30: Codesigned Virtual Machines Part

HW Support for Code Morphing
• Alias Hardware

<Original Code>
    St  0(r1), r2
    …
    Ld  r3, 0(r4)
    …
    St  0(r5), r6
    …
    Ld  r7, 0(r8)
    Add r9, r3, r7

<Rescheduled Code> - Unsafe
    Ld  r3, 0(r4)
    Ld  r7, 0(r8)
    St  0(r1), r2
    …
    …
    St  0(r5), r6
    …
    Add r9, r3, r7

<Rescheduled Code> - Protected
    Ldp  r3, 0(r4)    x
    Ldp  r7, 0(r8)    x x
    Stam 0(r1), r2
    …
    …
    Stam 0(r5), r6
    …
    Add  r9, r3, r7

* The ldp/stam pair is an excellent example that illustrates the interplay between the codesigned hardware and software in a codesigned VM.
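The ldp/stam check can be imitated in software to show what the hardware is guarding against. This is only a model under assumptions: the `AliasUnit` class, the numbered protection slots, and the `AliasFault` exception are names invented for the example, not the real hardware interface.

```python
class AliasFault(Exception):
    """Raised when a store hits an address protected by an earlier ldp."""


class AliasUnit:
    def __init__(self):
        self.protected = {}  # slot number -> protected address

    def ldp(self, memory, addr, slot):
        """Load-and-protect: load the value and remember the address."""
        self.protected[slot] = addr
        return memory.get(addr, 0)

    def stam(self, memory, addr, value, mask):
        """Store-under-alias-mask: check the slots named in `mask` first."""
        for slot in mask:
            if self.protected.get(slot) == addr:
                raise AliasFault(f"store to {addr:#x} aliases slot {slot}")
        memory[addr] = value


# Replaying the protected schedule from the slide with assumed addresses:
memory = {0x100: 1, 0x200: 2}
unit = AliasUnit()
r3 = unit.ldp(memory, 0x100, slot=0)          # Ldp r3, 0(r4)
r7 = unit.ldp(memory, 0x200, slot=1)          # Ldp r7, 0(r8)
unit.stam(memory, 0x300, 42, mask=(0, 1))     # Stam 0(r1), r2
unit.stam(memory, 0x400, 43, mask=(1,))       # Stam 0(r5), r6
r9 = r3 + r7                                  # Add r9, r3, r7
```

The masks mirror the original order: the first store preceded both loads, so it checks both slots, while the second store preceded only the second load. On an `AliasFault`, the translator would roll back and fall back to a schedule that keeps the original order.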

Page 31: Codesigned Virtual Machines Part

HW Support for Code Morphing
• Coping with Self-Modifying Code
  – x86 instructions in memory get overwritten, either
    • Because the OS is loading a new program, or
    • Because an application is using self-modifying code.
  – When this happens to code that has already been translated,
    • The CMS needs to be notified to keep it from erroneously executing a translation for the old code.

Page 32: Codesigned Virtual Machines Part

HW Support for Code Morphing
• Coping with Self-Modifying Code
  – Whenever the system translates a block of x86 code, it write-protects the page.
    • It does so by setting a dedicated “translated” bit in that page’s entry in the processor’s memory management unit.
    • That bit is invisible to x86 software.
  – When a protected page is written to, the simplest remedy is to invalidate the affected translations.
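The “translated” bit and the simplest remedy can be sketched as below. The `TranslationManager` class, the 4 KB page size, and the dictionary-based translation cache are assumptions for illustration; only the protect-then-invalidate behaviour comes from the slide.

```python
PAGE_SIZE = 4096  # assumed page size


class TranslationManager:
    def __init__(self):
        self.translated_pages = set()  # pages whose "translated" bit is set
        self.translations = {}         # x86 address -> cached translation

    def translate(self, addr, translation):
        """Cache a translation and write-protect the page it came from."""
        self.translations[addr] = translation
        self.translated_pages.add(addr // PAGE_SIZE)

    def write(self, memory, addr, value):
        """Model an x86 store, invalidating translations if needed."""
        page = addr // PAGE_SIZE
        if page in self.translated_pages:
            # Simplest remedy: drop every translation made from this page.
            stale = [a for a in self.translations if a // PAGE_SIZE == page]
            for a in stale:
                del self.translations[a]
            self.translated_pages.discard(page)
        memory[addr] = value
```

Translations from other pages are untouched, and the written page becomes ordinary memory again until its code is next translated.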

Page 33: Codesigned Virtual Machines Part

Example: A complex translation

Page 34: Codesigned Virtual Machines Part

Case Study (2):

IBM AS/400

Page 35: Codesigned Virtual Machines Part

From IBM’s homepage…
• The accelerating rate of change of both hardware and software technologies necessitates that the system you select has been designed with the future in mind.
  – “We believe that the IBM AS/400 will be the number one choice!”

Page 36: Codesigned Virtual Machines Part

Introduction
• The design of the AS/400 insulates application programs from changing HW characteristics through a layer of microcode.
  – The interface: TIMI (Technology Independent Machine Interface)
  – The microcode layer: LIC (Licensed Internal Code)
• In 1995, the AS/400 changed its processor technology (CISC -> 64-bit RISC)
  – No recompiling/rewriting
  – Not only did existing programs run, but they were fully 64-bit programs.

Page 37: Codesigned Virtual Machines Part

AS/400 architecture

The TIMI layer separates the HW and LIC from the OS

Instructions are translated to a specific hw instruction set as part of the backend of the compilation process.

Page 38: Codesigned Virtual Machines Part

AS/400 architecture
• TIMI is a virtual instruction set.
  – All user-mode programs are stored as TIMI instructions.
  – Conceptually somewhat similar to the VM architecture of programming environments such as Smalltalk, Java, and .NET
  – Stored within the final program object
  – Object-based ISA

Page 39: Codesigned Virtual Machines Part

Memory Architecture
• The TIMI has a memory architecture composed of objects.
  – The objects are completely isolated from one another and can only be accessed via pointers.
  – Actual address values contained in pointers are not made visible to SW above the TIMI.
  – The implementation of the object-based memory is done entirely below the TIMI.

Page 40: Codesigned Virtual Machines Part

Memory Architecture
• Protecting the integrity of pointers is an essential part of any object-based system.
  – The object pointers are encoded in 128 bits.
    • Upper 64 bits: type info, authorization, …
    • Lower 64 bits: 64-bit PowerPC virtual address
  – Significant extension to the PowerPC memory architecture
    • Added protection for object pointers
      – Load/store-pointer instructions
      – A 65th bit indicating whether the location contains a pointer
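The 128-bit pointer layout and the extra tag bit can be modelled as follows. This is a sketch, not the actual PowerPC extension: the `TaggedMemory` class, its method names, the 8-byte word addressing, and the rule that an ordinary store clears the tag are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1


def pack_pointer(info, address):
    """Upper 64 bits: type/authorization info; lower 64 bits: address."""
    return ((info & MASK64) << 64) | (address & MASK64)


def unpack_pointer(pointer):
    return pointer >> 64, pointer & MASK64


class TaggedMemory:
    """Memory of 64-bit words, each with one extra 'is pointer' tag bit."""

    def __init__(self):
        self.words = {}  # address -> (64-bit value, tag bit)

    def store(self, addr, value):
        # Assumption: an ordinary store always clears the tag.
        self.words[addr] = (value & MASK64, False)

    def store_pointer(self, addr, pointer):
        # Only the store-pointer operation sets the tag on both halves.
        info, address = unpack_pointer(pointer)
        self.words[addr] = (info, True)
        self.words[addr + 8] = (address, True)

    def load_pointer(self, addr):
        hi, hi_tag = self.words.get(addr, (0, False))
        lo, lo_tag = self.words.get(addr + 8, (0, False))
        if not (hi_tag and lo_tag):
            raise ValueError("location does not hold a valid pointer")
        return pack_pointer(hi, lo)
```

Because ordinary stores clear the tag, software above the interface cannot forge a pointer by writing raw bits into memory.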

Page 41: Codesigned Virtual Machines Part

Instruction Set
• TIMI instruction format

  Field:    opcode         | opcode extend  | operand1 … operandN     | dest1 … dest4
  Size:     2 bytes        | 2 bytes        | 3 bytes each            | 3 bytes each
                           | (optional)     | (optional)              | (optional)
  Example:  Addn & branch  | Eq 0 Gt 0 0 0  | sum, addend1, addend2   | dest1, dest2

• Multiway conditional branch
  – This is the “architected representation”
  – It is translated to an impl-dependent form, and it does the work of multiple RISC instructions.

Page 42: Codesigned Virtual Machines Part

Instruction Set

[Figure: instruction stream (… addn 34 32 31, muln 36 34 37 …) whose operands are entry numbers in the ODT Direction Vector (entries 1, 31 … 37, with types such as const, Binary(2), Binary(4)), alongside the ODT Entry String.]

• Add numeric and multiply numeric are generic.
• Entries in the ODT indicate the types of operands and the data flow.
• The actual storage locations are assigned only after the TIMI is translated.
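How a generic instruction becomes type-specific can be sketched with a toy table. Everything here is hypothetical: the `ODT` contents, the `specialize` function, and the output naming are invented for illustration, and the operand types in the slide's figure are not fully recoverable.

```python
# Hypothetical ODT: entry number -> (type name, size in bytes).
ODT = {
    31: ("binary", 2),
    32: ("binary", 2),
    34: ("binary", 4),
}


def specialize(opcode, operands, odt):
    """Pick a type-specific form of a generic opcode from ODT entries.

    `operands` are ODT entry numbers, as in "addn 34 32 31".
    """
    kinds = {odt[entry][0] for entry in operands}
    if len(kinds) != 1:
        raise TypeError(f"mixed operand types: {sorted(kinds)}")
    # Assumed rule: operate at the width of the widest operand.
    width_bits = 8 * max(odt[entry][1] for entry in operands)
    return f"{opcode}.{kinds.pop()}{width_bits}"


print(specialize("addn", (34, 32, 31), ODT))
```

The instruction itself names only table entries; the types, and later the storage locations, are resolved by the translator below the TIMI.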

Page 43: Codesigned Virtual Machines Part

Input/Output
• The presence of IOPs simplifies the task of pushing the device-dependent aspects out of the central processor.

Page 44: Codesigned Virtual Machines Part

Input/Output
• At the level of TIMI,
  – There is no secondary (disk) storage; rather, it is part of the unified memory architecture.
    • All disk management SW, drivers, etc. exist in the impl-dependent part of the system.
• The OS interacts with SW below the TIMI level (and with I/O devices)
  – through instructions that operate on the TIMI-level objects.

Page 45: Codesigned Virtual Machines Part

Input/Output
• TIMI-Supported Objects
  – Access group, Context, …
  – Authorization List, User Profile, …
  – Dictionary, Index, …
  – Queue, Mode descriptor, …
  – Logical unit descriptor, …
  – Module, Program, …

Page 46: Codesigned Virtual Machines Part

Code Translation & Concealment

• HLL -> Template (TIMI + ODT) -> Program Object
• The contents of the program object cannot be directly observed above the TIMI level.
• Materialization
  – Giving the program back to the user in the original, machine-independent form
  – The platform switch is transparent to the user.

Page 47: Codesigned Virtual Machines Part

Code Translation & Concealment

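[Figure: above the TIMI level, the compiler turns an HLL program into a template (TIMI, ODT) held in a space object. “Create program” takes this space object as its source; below the TIMI level, the translator produces the result, a program object that holds the template (TIMI, ODT) together with the impl-dependent executable code.]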