1 Kheiron: Runtime Adaptation of Native-C and Bytecode Applications Rean Griffith, Gail Kaiser Programming Systems Lab (PSL) Columbia University June 14

1

Kheiron: Runtime Adaptation of Native-C and Bytecode Applications

Rean Griffith, Gail Kaiser
Programming Systems Lab (PSL)
Columbia University

June 14 2006
Presented by Rean Griffith
[email protected]

2

Overview

Introduction
Problem
Solution
System Operation
Feasibility Experiments
Supported Adaptations
Conclusions & Future Work

3

Introduction

Self-healing systems are supposed to reduce the cost and complexity of system management.

Extra facilities for problem detection, diagnosis and remediation help end-users and administrators.

Sounds great, where do I get one?

4

Problem

Existing/legacy systems don’t have all the self-healing mechanisms they’ll ever need.

Tomorrow’s systems won’t have all of them either.

It’s impractical, costly and time-consuming to re-design, re-build and re-deploy new self-healing versions.

What happens when we need a new self-healing facility?

5

6 Questions

Can we retro-fit self-healing mechanisms onto existing systems as a form of system adaptation?
How could we do it?
Can we do it on-the-fly?
Can we do things in a general way rather than ad-hoc one-time fixes?
Sounds risky, if we can do it, can we give any guarantees?
What kinds of self-healing mechanisms can we add?

6

3.5 Quick Answers

Can we retro-fit self-healing mechanisms onto existing systems? Yes
How could we do it? …
Can we do it on the fly? Yes
Can we do it in a general way, rather than ad-hoc one-time fixes? Yes
If we can do it, can we give guarantees? Some
What kinds of self-healing mechanisms can we add? …

7

How can we do it?

Observation: All software systems run in a software execution environment (EE). Use it as the lowest common denominator for adapting live systems.

Hypotheses:
  The execution environment is a feasible target for efficiently and transparently effecting adaptations in the applications it hosts.
  Existing facilities in unmodified execution environments can be used to effect runtime adaptations.
  Any guarantees we give are a function of the execution environment and its operation.

8

Solution Considerations

Two kinds of execution environments:
  Un-managed/native [Processor + OS e.g. x86 + Linux]
  Managed [JVM/CLR]

What do we need from the EE?
  Facility for tracing program execution.
  Facility for controlling program execution.
  Access to metadata about the units of execution.
  Facility for adding/editing metadata.

9

Comparing Execution Environments

| | Unmanaged Execution Environment: ELF Binaries | Managed Execution Environment: JVM 5.x | Managed Execution Environment: CLR 1.1 |
| Program tracing | ptrace, /proc | JVMTI callbacks + API | ICorProfilerInfo, ICorProfilerCallback |
| Program control | Trampolines + Dyninst | Bytecode rewriting | MSIL rewriting |
| Execution unit metadata | .symtab, .debug sections | Classfile constant pool + bytecode | Assembly, type & method metadata + MSIL |
| Metadata augmentation | N/A for compiled C-programs | Custom classfile parsing & editing APIs + JVMTI RedefineClasses | IMetaDataImport, IMetaDataEmit APIs |

10

System Architecture from 10,000ft

11

How Kheiron Works

Attaches to programs while they run or when they load.
Interacts with programs while they run at various points of their execution.
Augments type definitions and/or executable code
  Needs metadata – rich metadata is better
Interposes at method granularity, inserting new functionality via method prologues and epilogues.
Control can be transferred into/out of adaptation library logic
Control-flow changes can be done/un-done dynamically
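The prologue/epilogue idea can be sketched at source level in C. This is only an analogy: Kheiron patches running binaries or bytecode, whereas the sketch below routes calls through a small structure with optional hooks. All names (`struct interposer`, `attach_hooks`, `detach_hooks`, `interposed_call`) are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: adaptation logic run before or after a call. */
typedef void (*hook_fn)(const char *method_name);

/* One interposition point per method (illustrative layout only). */
struct interposer {
    const char *name;
    int (*target)(int);   /* the original, unmodified method */
    hook_fn prologue;     /* NULL while no adaptation is active */
    hook_fn epilogue;     /* NULL while no adaptation is active */
};

/* Call the target, running whichever hooks are currently attached. */
int interposed_call(struct interposer *ip, int arg)
{
    assert(ip != NULL && ip->target != NULL);
    if (ip->prologue != NULL)
        ip->prologue(ip->name);
    int result = ip->target(arg);
    if (ip->epilogue != NULL)
        ip->epilogue(ip->name);
    return result;
}

/* "Do" the control-flow change: hooks run on subsequent calls. */
void attach_hooks(struct interposer *ip, hook_fn pro, hook_fn epi)
{
    ip->prologue = pro;
    ip->epilogue = epi;
}

/* "Un-do" the control-flow change: calls go straight to the target. */
void detach_hooks(struct interposer *ip)
{
    ip->prologue = NULL;
    ip->epilogue = NULL;
}

/* Example target and hook used to exercise the sketch. */
int hook_calls = 0;
void count_hook(const char *method_name) { (void)method_name; hook_calls++; }
int square(int x) { return x * x; }

/* Calls square three times; hooks are active only for the middle call. */
int demo_interposition(void)
{
    struct interposer ip = { "square", square, NULL, NULL };
    int a = interposed_call(&ip, 3);
    attach_hooks(&ip, count_hook, count_hook);
    int b = interposed_call(&ip, 4);
    detach_hooks(&ip);
    int c = interposed_call(&ip, 5);
    return a + b + c;
}
```

With the hooks detached the only extra cost per call is two pointer checks, which mirrors the deck's point that changes can be undone to manage the performance impact.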

12

System Operation

| Time period/execution event | Unmanaged/Native Applications (C-Programs) | Managed Applications: JVM 5.x | Managed Applications: CLR 1.1 |
| Application start | Attach Kheiron, augment methods | Load Kheiron/JVM | Load Kheiron/CLR |
| Module load | No real metadata to manipulate | Augment type definition, augment module metadata, bytecode rewrite | Augment type definition, augment module metadata |
| Method invoke/entry | Transfer control to adaptation logic | Transfer control to adaptation logic | |
| Method JIT | n/a | No explicit notifications | Augment module metadata, MSIL rewrite, force re-jit |
| Method exit | Transfer control to adaptation logic | | |

13

Kheiron/C Operation

[Figure: Kheiron/C architecture diagram. Labels: Mutator, Kheiron/C, Dyninst API, Dyninst Code, ptrace/procfs, Application, void foo(int x, int y) { int z = 0; }, Points, Snippets, C/C++ Runtime Library.]

14

Kheiron/JVM Operation

[Figure: shadow-method creation in three stages. A: SampleMethod with its bytecode method body. B (after Prepare Shadow): SampleMethod with its bytecode method body, plus _SampleMethod. C (after Create Shadow): SampleMethod with a new bytecode method body that calls _SampleMethod; _SampleMethod with the bytecode method body.]

SampleMethod( args ) [throws NullPointerException]
  <room for prolog>
  push args
  call _SampleMethod( args ) [throws NullPointerException]
    { try{…} catch (IOException ioe){…} } // Source view of _SampleMethod’s body
  <room for epilog>
  return value/void
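The shadow-method layout above can be mimicked in C as a rough, source-level illustration. Kheiron/JVM does this by rewriting classfile bytecode, not by editing source, so the sketch only shows the resulting call structure. The name `SampleMethod_shadow` stands in for `_SampleMethod` (a leading underscore followed by a capital letter is reserved in C), and the counters are hypothetical placeholders for prolog and epilog logic.

```c
#include <assert.h>

/* Placeholders for whatever adaptation logic fills the prolog/epilog. */
int prolog_runs = 0;
int epilog_runs = 0;

/* The original method body, moved under the shadow name. */
int SampleMethod_shadow(int arg)
{
    return arg * 2;
}

/* New body under the original name: existing callers are unchanged. */
int SampleMethod(int arg)
{
    prolog_runs++;                         /* <room for prolog> */
    int value = SampleMethod_shadow(arg);  /* push args, call shadow */
    epilog_runs++;                         /* <room for epilog> */
    return value;                          /* return value/void */
}
```

Because the wrapper keeps the original name and signature, call sites do not need to be touched; only the method body they reach has changed.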

15

Experiments

Goal: Measure the feasibility of our approach.
Look at the impact on execution when no repairs/adaptations are active.
Selected compute-intensive applications as test subjects (SciMark and Linpack).
Unmanaged experiments
  P4 2.4 GHz processor, 1GB RAM, SUSE 9.2, 2.6.8x kernel, Dyninst 4.2.1.
Managed experiments
  P3 Mobile 1.2 GHz processor, 1GB RAM, Windows XP SP2, Java HotspotVM v1.5 update 04.

16

Kheiron/C – Results

[Figure: Performance comparison SciMark, normalized to w/o Dyninst, simple jump into adaptation library. x-axis: Run# (1 to 5); y-axis: Performance normalized to w/o Dyninst (0.8 to 1.1); series: Normalized w/o Dyninst, Normalized w/Dyninst.]

| | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg | std |
| Instrumentation time (ms) | 689.33 | 691.01 | 675.87 | 678.78 | 689.79 | 684.96 | 7.0686 |
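The Avg and std columns can be recomputed from the five run times. The sketch below assumes std is the sample standard deviation (dividing by n - 1), which is an assumption, not something the slide states. The function names are hypothetical, and the square root is computed by Newton's iteration only so that the example does not need the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Instrumentation times (ms) for runs 1 to 5, copied from the table. */
const double instrumentation_ms[5] = {689.33, 691.01, 675.87, 678.78, 689.79};

/* Arithmetic mean of n samples. */
double sample_mean(const double *xs, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double)n;
}

/* Sample variance: sum of squared deviations divided by n - 1. */
double sample_variance(const double *xs, size_t n)
{
    assert(n > 1);
    double m = sample_mean(xs, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = xs[i] - m;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}

/* Square root by Newton's iteration (avoids linking against libm). */
double newton_sqrt(double v)
{
    if (v <= 0.0)
        return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Sample standard deviation of the five instrumentation times. */
double instrumentation_std(void)
{
    return newton_sqrt(sample_variance(instrumentation_ms, 5));
}
```

Under that assumption the mean comes out at about 684.96 ms and the standard deviation at about 7.07 ms, in line with the table.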

17

Kheiron/JVM – Results

[Figure: Performance comparison, normalized to w/o profiler, no repair active. x-axis: Benchmarks (SciMark, Linpack); y-axis: Performance normalized to w/o profiler (0 to 1.1); series: without profiler, with profiler; data labels: 98.60%, 98.63%.]

Instrumentation time

Sub-millisecond since all instrumentation done at load-time as in-memory operations on the classfile byte array.
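As a quick check on the chart, and assuming the two data labels are the with-profiler performance normalized to the no-profiler run (the chart title suggests this, but the mapping is an assumption), the implied runtime overhead is:

```latex
\[
\text{overhead} = 1 - \text{normalized performance}:
\qquad 1 - 0.9860 = 0.0140 \;(1.40\%),
\qquad 1 - 0.9863 = 0.0137 \;(1.37\%)
\]
```

Both values sit at the low end of the overhead range reported for the experiments.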

18

What did we learn from our experiments?

Our approach is feasible, with roughly 1% to 5% runtime overhead when no repairs are active.
Kheiron is transparent to both the application and the unmodified execution environment.
More/rich metadata makes things “easier”
  Easier to navigate and make changes in managed execution environments than in their un-managed counterparts.
We can perform and undo our changes on-the-fly, allowing us to manage the performance impact.
We use a general approach where we can hook/interpose at method-granularity in a variety of execution environments.

19

Unmanaged Execution Environment Metadata

Not enough information to support type discovery and/or type relationships.
No APIs for metadata manipulation.
In the managed world, units of execution are self-describing.

20

Adaptation Guarantees

Managed execution environments give guarantees about:
  Valid executables – bytecode verification
  Security attributes – security sandboxes and permissions/policies.
These guarantees are encoded in metadata in the units of execution.
Any inserted adaptations are bound by the same rules as the original application.
Un-managed execution environments don’t give the same guarantees.

21

Supported Adaptations

Instrumentation insertion/removal.
Component/structure instance-caching.
Periodic/on-demand consistency checks on cached components or sub-system interfaces.
Hot component swaps.
Function-input filters.
Residual testing.
Ghost Transactions – (POST for software).
Selective Emulation (compiled C-binaries).
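To make one item in the list concrete, here is one plausible reading of a function-input filter, sketched in C: a check placed in front of an existing function that rejects arguments the function cannot handle safely. The slides do not define the mechanism, so everything here (`set_name`, `filtered_set_name`, `FILTER_REJECTED`, the 16-byte buffer) is a hypothetical illustration, not Kheiron's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILTER_REJECTED (-1)

/* Hypothetical legacy state: a fixed-size buffer. */
char name_buf[16];

/* Hypothetical legacy function: copies without any bounds check.
 * Returns the number of characters stored. */
int set_name(const char *name)
{
    strcpy(name_buf, name);
    return (int)strlen(name_buf);
}

/* Input filter interposed in front of set_name: bad inputs are
 * rejected before they can reach the unchecked copy. */
int filtered_set_name(const char *name)
{
    if (name == NULL)
        return FILTER_REJECTED;
    if (strlen(name) >= sizeof name_buf)
        return FILTER_REJECTED;
    return set_name(name);
}
```

In an interposition setting the filter would run in the method prologue, so the legacy function itself stays unmodified.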

22

Selective Emulation Using STEM + Dyninst

STEM – an instruction level x86 emulator developed by another group at Columbia (Locasto et al.).

Dyninst – a toolkit for instrumenting running C-applications.

23

How it works

Running an application in an emulator/sandbox isn’t a new idea
  Security benefits
  Isolation benefits
High overheads associated with whole-program execution – Valgrind, Bochs, original STEM.
Idea: Vary, at runtime, the portions of the application which run inside the STEM emulator to manage the performance impact.

24

Background on STEM

Original STEM works at the source level:

    void foo()
    {
        int i = 0;
        // save cpu registers macro
        emulate_init();
        // begin emulation function call
        emulate_begin();
        i = i + 10;
        // end emulation function call
        emulate_end();
        // commit/restore cpu registers macro
        emulate_term();
    }

25

Using un-modified Dyninst 4.2.1

    void foo()
    {
        int i = 0;
        // save cpu registers macro
        emulate_init();   // Oops…can’t inject macros with Dyninst
        // begin emulation function call
        emulate_begin();  // OK to inject function calls with Dyninst
        i = i + 10;
        // end emulation function call
        emulate_end();    // OK to inject function calls with Dyninst
        // commit/restore cpu registers macro
        emulate_term();   // Oops…can’t inject macros with Dyninst
    }

26

Modified STEM + Dyninst

Modify Dyninst trampoline to save CPU state to a memory address (rather than the stack) before method call.
Use Dyninst API to allocate memory areas in target process address space for register storage area and code storage area.
Save the instructions relocated by the trampoline in the code storage area, to prime STEM’s instruction pipeline.
Use Dyninst API to insert calls to our RegisterSave and EmulatorPrime functions which configure STEM.
Use Dyninst API to insert calls to STEM’s emulate_begin().
Modify STEM to keep track of its stack depth (initially set to 0); emulation ends when a ret/leave instruction is encountered at stack depth 0. The search for emulate_end goes away.
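The stack-depth rule in the last step can be illustrated with a toy loop in C. This is not STEM's code: real emulation decodes x86 instructions and follows control flow, while the sketch assumes a pre-decoded trace of instruction kinds in execution order. It models only call and ret (leave is omitted), and `enum op`, `emulate_until_return` and `demo_trace` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Only the instruction kinds that matter for depth tracking. */
enum op { OP_OTHER, OP_CALL, OP_RET };

/*
 * Walk a trace of instructions in execution order, starting inside the
 * emulated function at stack depth 0. Emulation stops at the first ret
 * seen at depth 0, i.e. when the emulated function itself returns.
 * Returns the number of instructions emulated, including that ret.
 */
size_t emulate_until_return(const enum op *trace, size_t len)
{
    assert(trace != NULL);
    int depth = 0;
    size_t executed = 0;
    for (size_t pc = 0; pc < len; pc++) {
        executed++;
        switch (trace[pc]) {
        case OP_CALL:
            depth++;              /* entering a callee */
            break;
        case OP_RET:
            if (depth == 0)
                return executed;  /* the emulated function returns */
            depth--;              /* a callee returns */
            break;
        case OP_OTHER:
            break;
        }
    }
    return executed;              /* trace ended without a depth-0 ret */
}

/* One nested call, then the function's own return, then code that
 * belongs to the caller and must not be emulated. */
const enum op demo_trace[7] = {
    OP_OTHER, OP_CALL, OP_OTHER, OP_RET, OP_OTHER, OP_RET, OP_OTHER
};
```

Because the end of emulation is derived from the depth counter, no explicit end marker has to be injected into the function, which matches the remark that the search for emulate_end goes away.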

27

Conclusions – 6 Answers

Kheiron can be used to efficiently and transparently retro-fit self-healing mechanisms onto existing systems as a form of adaptation.
Kheiron uses facilities and characteristics of the unmodified execution environment to adapt running programs.
Changes can be done/un-done at runtime to manage the performance impact as well as give flexibility in evolving the system.
Based on metadata, and its verification/validation rules, we can extend existing systems in a general way.
Guarantees on application properties are a function of the execution environment.
Kheiron supports a wide range of adaptations.

28

Future Work

Kheiron can be used for disturbance/fault injection.
Working on a methodology for benchmarking self-healing systems with respect to the efficacy of their self-healing mechanisms (extensions to work done by Aaron Brown et al.).
Actively looking for systems to field-test/refine/reject ideas about our proposed benchmarking methodology for my thesis.

29

Questions, Comments, Queries?

Thank you for your time and attention.

Contact:
Rean Griffith
[email protected]
[[email protected]]
