1
Kheiron: Runtime Adaptation of Native-C and Bytecode Applications
Rean Griffith, Gail Kaiser
Programming Systems Lab (PSL)
Columbia University
June 14 2006
Presented by Rean Griffith
[email protected]@cs.columbia.edu
2
Overview
Introduction
Problem
Solution
System Operation
Feasibility Experiments
Supported Adaptations
Conclusions & Future Work
3
Introduction
Self-healing systems are supposed to reduce the cost and complexity of system management.
Extra facilities for problem detection, diagnosis and remediation help end-users and administrators.
Sounds great, where do I get one?
4
Problem
Existing/legacy systems don’t have all the self-healing mechanisms they’ll ever need.
Tomorrow’s systems won’t have all of them either.
It’s impractical, costly and time-consuming to re-design, re-build and re-deploy new self-healing versions.
What happens when we need a new self-healing facility?
5
6 Questions
Can we retro-fit self-healing mechanisms onto existing systems as a form of system adaptation?
How could we do it?
Can we do it on-the-fly?
Can we do things in a general way rather than ad-hoc one-time fixes?
Sounds risky. If we can do it, can we give any guarantees?
What kinds of self-healing mechanisms can we add?
6
3.5 Quick Answers
Can we retro-fit self-healing mechanisms onto existing systems?
  Yes
How could we do it?
  …
Can we do it on the fly?
  Yes
Can we do it in a general way, rather than ad-hoc one-time fixes?
  Yes
If we can do it, can we give guarantees?
  Some
What kinds of self-healing mechanisms can we add?
  …
7
How can we do it?
Observation: All software systems run in a software execution environment (EE). Use it as the lowest common denominator for adapting live systems.
Hypotheses:
  The execution environment is a feasible target for efficiently and transparently effecting adaptations in the applications it hosts.
  Existing facilities in unmodified execution environments can be used to effect runtime adaptations.
  Any guarantees we give are a function of the execution environment and its operation.
8
Solution Considerations
Two kinds of execution environments:
  Un-managed/native [Processor + OS, e.g. x86 + Linux]
  Managed [JVM/CLR]
What do we need from the EE?
  Facility for tracing program execution.
  Facility for controlling program execution.
  Access to metadata about the units of execution.
  Facility for adding/editing metadata.
9
Comparing Execution Environments

Facility | Unmanaged EE: ELF Binaries | Managed EE: JVM 5.x | Managed EE: CLR 1.1
---------|----------------------------|---------------------|--------------------
Program tracing | ptrace, /proc | JVMTI callbacks + API | ICorProfilerInfo, ICorProfilerCallback
Program control | Trampolines + Dyninst | Bytecode rewriting | MSIL rewriting
Execution unit metadata | .symtab, .debug sections | Classfile constant pool + bytecode | Assembly, type & method metadata + MSIL
Metadata augmentation | N/A for compiled C-programs | Custom classfile parsing & editing APIs + JVMTI RedefineClasses | IMetaDataImport, IMetaDataEmit APIs
10
System Architecture from 10,000ft
11
How Kheiron Works
Attaches to programs while they run or when they load.
Interacts with programs while they run at various points of their execution.
  Augments type definitions and/or executable code.
  Needs metadata; rich metadata is better.
Interposes at method granularity, inserting new functionality via method prologues and epilogues.
Control can be transferred into/out of adaptation library logic.
Control-flow changes can be done/un-done dynamically.
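The prologue/epilogue interposition and the dynamic do/un-do described on this slide can be illustrated with a small sketch. It is not Kheiron's or Dyninst's mechanism: it assumes calls are routed through a slot holding function pointers, and every name in it (method_slot, invoke, attach, detach, the counters) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical types: not part of Kheiron, Dyninst, JVMTI or the CLR. */
typedef int  (*method_fn)(int);
typedef void (*hook_fn)(const char *method_name);

struct method_slot {
    const char *name;
    method_fn   original;  /* the unmodified method body */
    hook_fn     prologue;  /* NULL while no adaptation is active */
    hook_fn     epilogue;  /* NULL while no adaptation is active */
};

/* All calls go through the slot, so hooks can change while the program runs. */
int invoke(struct method_slot *slot, int arg)
{
    if (slot->prologue)
        slot->prologue(slot->name);      /* transfer control into adaptation logic */
    int result = slot->original(arg);    /* run the original body untouched */
    if (slot->epilogue)
        slot->epilogue(slot->name);      /* adaptation logic on method exit */
    return result;
}

/* "Do" the control-flow change. */
void attach(struct method_slot *slot, hook_fn prologue, hook_fn epilogue)
{
    slot->prologue = prologue;
    slot->epilogue = epilogue;
}

/* "Un-do" the control-flow change. */
void detach(struct method_slot *slot)
{
    slot->prologue = NULL;
    slot->epilogue = NULL;
}

/* Illustrative adaptation logic: count method entries and exits. */
int entries = 0;
int exits = 0;
void count_entry(const char *method_name) { (void)method_name; entries++; }
void count_exit(const char *method_name)  { (void)method_name; exits++; }

/* Illustrative target method. */
int square(int x) { return x * x; }
```

A caller would build a slot such as { "square", square, NULL, NULL }, call invoke on it, attach count_entry and count_exit, and later detach them. The return value is unchanged in all three states; only the counters reveal whether the hooks were active.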
12
System Operation

Time period/execution event | Unmanaged/Native Applications (C-Programs) | Managed Applications: JVM 5.x | Managed Applications: CLR 1.1
----------------------------|--------------------------------------------|-------------------------------|------------------------------
Application start | Attach Kheiron, augment methods | Load Kheiron/JVM | Load Kheiron/CLR
Module load | No real metadata to manipulate | Augment type definition, augment module metadata, bytecode rewrite | Augment type definition, augment module metadata
Method invoke/entry | Transfer control to adaptation logic | Transfer control to adaptation logic | Transfer control to adaptation logic
Method JIT | n/a | No explicit notifications | Augment module metadata, MSIL rewrite, force re-jit
Method exit | Transfer control to adaptation logic | Transfer control to adaptation logic | Transfer control to adaptation logic
13
Kheiron/C Operation
[Figure: Kheiron/C operation diagram. Labels: Mutator, Kheiron/C, Dyninst API, Dyninst Code, ptrace/procfs, Application, void foo(int x, int y) { int z = 0; }, Snippets, Points, C/C++ Runtime Library]
14
Kheiron/JVM Operation
[Figure: three stages of shadow-method creation. A: SampleMethod with its bytecode method body. B (Prepare Shadow): SampleMethod and _SampleMethod, each with a bytecode method body. C (Create Shadow): SampleMethod gets a new bytecode method body that calls _SampleMethod; _SampleMethod keeps the original bytecode method body.]

SampleMethod( args ) [throws NullPointerException]
    <room for prolog>
    push args
    call _SampleMethod( args ) [throws NullPointerException]
        { try{…} catch (IOException ioe){…} } // Source view of _SampleMethod’s body
    <room for epilog>
    return value/void
15
Experiments
Goal: Measure the feasibility of our approach.
Look at the impact on execution when no repairs/adaptations are active.
Selected compute-intensive applications as test subjects (SciMark and Linpack).
Unmanaged experiments
  P4 2.4 GHz processor, 1GB RAM, SUSE 9.2, 2.6.8x kernel, Dyninst 4.2.1.
Managed experiments
  P3 Mobile 1.2 GHz processor, 1GB RAM, Windows XP SP2, Java HotspotVM v1.5 update 04.
16
Kheiron/C: Results
[Figure: Performance comparison, SciMark, normalized to w/o Dyninst, simple jump into adaptation library. X-axis: Run# (1 to 5). Y-axis: performance normalized to w/o Dyninst (0.8 to 1.1). Series: Normalized w/o Dyninst, Normalized w/Dyninst.]

Metric | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg | std
-------|-------|-------|-------|-------|-------|-----|----
Instrumentation time (ms) | 689.33 | 691.01 | 675.87 | 678.78 | 689.79 | 684.96 | 7.0686
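As a sanity check, the Avg and std entries for instrumentation time can be re-derived from the five runs. The sketch below assumes that std means the sample (n-1) standard deviation, which the slide does not state. The helper names are illustrative, and a small Newton iteration stands in for the library square root so the block has no dependency beyond the C standard headers.

```c
#include <stddef.h>

/* Five instrumentation times in milliseconds, taken from the slide. */
const double instrumentation_ms[5] = { 689.33, 691.01, 675.87, 678.78, 689.79 };

/* Square root by Newton's method; used only to avoid linking the math library. */
double newton_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double guess = (x > 1.0) ? x : 1.0;
    for (int i = 0; i < 60; i++)
        guess = 0.5 * (guess + x / guess);
    return guess;
}

/* Arithmetic mean of n values. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Sample standard deviation (divides by n - 1); assumes n >= 2. */
double sample_stddev(const double *v, size_t n)
{
    double m = mean(v, n);
    double squares = 0.0;
    for (size_t i = 0; i < n; i++)
        squares += (v[i] - m) * (v[i] - m);
    return newton_sqrt(squares / (double)(n - 1));
}
```

Under that assumption, mean(instrumentation_ms, 5) comes out at about 684.96 ms and sample_stddev(instrumentation_ms, 5) at about 7.07 ms, in line with the reported values.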
17
Kheiron/JVM: Results
[Figure: Performance comparison, normalized to w/o profiler, no repair active. X-axis: Benchmarks (SciMark, Linpack). Y-axis: performance normalized to w/o profiler (0 to 1.1). Series: without profiler, with profiler. Bar labels shown: 98.60%, 98.63%.]

Instrumentation time
Sub-millisecond, since all instrumentation is done at load time as in-memory operations on the classfile byte array.
18
What did we learn from our experiments?
Our approach is feasible, with roughly 1% to 5% runtime overhead when no repairs are active.
Kheiron is transparent to both the application and the unmodified execution environment.
More/rich metadata makes things “easier”.
  It is easier to navigate and make changes in managed execution environments than in their un-managed counterparts.
We can perform and undo our changes on-the-fly, which lets us manage the performance impact.
We use a general approach where we can hook/interpose at method granularity in a variety of execution environments.
19
Unmanaged Execution Environment Metadata
Not enough information to support type discovery and/or type relationships.
No APIs for metadata manipulation.
In the managed world, units of execution are self-describing.
20
Adaptation Guarantees
Managed execution environments give guarantees about:
  Valid executables: bytecode verification.
  Security attributes: security sandboxes and permissions/policies.
These guarantees are encoded in metadata in the units of execution.
Any inserted adaptations are bound by the same rules as the original application.
Un-managed execution environments don’t give the same guarantees.
21
Supported Adaptations
Instrumentation insertion/removal.
Component/structure instance-caching.
Periodic/on-demand consistency checks on cached components or sub-system interfaces.
Hot component swaps.
Function-input filters.
Residual testing.
Ghost Transactions (POST for software).
Selective Emulation (compiled C-binaries).
22
Selective Emulation Using STEM + Dyninst
STEM: an instruction-level x86 emulator developed by another group at Columbia (Locasto et al.).
Dyninst: a toolkit for instrumenting running C-applications.
23
How it works
Running an application in an emulator/sandbox isn’t a new idea.
  Security benefits
  Isolation benefits
High overheads associated with whole-program execution: Valgrind, Bochs, original STEM.
Idea: Vary, at runtime, the portions of the application which run inside the STEM emulator to manage the performance impact.
24
Background on STEM
Original STEM works at the source level:

    void foo()
    {
        int i = 0;
        // save cpu registers macro
        emulate_init();
        // begin emulation function call
        emulate_begin();
        i = i + 10;
        // end emulation function call
        emulate_end();
        // commit/restore cpu registers macro
        emulate_term();
    }
25
Using un-modified Dyninst 4.2.1

    void foo()
    {
        int i = 0;
        // save cpu registers macro
        emulate_init();   // Oops…can’t inject macros with Dyninst
        // begin emulation function call
        emulate_begin();  // OK to inject function calls with Dyninst
        i = i + 10;
        // end emulation function call
        emulate_end();    // OK to inject function calls with Dyninst
        // commit/restore cpu registers macro
        emulate_term();   // Oops…can’t inject macros with Dyninst
    }
26
Modified STEM + Dyninst
Modify the Dyninst trampoline to save CPU state to a memory address (rather than the stack) before a method call.
Use the Dyninst API to allocate memory areas in the target process address space for a register storage area and a code storage area.
Save the instructions relocated by the trampoline in the code storage area, to prime STEM’s instruction pipeline.
Use the Dyninst API to insert calls to our RegisterSave and EmulatorPrime functions, which configure STEM.
Use the Dyninst API to insert calls to STEM’s emulate_begin().
Modify STEM to keep track of its stack depth (initially set to 0); emulation ends when a ret/leave instruction is encountered at stack depth 0. The search for emulate_end goes away.
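The stack-depth rule can be sketched as a loop over a decoded instruction trace. This is only an illustration of the rule as stated on the slide, not STEM's code: the opcode enum and function name are hypothetical, instructions are reduced to three toy categories, and only ret is modeled (the slide mentions ret/leave).

```c
#include <stddef.h>

/* Toy opcode categories standing in for decoded x86 instructions (hypothetical). */
enum op { OP_OTHER, OP_CALL, OP_RET };

/*
 * Walk a trace of instructions in execution order and return how many were
 * emulated, counting the ret that ends emulation. Depth starts at 0; a call
 * made inside the emulated function raises it, the matching ret lowers it,
 * and a ret seen at depth 0 means the emulated function itself is returning.
 */
size_t emulate_until_return(const enum op *trace, size_t len)
{
    int depth = 0;
    for (size_t i = 0; i < len; i++) {
        switch (trace[i]) {
        case OP_CALL:
            depth++;
            break;
        case OP_RET:
            if (depth == 0)
                return i + 1;   /* leave the emulator here */
            depth--;
            break;
        default:
            break;              /* ordinary instruction: keep emulating */
        }
    }
    return len;                 /* trace ended before the function returned */
}
```

For a trace of OTHER, CALL, OTHER, RET, OTHER, RET, OTHER the function returns 6: the first ret only unwinds the nested call, and the second one, seen at depth 0, ends emulation. No emulate_end marker is needed to find the exit point.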
27
Conclusions: 6 Answers
Kheiron can be used to efficiently and transparently retro-fit self-healing mechanisms onto existing systems as a form of adaptation.
Kheiron uses facilities and characteristics of the unmodified execution environment to adapt running programs.
Changes can be done/un-done at runtime to manage the performance impact as well as give flexibility in evolving the system.
Based on metadata, and its verification/validation rules, we can extend existing systems in a general way.
Guarantees on application properties are a function of the execution environment.
Kheiron supports a wide range of adaptations.
28
Future Work
Kheiron can be used for disturbance/fault injection.
Working on a methodology for benchmarking self-healing systems with respect to the efficacy of their self-healing mechanisms (extensions to work done by Aaron Brown et al.).
Actively looking for systems to field-test/refine/reject ideas about our proposed benchmarking methodology for my thesis.
29
Questions, Comments, Queries?
Thank you for your time and attention.
Contact:
Rean Griffith
[email protected]@cs.columbia.edu