RAS Round-up From the IBM Linux Technology Centre
IBM Linux Technology Centre
Richard J. Moore C.Eng, MIEE, MIEEE, BSc
14th June 2002 (v8)
UKUUG 2002, Bristol
5th July 12:40
Topics
1. Dynamic Probes
2. Kernel Hooks (GKHI)
3. Linux Event Logging for the Enterprise
4. Flexible Dump
5. System Trace
6. Community Adoption
7. Miscellaneous
8. What's Next
9. The Team - Contacts
1 Dynamic Probes (DProbes)
1.1 Dynamic Probes - What is it?
Low-level Universal Debugger
Operates in extreme conditions
Kernel/User, Interrupt/Task, Code/Data
For Service/Support Engineers on Production Systems
Monitors Low-level System Resources
Dynamic & Fully Automated
Trigger/Enabler for: KDB, LKCD, LTT, evlog, Core Dump, Syslog, etc.
1.2 Dynamic Probes - Where Used?
Service/Support Engineer's Facility
  Live Systems
  Non-recreatable Problems
  No source modification required
  Timing Sensitive Problems
Developer's Tool
  Alternative to temporary printk/printf
  Application, Driver, Kernel etc.
  Timing Sensitive Problems
Test Tool
  Fault Injection
1.3 Dynamic Probes - Mechanics
Global Breakpoint Probes
  In-core code modification
  Track by Inode-Offset
  Avoid COW page privatization using physical address
  Unlimited Concurrent Probes
Global Watchpoint Probes
  Uses Debug Registers
  Track by Virtual Address
Pre-probe Script Driven Probe Handler
  RPN assembler language interpreter
  HLL C-like Compiler
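
The script-driven probe handler can be pictured as a small stack machine that runs when a probe fires. The sketch below is a toy model only: the opcode names (push, push_reg, sub, log, trigger_if_zero), the register snapshot and the function name are all hypothetical and are not the DProbes RPN instruction set.

```python
# Toy RPN (stack-machine) interpreter illustrating a script-driven probe handler.
# All opcode names here are hypothetical, not the DProbes instruction set.

def run_probe_script(script, registers):
    """Run a list of (opcode, operand) pairs against a register snapshot.

    Returns (log, fire): the values the script logged, and whether the
    script asked for a follow-on action (for example a dump) to be triggered.
    """
    stack, log, fire = [], [], False
    for opcode, operand in script:
        if opcode == "push":               # push an immediate value
            stack.append(operand)
        elif opcode == "push_reg":         # push the value of a named register
            stack.append(registers[operand])
        elif opcode == "sub":              # pop b, pop a, push a - b
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif opcode == "log":              # pop a value into the log buffer
            log.append(stack.pop())
        elif opcode == "trigger_if_zero":  # pop; request the action if zero
            if stack.pop() == 0:
                fire = True
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return log, fire


# Log the (hypothetical) register "eax", then trigger when it equals 5.
script = [
    ("push_reg", "eax"), ("log", None),
    ("push_reg", "eax"), ("push", 5), ("sub", None),
    ("trigger_if_zero", None),
]
log, fire = run_probe_script(script, {"eax": 5})
```

Because the handler is interpreted rather than compiled into the probed code, a probe can be changed or removed without rebuilding the kernel or application being examined.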
2 Kernel Hooks (GKHI)
2.1 Kernel Hooks - What are they?
Code locations where added function may be inserted
Supplement or replace standard function - subclassing
Function may not be known at build or run time
Function may load later therefore simple call cannot be used
Kernel has a particular need to implement hooks
Used by DProbes
2.2 Why not Patch?
Inconvenient
  Multiple patches may require manual rework
Inflexible
  Cannot select additional functions at run-time
Code Bloat
  Additional functions always present
  Obscures the prime function
2.3 Basic Requirements
Multiple hooks to co-exist within a module
Shared use of a hook by multiple exits
Sole use of a hook by a specific exit
Minimal impact to performance when a hook is unused
Exit must be able to operate as if inserted:
  Have access to local variables
  Terminate the function
Group of exits able to insert atomically
Need a Managed Interface
2.4 The Managed Interface
For Hooked Code:
  A HOOK macro - indicate the hook location
  hook_initialise - allows use of the hook
  hook_terminate - disallows use of the hook
For Hook Exits:
  hook_register - identifies exit routine and priority
  hook_arm - activates group of exits
  hook_disarm - deactivate group of exits
  hook_deregister - removes exit from interface
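
The life cycle of a hook and its exits can be modelled in a few lines of user-space code. This is a toy model under stated assumptions: only the operation names come from the slide; the argument lists, the Hook and HookExit classes, the "lower priority number runs first" ordering and the True-means-terminate return convention are all invented for illustration, and nothing here is atomic in the way the kernel interface needs to be.

```python
# Toy model of the managed hook interface. Only the operation names come
# from the slide; every signature and convention below is an assumption.

class Hook:
    def __init__(self, name):
        self.name = name
        self.usable = False   # toggled by hook_initialise / hook_terminate
        self.exits = []       # registered exits, kept in priority order

class HookExit:
    def __init__(self, func, priority):
        self.func, self.priority, self.armed = func, priority, False

def hook_initialise(hook):
    hook.usable = True

def hook_terminate(hook):
    hook.usable = False

def hook_register(hook, func, priority=0):
    exit_ = HookExit(func, priority)
    hook.exits.append(exit_)
    hook.exits.sort(key=lambda e: e.priority)  # assumed: lowest number first
    return exit_

def hook_arm(exits):
    for e in exits:
        e.armed = True

def hook_disarm(exits):
    for e in exits:
        e.armed = False

def hook_deregister(hook, exit_):
    exit_.armed = False
    hook.exits.remove(exit_)

def HOOK(hook, local_vars):
    """Call armed exits in order; True means 'terminate the hooked function'."""
    if not hook.usable:
        return False
    for e in list(hook.exits):
        if e.armed and e.func(local_vars):
            return True
    return False

demo_hook = Hook("demo")

def hooked_function(x):
    local_vars = {"x": x}             # exits may read and modify these
    if HOOK(demo_hook, local_vars):
        return "terminated by exit"
    return local_vars["x"] * 2

def add_one(local_vars):              # supplements the standard function
    local_vars["x"] += 1
    return False

def reject_negative(local_vars):      # can terminate the hooked function
    return local_vars["x"] < 0

hook_initialise(demo_hook)
exits = [hook_register(demo_hook, reject_negative, priority=0),
         hook_register(demo_hook, add_one, priority=1)]
before = hooked_function(3)           # registered but not yet armed
hook_arm(exits)                       # activate the whole group
after = hooked_function(3)
rejected = hooked_function(-1)
```

Separating registration from arming is what lets a group of exits become active together, and an unarmed hook costs the hooked code only the check inside HOOK.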
3 Linux Event Logging for the Enterprise (evlog)
3.1 evlog - What is it?
Comprehensive Logging Capability
Complies with draft POSIX SRASS standard
POSIX APIs
Structured Event Records
Optionally Captures Syslog and Klog messages
Logs Binary and Text Messages
User and Kernel Space
Task, Init & Interrupt Time
Event Notification - Automation, System Management
Event Filtering
Log Management
After-the-fact Formatting
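
Structured records, filtering and after-the-fact formatting fit together roughly as sketched below. This is an illustration only and does not use the evlog or POSIX logging APIs: the EventRecord fields, the TEMPLATES table, the event numbers and every function name are hypothetical.

```python
# Toy model of structured event records with after-the-fact formatting.
# Record layout, templates and function names are all hypothetical.
import struct
from dataclasses import dataclass

@dataclass
class EventRecord:
    facility: str
    event_type: int
    severity: int     # lower number = more severe, as in syslog
    payload: bytes    # stored raw; interpreted only when the log is read

# Templates live outside the log: binary layout plus message text.
TEMPLATES = {1001: ("<IH", "disk error: sector={0} retries={1}")}

event_log = []

def log_event(facility, event_type, severity, payload):
    event_log.append(EventRecord(facility, event_type, severity, payload))

def query(predicate):
    """Event filtering: return the records matching a predicate."""
    return [rec for rec in event_log if predicate(rec)]

def format_record(rec):
    """Format at read time; records without a template are treated as text."""
    if rec.event_type in TEMPLATES:
        layout, text = TEMPLATES[rec.event_type]
        body = text.format(*struct.unpack(layout, rec.payload))
    else:
        body = rec.payload.decode("ascii")
    return f"{rec.facility}: {body}"

log_event("scsi", 1001, 3, struct.pack("<IH", 4096, 2))  # binary message
log_event("net", 2002, 6, b"link up")                    # text message

severe = query(lambda rec: rec.severity <= 3)
lines = [format_record(rec) for rec in event_log]
```

Keeping the payload raw at write time keeps the logging path short, which matters when the caller is running at interrupt time.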
3.2 evlog - Where Used
Device Driver Hardening
Automated Recovery
On-line Diagnostic Action
System Management
Instrumentation Schemes
  Wrapper macros
  Ease of Implementation
4 Flexible Dump
4.1 Flexible Dump - What is it?
Goals for a Comprehensive System Dump
  Non-disruptive - Snapshot Capable
  System and (multiple) User Context Visibility
  Minimal System Dependence
  Stand-alone Capable
  Customisable Dumping - Virtual & Physical Memory Ranges, Objects, Processor Resources etc.
  Multiple triggers: Exception Kernel/User, API, NMI/KBD Interrupt
  Access to Swapped Data
  Dump Space/Repository Management
  Programmable formatter
  SMP Capable
  Support for Alternative Dump Devices (Telco)
4.2 Flexible Dump - Where is it?
Contributions to LKCD
  Snapshot Dump - DProbes interface
  Non-disruptive
  Custom Dump Objects
  Minimal System Dependence
  SMP fixes + multiple CPU status saving
Working with LKCD Community
5 System Trace
5.1 System Trace - What is it?
Generic Trace Recording Mechanism
Community contributions to:
  Linux Trace Toolkit (Opersys)
    Dynamic Trace - DProbes interface
    Formatting exit for RAW trace data
Supporting similar efforts in:
  Linux Kernel State Trace (LKST) - Hitachi
5.2 Tracing Initiatives
Buffering:
  Per CPU
  Per Component
  Zero Locking
  Variable Length
Control
  Suspend/Resume
  Global Activate/Deactivate
Instrumentation
  Kernel
  Drivers
  User-space Subsystems
  Fine-grained Dynamic Trace
Formatting
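
The buffering, control and formatting points above can be sketched together in user space. This is a minimal model under stated assumptions, not the design of any particular tracer: the record header layout, the PerCpuTrace class and its methods are hypothetical, and the buffers grow without bound where a real tracer would use fixed-size buffers.

```python
# Toy model of per-CPU trace buffers holding variable-length records.
# Header layout, class and method names are hypothetical.
import struct

HEADER = struct.Struct("<QHH")  # timestamp, event id, payload length

class PerCpuTrace:
    def __init__(self, ncpus):
        # One buffer per CPU: each CPU appends only to its own buffer,
        # so the write path needs no lock shared between CPUs.
        self.buffers = [bytearray() for _ in range(ncpus)]
        self.active = False     # global activate/deactivate switch

    def record(self, cpu, timestamp, event_id, payload=b""):
        if not self.active:
            return              # tracing deactivated: drop the event
        self.buffers[cpu] += HEADER.pack(timestamp, event_id, len(payload))
        self.buffers[cpu] += payload

    def decode(self):
        """Formatting step: walk every buffer and merge by timestamp."""
        events = []
        for cpu, buf in enumerate(self.buffers):
            offset = 0
            while offset < len(buf):
                ts, event_id, length = HEADER.unpack_from(buf, offset)
                offset += HEADER.size
                payload = bytes(buf[offset:offset + length])
                events.append((ts, cpu, event_id, payload))
                offset += length
        return sorted(events)

trace = PerCpuTrace(ncpus=2)
trace.record(0, 1, 7, b"dropped")       # ignored: not yet activated
trace.active = True
trace.record(1, 20, 2, b"irq")
trace.record(0, 10, 1, b"syscall open")
trace.record(0, 30, 3)                  # record with an empty payload
events = trace.decode()
```

The length field in each header is what allows records of different sizes to share one buffer and still be walked sequentially by the formatter.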
6 Community Adoption
6.1 Adoption Initiatives
Establishing a RAS Community - OLS RAS BoFs
Minimise Fragmentation - Maximise Contribution
Canvassing Distributors
POSIX
Instrumentation - standards, aids, implementation
Porting & Currency
Preparation for 2.5 Kernel
7 Miscellaneous
KDB
  Complex Breakpoints - DProbes Interface
  Two Patches Accepted
Kernel
  Debug Register Allocation Patch (DProbes/KDB/gdb)
8 What's Next?
Log/Trace Instrumentation of Kernel and Device Drivers
  We need participation from the Community
DProbes ports for IA64 and zSeries
Turbo Linux release of RAS Utilities
Sampler Probe type for Profiling
DProbes HLL Compiler
Dump User Contexts
KDB User Contexts
Mission Critical mcore Integration with LKCD
On-line Diagnostics Framework
First Failure System Technology
Performance Co-Pilot
RAS Community BOF at OLS
9 The Team - Contacts
India:
  Suparna Bhattacharya
  S. Vamsikrishna
  Subodh Soni
  Bharata B. Rao
USA:
  Larry Kessler
  James Keniston
  Haren Myneni
  Hien Q Nguyen
  Mike Sullivan
  Michael Mason
  Thomas Zanussi
  Daniel Stekloff
  David Oleszkiewicz
  Tom Hanrahan (Manager)
UK:
  Richard J Moore (Architect)
Mailing List: [email protected]
Page: http://systemras.sourceforge.net/
Richard J Moore: [email protected]
Trademarks
IBM, zSeries and S/390 are trademarks of the International Business Machines Corporation in the United States and other countries.
IA32 and IA64 are abbreviations for the Pentium 32-bit and Itanium 64-bit architectures of the Intel Corporation.