Memory Faults: Injection & Solutions
Jeffrey Freschl, Di Xue
The Problem
“Memory meets corruption, it happens everyday, it could happen to you…”
--famous quote modified from the People Store Commercial
Can Linux handle cheap memory?
Can we protect ourselves from memory faults?
Talk Outline
Some Preparation (The How)
Actual Corruption and Results
A Solution (Methods and Implementation)
Software Fault Injection
SWIFI (software-implemented fault injection) is a common way to validate a system design.
SWIFI gives the freedom we need.
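A minimal sketch of what a software-injected memory fault looks like, assuming a single-bit-flip fault model. The slides do not show the injector itself, so the names `flip_bit` and `inject_random_fault` and the 4-byte stand-in field are hypothetical, and a Python `bytearray` stands in for kernel memory.

```python
import random

def flip_bit(memory, bit_index):
    """Flip one bit of `memory` in place; bit 0 is the least significant bit of byte 0."""
    byte_index, bit_in_byte = divmod(bit_index, 8)
    memory[byte_index] ^= 1 << bit_in_byte

def inject_random_fault(memory, rng):
    """Flip one uniformly chosen bit of `memory` and return its bit index."""
    bit_index = rng.randrange(len(memory) * 8)
    flip_bit(memory, bit_index)
    return bit_index

# Hypothetical target: a 4-byte little-endian field standing in for a
# task_struct member such as prio (the value 120 is illustrative only).
field = bytearray((120).to_bytes(4, "little"))
hit = inject_random_fault(field, random.Random(0))
print("flipped bit", hit, "value is now", int.from_bytes(field, "little"))
```

Seeding the generator makes a run repeatable, which helps when a particular flip needs to be replayed.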
What Do We Inject? task_struct
Process: an instance of a program in execution.
The kernel must know a process's state to manage it properly.
task_struct contains information about a process.
Data Members
prio: the process's priority
run_list: address of the entry in the runqueue, which contains the list of TASK_RUNNING processes
time_slice: amount of time to run
lock_depth: locking for simultaneous access
policy: FIFO, round robin, etc.
mmap_base: below the stack's low limit (the base)
vm_start: start address of the VM area
Methods (Update & Access)
Error Correcting Codes (ECC)
Majority Vote
What are the tradeoffs? Time? Space? Recoverability?
Intro to Hamming Code (Magic)
Hamming Rule: d + p + 1 ≤ 2^p (d is the number of input bits, p is the number of parity bits)
Generator matrix: G = [I : A]
A is a (d × p) matrix
A must have unique rows and columns
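The Hamming rule and the shape of G can be sketched as follows. This is an illustration, not the authors' code: the function names are hypothetical, and the way A is built (distinct rows, each with at least two 1s, so that no row equals a column of the identity part of H) is one common choice that the slides do not spell out.

```python
from itertools import combinations

def parity_bits_needed(d):
    """Smallest p satisfying the Hamming rule d + p + 1 <= 2**p."""
    p = 1
    while d + p + 1 > 2 ** p:
        p += 1
    return p

def build_A(d, p):
    """Return d distinct rows of length p, each containing at least two 1s."""
    rows = []
    for weight in range(2, p + 1):
        for ones in combinations(range(p), weight):
            rows.append([1 if j in ones else 0 for j in range(p)])
            if len(rows) == d:
                return rows
    raise ValueError("p is too small for d")

def generator_matrix(d, p):
    """G = [I : A], a d x (d + p) matrix over GF(2)."""
    A = build_A(d, p)
    return [[1 if i == j else 0 for j in range(d)] + A[i] for i in range(d)]

for d in (4, 32, 64):
    print(d, "input bits need", parity_bits_needed(d), "parity bits")
```

For d = 4 this yields p = 3, the classic (7,4) Hamming code; a 32-bit member needs 6 parity bits and a 64-bit member needs 7.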
Hamming cont. (More Magic)
To encode an input string: codeword = input × G
To check whether a codeword is corrupt:
H = [A^T : I]
syndrome = H × codeword^T
if (syndrome == 0) then there is no corruption
otherwise, match the syndrome to a column of H; that column's position is the corrupted bit
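The encode and check steps can be sketched for a small (7,4) code. The matrix A below is an illustrative choice, not taken from the slides, and the function names are hypothetical. All arithmetic is mod 2, so the products reduce to ANDs and XORs.

```python
# Illustrative 4 x 3 matrix A (an assumption; any A with distinct rows of
# weight >= 2 would work for single-bit correction).
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
D, P = len(A), len(A[0])

def encode(data_bits):
    """codeword = input x G with G = [I : A]: the data bits followed by the parity bits."""
    parity = [sum(data_bits[i] * A[i][j] for i in range(D)) % 2 for j in range(P)]
    return list(data_bits) + parity

def syndrome(codeword):
    """H x codeword^T with H = [A^T : I], computed mod 2."""
    data, parity = codeword[:D], codeword[D:]
    return [(sum(A[i][j] * data[i] for i in range(D)) + parity[j]) % 2
            for j in range(P)]

def correct(codeword):
    """Return a copy of the codeword with a single flipped bit repaired, if any."""
    s = syndrome(codeword)
    if not any(s):
        return list(codeword)
    # Columns of H: rows of A for the data positions, unit vectors for the parity positions.
    columns = [list(row) for row in A] + \
              [[1 if j == k else 0 for j in range(P)] for k in range(P)]
    fixed = list(codeword)
    fixed[columns.index(s)] ^= 1  # raises ValueError if no column matches
    return fixed

word = encode([1, 0, 1, 1])
damaged = list(word)
damaged[2] ^= 1  # simulate a single-bit memory fault
print(correct(damaged) == word)
```

A two-bit corruption produces a syndrome that points at the wrong position, which is why the scheme recovers from single-bit faults only.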
Hamming (Back to Reality)
Redundancy: can only recover from 1-bit corruption
Space: almost constant (optimal # of parity bits)
Time: lots of bitwise XORs and ANDs
Majority Vote
Time: update is very fast!
Space: overhead!
Simple implementation!!
if (copy1 != copy2) use copy3
else everything is OK
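The rule above can be sketched as a small class that keeps three copies of a value. The class name `TripleStore` is hypothetical, and the sketch assumes at most one of the three copies is corrupted between a write and the next read.

```python
class TripleStore:
    """Keeps three copies of a value and repairs a single corrupted copy on read."""

    def __init__(self, value):
        self.copies = [value, value, value]

    def write(self, value):
        # An update is just three stores, which is why it is fast.
        self.copies = [value, value, value]

    def read(self):
        copy1, copy2, copy3 = self.copies
        if copy1 != copy2:
            # One of the first two copies is corrupt; under the single-fault
            # assumption copy3 is intact, so use it and repair the others.
            self.copies = [copy3, copy3, copy3]
            return copy3
        if copy3 != copy1:
            # copy1 and copy2 agree, so copy3 is the damaged one.
            self.copies[2] = copy1
        return copy1

store = TripleStore(120)
store.copies[0] = 121  # simulate a fault in the first copy
print(store.read())
```

The price is the space overhead: every protected value is stored three times.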
Design Goals
Want a “redundancy repository” for entire kernel
Minimize Programmer’s Pain!
On-demand backup
Scalability
How to Protect? Redundancy API
checkParity( addressOfMember, size ): add before a read access
updateParity( addressOfMember, addressOfNewValue, size ): add before an update
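A rough model of how the two calls might be used, under several assumptions: memory is a Python `bytearray`, addresses are offsets into it, and the repository keeps two backup copies per member (the majority-vote option) rather than Hamming parity bits. The slides give only the two signatures, so the class name, the return values, and the internals here are hypothetical.

```python
class RedundancyRepository:
    """Hypothetical model of a redundancy repository keyed by (address, size)."""

    def __init__(self, memory):
        self.memory = memory
        self.shadow = {}  # (address, size) -> [backup1, backup2]

    def update_parity(self, address, new_value_address, size):
        """Call before an update: back up the value that is about to be written."""
        new_value = bytes(self.memory[new_value_address:new_value_address + size])
        self.shadow[(address, size)] = [new_value, new_value]

    def check_parity(self, address, size):
        """Call before a read: repair the member if it disagrees with its backup.

        Returns True if the member was intact (or was never registered),
        False if it had to be repaired.
        """
        key = (address, size)
        if key not in self.shadow:
            return True  # on-demand backup: unregistered members are not protected
        current = bytes(self.memory[address:address + size])
        first, second = self.shadow[key]
        if current == first:
            return True
        # Mismatch: fall back on the remaining copy, as in the majority-vote rule.
        self.memory[address:address + size] = second
        self.shadow[key] = [second, second]
        return False

# Hypothetical layout: a 4-byte member at offset 0, a staging slot at offset 8.
mem = bytearray(16)
repo = RedundancyRepository(mem)
mem[8:12] = (120).to_bytes(4, "little")
repo.update_parity(0, 8, 4)  # before the update
mem[0:4] = mem[8:12]         # the update itself
mem[0] ^= 0x10               # injected fault
repo.check_parity(0, 4)      # before the read: repairs the member
print(int.from_bytes(mem[0:4], "little"))
```

Passing `size` on every call is what lets one repository cover members of different widths.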
Some Challenges
Dealing with different-sized data members
Originally focused on protecting addresses
Solution: need to know the size of the data
What about recursive redundancy?
User Registration
Manual Integration
Summary
20% of the critical data members we tested caused a crash.
Finding every location that updates memory is difficult.
The system no longer crashed with our redundancy solution.
Thank You
Jeffrey Freschl, jfreschl@cs.wisc.edu
Di Xue, goldenspaceship@gmail.com