Memory Faults: Injection & Solutions Jeffrey Freschl, Di Xue


Memory Faults: Injection & Solutions

Jeffrey Freschl, Di Xue

The Problem

“Memory meets corruption, it happens everyday, it could happen to you…”

--famous quote modified from the People Store Commercial

Can Linux handle cheap memory?

Can we protect ourselves from memory faults?

Talk Outline

Some Preparation (The How)

Actual Corruption and Results

A Solution (Methods and Implementation)

Part I – Some Preparation (The How)

Hardware vs Software Fault Injection

Software Fault Injection

SWIFI – Software implemented fault injection is a common way to validate system design.

SWIFI gives the freedom we need.

Fault Injection Process

What Do We Inject? task_struct

Process – An instance of a program in execution.

The kernel must know a process's state to manage it properly.

task_struct contains information about a process.

Data Members

prio: process's priority

run_list: address of entry in runqueue which contains list of TASK_RUNNING processes

time_slice: amount of time to run

lock_depth: locking for simultaneous access

policy: FIFO, round robin, etc.

mmap_base: below the stack's low limit (the base)

vm_start: start address of the VM area
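The slides do not show the injector itself, so the following is only a minimal user-space sketch of one plausible fault model: flipping a single bit inside one data member. The `mock_task` struct and the `flip_bit` helper are hypothetical names made up for illustration; they are not the kernel's `task_struct` layout nor the authors' tool.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a few of the members listed above.
 * This is NOT the real task_struct layout. */
struct mock_task {
    int prio;
    unsigned int time_slice;
    int lock_depth;
    unsigned long policy;
};

/* Flip one bit in the object of `size` bytes starting at `addr`.
 * Bit 0 is the lowest bit of the first byte in memory. */
void flip_bit(void *addr, size_t size, size_t bit)
{
    unsigned char *bytes = addr;

    assert(bit < size * CHAR_BIT);   /* stay inside the target member */
    bytes[bit / CHAR_BIT] ^= (unsigned char)(1u << (bit % CHAR_BIT));
}
```

For example, `flip_bit(&t.time_slice, sizeof t.time_slice, 3)` corrupts one bit of `time_slice` and leaves the neighboring members untouched; calling it a second time with the same arguments restores the original value.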

Part II – Finally, Let's Start Corrupting!

Good, Let's Begin the Stress! (Workloads)

Results for Simple Program

Running Blast

Fault Propagation

• EIP locates fault point

• Call Trace illustrates path to fault

Part III – A Solution

Protecting Linux from Di’s Corruption

Methods (Update & Access)

Error Correcting Codes (ECC)

Majority Vote

What are the tradeoffs? Time? Space? Recoverability?

Intro to Hamming Code (Magic)

Hamming Rule: d + p + 1 ≤ 2^p (d is # of input bits, p is # of parity bits)

Generator Matrix G: G = [I : A]

A is a d × p matrix

A must have distinct rows, each containing at least two 1s
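To make the Hamming rule concrete, here is a small sketch that finds the smallest p satisfying d + p + 1 ≤ 2^p for a given d. The function name `min_parity_bits` is an assumption introduced for this example.

```c
#include <assert.h>

/* Smallest number of parity bits p such that d + p + 1 <= 2^p. */
unsigned min_parity_bits(unsigned d)
{
    unsigned p = 0;

    while ((unsigned long long)d + p + 1 > (1ull << p)) {
        p++;
        assert(p < 63);   /* guard the shift; never reached for 32-bit d */
    }
    return p;
}
```

Under this rule, 4 input bits need 3 parity bits, while 32 input bits need only 6 and 64 input bits need 7. The overhead grows logarithmically with the size of the protected data.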

Hamming cont. (More Magic)

To encode an input string: codeword = input × G (all arithmetic is modulo 2)

To check if a codeword is corrupt:

H = [A^T : I]

syndrome = H × codeword^T

if (syndrome == 0) then no corruption

otherwise, match the syndrome to a column of H; that column's position is the corrupted bit
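The encode and check steps can be sketched for the smallest interesting case, d = 4 and p = 3 (a (7,4) Hamming code). The particular matrix `A` below is one valid choice picked for this example, not necessarily the one the authors used, and the function names are hypothetical. Bits are stored one per array element to keep the matrix arithmetic visible; a real implementation would pack them into machine words.

```c
#include <assert.h>

enum { D = 4, P = 3, N = D + P };

/* One valid choice of A (d x p): distinct rows, each with at least two 1s. */
static const unsigned char A[D][P] = {
    {1, 1, 0},
    {1, 0, 1},
    {0, 1, 1},
    {1, 1, 1},
};

/* codeword = input x G over GF(2), with G = [I : A]:
 * the data bits come first, followed by the parity bits. */
void hamming_encode(const unsigned char input[D], unsigned char codeword[N])
{
    for (int i = 0; i < D; i++)
        codeword[i] = input[i] & 1;
    for (int j = 0; j < P; j++) {
        unsigned char parity = 0;
        for (int i = 0; i < D; i++)
            parity ^= (input[i] & 1) & A[i][j];
        codeword[D + j] = parity;
    }
}

/* syndrome = H x codeword^T over GF(2), with H = [A^T : I]. */
void hamming_syndrome(const unsigned char codeword[N], unsigned char syndrome[P])
{
    for (int j = 0; j < P; j++) {
        unsigned char s = codeword[D + j];
        for (int i = 0; i < D; i++)
            s ^= A[i][j] & codeword[i];
        syndrome[j] = s;
    }
}

/* Returns -1 if the codeword is clean, otherwise the index of the bit
 * that was flipped back; -2 if the syndrome matches no column of H. */
int hamming_correct(unsigned char codeword[N])
{
    unsigned char s[P];

    hamming_syndrome(codeword, s);
    if (!s[0] && !s[1] && !s[2])
        return -1;
    for (int c = 0; c < N; c++) {
        int match = 1;
        for (int j = 0; j < P; j++) {
            /* Column c of H: row c of A for data bits, a unit vector otherwise. */
            unsigned char h = (c < D) ? A[c][j] : (unsigned char)(c - D == j);
            if (h != s[j])
                match = 0;
        }
        if (match) {
            codeword[c] ^= 1;
            return c;
        }
    }
    return -2;
}
```

Encoding the input 1 0 1 1 with this `A` gives the codeword 1 0 1 1 0 1 0. Flipping any single bit of that codeword produces a syndrome equal to the matching column of H, so `hamming_correct` can locate and repair it. Two simultaneous flips would be miscorrected, which is the 1-bit limit the next slide mentions.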

Hamming (Back to Reality)

Redundancy: can only recover from 1-bit corruption

Space: almost constant (optimal # of parity bits)

Time: lots of bitwise XORs and ANDs

Majority Vote

Time: update is very fast!

Space: overhead!

Simple implementation!!

if (copy1 != copy2) use copy3

else everything is OK
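A minimal sketch of majority voting over three copies follows. The slide's rule trusts copy3 whenever the first two copies disagree; the sketch below instead performs a full 2-of-3 vote and repairs the minority copy, which is an assumed refinement and not something the slides state. The names `tmr_long`, `tmr_write`, and `tmr_read` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Three redundant copies of one protected value. */
struct tmr_long {
    long copy[3];
};

/* Update: write the new value to all three copies. */
void tmr_write(struct tmr_long *v, long value)
{
    assert(v != NULL);
    v->copy[0] = v->copy[1] = v->copy[2] = value;
}

/* Access: 2-of-3 vote. Stores the winner in *out, repairs the odd copy,
 * and returns 0. Returns -1 if all three copies disagree. */
int tmr_read(struct tmr_long *v, long *out)
{
    long a, b, c, winner;

    assert(v != NULL && out != NULL);
    a = v->copy[0];
    b = v->copy[1];
    c = v->copy[2];

    if (a == b || a == c)
        winner = a;
    else if (b == c)
        winner = b;
    else
        return -1;               /* no majority: unrecoverable */

    v->copy[0] = v->copy[1] = v->copy[2] = winner;
    *out = winner;
    return 0;
}
```

The update path is just three stores, which matches the "very fast" claim, and the cost is the space overhead of two extra copies per protected value.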

Part IV – Implementation

Design Goals

Want a “redundancy repository” for entire kernel

Minimize Programmer’s Pain!

On-demand backup

Scalability

“Just give me a location and I’ll take care of you!”

- Redundancy Repository

Redundancy Repository

[Diagram: Redundancy HashTable; each Member Entry holds long id, int size, char parity]

How to Protect? Redundancy API

checkParity( addressOfMember, size ): add before a read access

updateParity( addressOfMember, addressOfNewValue, size ): add before an update
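The slides give only the two signatures and the entry fields (`long id`, `int size`, `char parity`), so the following user-space sketch fills in the rest with assumptions: the table is a fixed-size array with linear probing, the member's address serves as the `id`, the `parity` byte is the XOR of the member's bytes (which detects but cannot correct a flipped bit), and the return codes are invented for this example. It also assumes a `long` can hold a pointer, as on Linux. None of this is confirmed as the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPO_SLOTS 64            /* hypothetical fixed capacity */

struct member_entry {
    long id;                     /* address of the protected member; 0 = empty */
    int size;                    /* size of the member in bytes */
    char parity;                 /* XOR of the member's bytes (assumed scheme) */
};

static struct member_entry repo[REPO_SLOTS];

static char xor_parity(const void *addr, int size)
{
    const unsigned char *b = addr;
    unsigned char p = 0;

    for (int i = 0; i < size; i++)
        p ^= b[i];
    return (char)p;
}

/* Find the entry for addr; with create != 0, claim an empty slot on demand. */
static struct member_entry *lookup(const void *addr, int create)
{
    long id = (long)(intptr_t)addr;
    size_t start = (size_t)(((uintptr_t)addr >> 2) % REPO_SLOTS);

    for (size_t n = 0; n < REPO_SLOTS; n++) {
        struct member_entry *e = &repo[(start + n) % REPO_SLOTS];
        if (e->id == id)
            return e;
        if (e->id == 0) {
            if (!create)
                return NULL;
            e->id = id;
            return e;
        }
    }
    return NULL;                 /* table full */
}

/* Call before a read. Returns 0 if the member matches its recorded parity
 * (or was never registered), -1 if corruption is detected. */
int checkParity(const void *addressOfMember, int size)
{
    struct member_entry *e;

    assert(addressOfMember != NULL);
    e = lookup(addressOfMember, 0);
    if (e == NULL)
        return 0;
    if (e->size != size || e->parity != xor_parity(addressOfMember, size))
        return -1;
    return 0;
}

/* Call before an update. Records the parity of the value about to be
 * stored; the caller performs the store. Returns -1 if the table is full. */
int updateParity(void *addressOfMember, const void *addressOfNewValue, int size)
{
    struct member_entry *e;

    assert(addressOfMember != NULL && addressOfNewValue != NULL);
    e = lookup(addressOfMember, 1);
    if (e == NULL)
        return -1;
    e->size = size;
    e->parity = xor_parity(addressOfNewValue, size);
    return 0;
}
```

A protected update would then read `updateParity(&t->prio, &new_prio, sizeof new_prio); t->prio = new_prio;`, and a protected read would call `checkParity(&t->prio, sizeof t->prio)` first. Registering entries lazily inside `updateParity` is one way to get the on-demand backup named in the design goals.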

Some Challenges

Dealing with different-sized data members

Originally focused on protecting address

Solution: need to know size of data

What about recursive redundancy?

User Registration

Manual Integration

Updated Results

Di + Kernel + Solution Harmony

Summary

20% of the critical data members we tested caused a crash.

Finding every location that updates memory is difficult.

The system no longer crashed with our redundancy solution.