Lecture 1 Page 1 CS 236, Spring 2008 Information Flow Tracking CS 236 Advanced Computer Security Peter Reiher April 8, 2008



Groups for This Week

1. Kenneth Arthur, Vishwa Goudar, Chieh-Ning Lien

2. Golita Benoodi, Michael Hall, Jason Liu

3. Darrell Carbajal, Seongwan Han, Hootan Nikbakht

4. Andrew Castner, Zhen Huang, Ioannis Pefkianakis

5. Chia-Wei Chang, Abishek Jain, Peter Peterson

6. Chien-Chia Chen, Jihyoung Kim, Adam Stoelting

7. Dae-Ki Cho, Joseph Kulisics, Min-Hsieh Tsai

8. Michael Cohen, Nikolay Laptev, Peter Wu

9. Jih Chung Fan, Chen-Kuei Lee, Faraz Zahabian


Outline

• The problem

• Different approaches

• Applying ideas to one system


Information Flow Tracking

• Security policies shouldn’t really be applied to files

• They should be applied to the information in the files

• Which implies policies should travel with data

– As it goes from file to file


For Example

The entire process is tainted

So everything it writes is also tainted


Why Is This a Problem?

• Eventually, most restrictive policy applied to almost everything

• Most of it doesn’t actually need the policy

• The simplistic method of tracking information flow requires conservatism
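A minimal sketch of the process-level tracking described above, to show how the restrictive label spreads. The `Process` class, the file names, and the dictionary-based file system are all hypothetical, chosen only for illustration.

```python
class Process:
    """A process with one taint flag covering all of its state."""

    def __init__(self):
        self.tainted = False

    def read(self, fs, name):
        data, file_tainted = fs[name]
        # Reading any tainted file taints the whole process from then on.
        self.tainted = self.tainted or file_tainted
        return data

    def write(self, fs, name, data):
        # Every write inherits the process-wide taint, related or not.
        fs[name] = (data, self.tainted)


# Hypothetical file system: name -> (contents, tainted?)
fs = {"secret.txt": ("launch codes", True), "notes.txt": ("buy milk", False)}
p = Process()
p.read(fs, "secret.txt")
notes = p.read(fs, "notes.txt")
p.write(fs, "shopping.txt", notes)
shopping_tainted = fs["shopping.txt"][1]
```

Here `shopping.txt` ends up tainted even though its contents came only from the untainted `notes.txt`, which is the conservatism the slide refers to.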


Was This Necessary?

• Probably not

• In most cases, the data written was not sensitive

– And thus didn’t need to have the sensitive policy applied

• But what else could you do?


Core Idea

• Follow information flow on a finer granularity

• A process that reads tainted data need not always write tainted data

• Somehow figure out which writes need this policy to be applied

– Using the finer granularity of tracking


What’s Really Going On

Only important to track

where the tainted

data goes

Not where untainted data goes


Possible Granularities

• Processes

– Too big and expensive

• Threads

– Typically share memory

• Sub-process units

– Not available in most systems

• Memory pages or addresses

– How?


Approaches

• Language approaches

• Asbestos and HiStar

• Rifle


Language Approaches

• Use data typing to propagate security labels through program

• Jif is one example language that does this

• Language can prohibit illegal flows

• Or simply ensure labels are propagated properly
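A rough sketch of the idea of labels carried by typed values. This is not Jif syntax, and Jif checks flows at compile time. The two-level `LOW`/`HIGH` lattice, the `Labeled` type, and the `assign` helper are assumptions made for illustration, and the check here happens at run time.

```python
from dataclasses import dataclass

LOW, HIGH = 0, 1  # hypothetical two-level lattice: LOW may flow to HIGH


@dataclass(frozen=True)
class Labeled:
    value: object
    label: int

    def __add__(self, other):
        # The result carries the join (here, the max) of both labels.
        return Labeled(self.value + other.value, max(self.label, other.label))


def assign(sink_label, src):
    """Allow the flow only if the source label is no higher than the sink's."""
    if src.label > sink_label:
        raise PermissionError("illegal flow: HIGH data into LOW sink")
    return Labeled(src.value, sink_label)


salary = Labeled(1000, HIGH)
bonus = Labeled(50, LOW)
total = salary + bonus  # label is propagated to the result
```

With these definitions, `total` is labeled `HIGH`, `assign(HIGH, bonus)` succeeds, and `assign(LOW, total)` raises `PermissionError`, corresponding to the two options on the slide: propagate labels, or prohibit illegal flows.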


Advantages and Disadvantages

+Most costs paid at compile time

+Other forms of source code analysis possible

−Only works for programs written in that language

−Relies on correctness of compiler

−Hard to prove binary is correct


Asbestos Approach

• Use a sub-process abstraction

• The event process

– Process state belonging to a single user (user defined broadly)

• One process contains multiple event processes

• Process isolates each event process’ state from the others


Conceptually,


How It Really Works

• Base processes hold multiple event processes

• Each event process has its own memory

• Kernel schedules base process

– But transfers control and limits memory on event process basis

– Essentially, copy-on-write

– Efficiency requires special programming


Real Point of Asbestos

• Not so much tracking of data

– Though that’s necessary

• As enforcing that tainted data doesn’t go where it shouldn’t

• With such enforcement, can provide mandatory access control policies


Practicalities of Asbestos

• Requires rewriting applications

– To use event process paradigm

• Built as a proof-of-concept system

• Not suitable for real-world use

– But applications doing real things demonstrated on Asbestos


HiStar

• Built using Asbestos data labeling methods

– Preventing information leakage

• Goal is to allow trusted applications with minimum trusted code

• Including minimum trusted kernel code

• Unlike Asbestos, includes Unix library

– But provides no new security


HiStar and Code Wrapping

• HiStar can make old apps more secure

– With some new design and a little new wrapper code

• Essentially, wrap existing code with new code

– Which enforces security restrictions

• E.g., wrapping OpenVPN

– Required a few hundred lines of code


Key Advantages of HiStar

• Over Asbestos

• Less trusted code

• Ability to wrap untrusted code with new code

– Gaining security advantages

– Thus, easier to work with legacy apps


Rifle

• Asbestos and HiStar required new OS structure

– Limiting what could be run with them

– Getting new security required new applications

• Rifle takes a different approach


The Rifle Approach

• Dynamic code rewriting

• Take a standard executable

• Attach security labels to data

• As executable touches labeled data,

– Rewrite executable to copy labels

• OS must still enforce policies

• But rewritten code does label tracking


Rifle In More Detail

• Tag memory with security label

– Preferably using new hardware

– But possible without it

• When tagged memory accessed by executable,

– Rewrite it to propagate tag

• If policy prohibits write, don’t do it


Advantages of Rifle Approach

• Works on finer granularity

– Only truly prohibited flows are trapped

• No need to re-write or create new apps


Disadvantages of Rifle Approach

• Probably more subject to covert channels

• Requires special hardware

– More precisely, good performance requires it

– Hardware not currently available

• Probably bigger performance penalty


Rifle’s Special Hardware

• Essentially tags for memory and registers

– To store security labels

• Augmented instructions

– To propagate security labels

– And combine multiple labels

– Otherwise, effects like ordinary counterparts
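A small software model of tagged registers and augmented instructions may make this concrete. It is only a sketch: the `TaggedRegisters` class, the use of sets as labels, and set union as the way to combine labels are assumptions, not a description of the actual hardware design.

```python
class TaggedRegisters:
    """Registers that each carry a value and a security label (a set of tags)."""

    def __init__(self, n):
        self.value = [0] * n
        self.label = [frozenset()] * n

    def mov(self, dst, src):
        # Behaves like an ordinary move, but the label travels with the value.
        self.value[dst] = self.value[src]
        self.label[dst] = self.label[src]

    def add(self, dst, a, b):
        # Behaves like an ordinary add, but combines both operand labels.
        self.value[dst] = self.value[a] + self.value[b]
        self.label[dst] = self.label[a] | self.label[b]


regs = TaggedRegisters(4)
regs.value[0], regs.label[0] = 7, frozenset({"alice"})
regs.value[1], regs.label[1] = 5, frozenset({"bob"})
regs.add(2, 0, 1)  # R[2] now carries both labels
regs.mov(3, 2)     # R[3] inherits them unchanged
```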


Why Rewrite Code?

• Why not just use special hardware to move labels around?

• Because of implicit data flows

– Flows where data is divulged other than through an obvious, explicit assignment


An Implicit Data Flow

B = false;
C = false;
if (!A)
    C = true;
if (!C)
    B = true;
print B;


What’s the Problem?

• What if the value of A is a secret?

• If you execute this code and A is true, B is printed as true

– And printed as false, if A is false

• Could propagate labels as you do the assignments

– Refuse to print B if it was “copied” from A

– But you’ll only do that if A is true

– Which will leak the information
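The leak can be reproduced with a short sketch that tracks labels only through assignments. The `run` function and the `secret` set are hypothetical scaffolding. The program logic is the one from the implicit data flow slide.

```python
def run(a):
    """Run the example program, tracking labels only through assignments."""
    secret = {"a"}   # names of variables currently labeled secret
    b = c = False    # constants, so neither picks up a label
    if not a:
        c = True     # assigned a constant: no label is copied to c
    if not c:
        b = True     # assigned a constant: no label is copied to b
    # b was never assigned from a, so explicit tracking leaves it unlabeled.
    return b, ("b" in secret)


results = {a: run(a) for a in (True, False)}
```

In both runs the printed value of `b` equals `a`, yet `b` never carries the secret label, because no assignment ever copied from `a`.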


Solving the Problem

• Dynamically rewrite the code “as needed”

– On the fly, depending on actual execution


An Example

// Assume R[1] contains a
// b will be stored in R[2]
// c will be stored in R[3]
      mov R[2] = 0
      mov R[3] = 0
      (R[1]) branch .L1
      mov R[3] = 1
.L1:  (R[3]) branch .L2
      mov R[2] = 1
.L2:  store [R[5]] = R[2]


The Rewritten Code

 1       // Assume R[1] contains a
 4       mov R[2] = 0
 5       mov R[3] = 0
 6       mov S[1] = labelof(R[1])
 7       (R[1]) branch .L1
 8       <S[1]> mov R[3] = 1
 9  .L1: <S[1]> mov S[3] = labelof(R[3])
10       (R[3]) branch .L2
11       <S[3]> mov R[2] = 1
12  .L2: <S[3]> store [R[5]] = R[2]


What Has the Rewriting Done?

 1       // Assume R[1] contains a
 4       mov R[2] = 0
 5       mov R[3] = 0
 6       mov S[1] = labelof(R[1])
 7       (R[1]) branch .L1
 8       <S[1]> mov R[3] = 1
 9  .L1: <S[1]> mov S[3] = labelof(R[3])
10       (R[3]) branch .L2
11       <S[3]> mov R[2] = 1
12  .L2: <S[3]> store [R[5]] = R[2]

• Anything depending on S[1] will have the label of R[1]

• Propagates label S[1] to R[3] if branch L1 not taken

• Combines labels S[1] and R[3] if branch L1 is taken

• Propagates label S[3] to R[2] if branch L2 not taken

• Combines label S[3] with R[2] and R[5] if branch L2 not taken


The Net Effect

• If A (stored in R[1] in program) is secret,

• Then line 12 will be executed with a secret label

• Regardless of value of A and information flow
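The claim can be checked with a small simulation of the rewritten code. This is a sketch under one assumption that goes beyond the slides: an instruction guarded by a security register `<S[n]>` adds that register's label to whatever it writes. The function name `run_rewritten` and the use of sets as labels are also illustrative.

```python
def run_rewritten(a, a_label):
    """Simulate the rewritten code; return (stored value, label on the store)."""
    R = {1: a, 2: 0, 3: 0}            # lines 4-5: R[2] = 0, R[3] = 0
    L = {1: frozenset(a_label), 2: frozenset(), 3: frozenset(), 5: frozenset()}
    S = {}
    S[1] = L[1]                       # line 6: S[1] = labelof(R[1])
    if not R[1]:                      # line 7: branch to .L1 when R[1] is true
        R[3], L[3] = 1, S[1]          # line 8: guarded by S[1]
    S[3] = S[1] | L[3]                # line 9: guarded by S[1]
    if not R[3]:                      # line 10: branch to .L2 when R[3] is true
        R[2], L[2] = 1, S[3]          # line 11: guarded by S[3]
    store_label = S[3] | L[2] | L[5]  # line 12: guarded by S[3]
    return R[2], store_label


when_true = run_rewritten(True, {"secret"})
when_false = run_rewritten(False, {"secret"})
```

Under that assumption, the store at line 12 carries the `secret` label in both runs, although the stored value differs (1 when `a` is true, 0 when it is false).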


What About That Hardware?

• Essentially only stores extra information

• Information could be stored in regular memory

• How to propagate it?

• More code re-writing

– To explicitly copy/combine labels

• Later research suggests feasibility

– In both correctness and performance


The Data Tethers Problem

[Figure: File A]

If the laptop is stolen, file A goes with it

The Data Tethers Solution

[Figure: File A]

If the laptop is stolen, file A isn’t there


Basic Data Tethers Operations

• Tie policies to pieces of data

– E.g., “file X cannot leave the office”

• Observe environmental conditions

– E.g., “leaving the office”

• Apply policies to remove files when necessary
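The three operations can be sketched as a toy model. Everything here is hypothetical: the `TetheredStore` class, the representation of a policy as a set of allowed locations, and the `on_location_change` event are illustrative names, not part of the Data Tethers system itself.

```python
class TetheredStore:
    """Files on a device, some with a policy naming where they may exist."""

    def __init__(self):
        self.files = {}     # name -> contents
        self.policies = {}  # name -> set of locations where the file may stay

    def add(self, name, contents, allowed=None):
        self.files[name] = contents
        if allowed is not None:
            self.policies[name] = set(allowed)  # tie the policy to the data

    def on_location_change(self, location):
        """Environmental event: remove files whose policy excludes `location`."""
        banned = [n for n, ok in self.policies.items() if location not in ok]
        for name in banned:
            self.files.pop(name, None)


laptop = TetheredStore()
laptop.add("X", "design plans", allowed={"office"})  # "cannot leave the office"
laptop.add("Y", "lunch menu")                        # no tether
laptop.on_location_change("cafe")                    # "leaving the office"
remaining = sorted(laptop.files)
```

After the location change only the untethered file `Y` remains on the device.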


What’s That Got To Do With Flow Tracking?

[Figure: File A and File A’]

What if the user copies the data to another file?

Will the copy be tethered?

Or not . . .?


Information Flow and Data Tethers

• We want to tether data, not files

• Requires tethers to be propagated as data is copied

• But how to avoid unnecessary tethering?

• Requires sophisticated data flow tracking


Returning to the Example

[Figure: File A, File A’, File B]

A’ is a copy of A

B is unrelated to A

How do we tell which to tether?


So What Do We Do?

• Data Tethers wants to work on commodity operating systems

• Don’t want to reimplement executables

• Don’t have special hardware

• How do we make it work?


Proposed Approach

• Use code re-writing

• Assume few things have tethers attached

– Might limit performance problems

• Rewrite code to propagate labels in reserved memory area


Effect on Our Example

• Program that copies A to A’ propagates the label to A’

• Program that copies B from untethered data C doesn’t label B

• Underlying OS structures deal with applying policies, as required

– Today, we only care about labeling
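A minimal sketch of label propagation through a reserved memory area, applied to this example. The byte-granularity `shadow` array, the helper names, and the label string `tether-A` are assumptions for illustration. The slides leave the actual granularity open.

```python
SIZE = 32
memory = bytearray(SIZE)
shadow = [None] * SIZE  # hypothetical reserved area: one label slot per byte


def load(addr, data, label=None):
    """Place file contents in memory, labeling each byte if the file is tethered."""
    memory[addr:addr + len(data)] = data
    shadow[addr:addr + len(data)] = [label] * len(data)


def copy(dst, src, n):
    """A rewritten copy: moves the bytes and the matching shadow labels."""
    memory[dst:dst + n] = memory[src:src + n]
    shadow[dst:dst + n] = shadow[src:src + n]


def labels(addr, n):
    """Labels that a file written from this buffer would need to carry."""
    return {lab for lab in shadow[addr:addr + n] if lab is not None}


load(0, b"secret", label="tether-A")  # file A, tethered
load(16, b"public")                   # file C, untethered
copy(8, 0, 6)                         # A' is a copy of A
copy(24, 16, 6)                       # B is a copy of C
```

Here `labels(8, 6)` reports the tether for the copy A’, while `labels(24, 6)` is empty for B, so only A’ would be handed to the OS for policy enforcement.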


A Different Kind of Code Rewriting

[Figure: the code before and after rewriting]


Issues to Be Addressed

• Are you crazy?

• Granularity of labeling

• Granularity of rewriting

• When does rewriting occur?

• How many labels?

• How to combine them?

• Would we be better off with HiStar?
