Automated Vulnerability Analysis: Leveraging Control Flow for Evolutionary Input Crafting

Sherri Sparks, Shawn Embleton, Ryan Cunningham, and Cliff Zou

School of Electrical Engineering and Computer Science, University of Central Florida

December, 2007

ACSAC

Vulnerability Analysis

Involves discovering a subset of a program's input space with which a malicious user can exploit logic errors to drive the program into an insecure state

Complexity of modern software makes complete program state space exploration an intractable problem

Motivation

Oftentimes, security researchers/hackers have analyzed a system (software/hardware) and located a potentially vulnerable location:
• C programs have well-known, potentially vulnerable API functions (e.g., strcpy())
• A critical hardware component dealing with user inputs

Exploitability implies reachability

In order to determine if a potential vulnerability is exploitable, one must prove that:
1. It is reachable on the runtime execution path
2. It is dependent on / influenceable by user-supplied input

Testing: Intelligent input generation to improve code coverage

An Input Crafting Problem

What does the input have to look like to exercise the code path between the input node (recv) and the potentially vulnerable node (strcpy)?

[Figure: control flow graph (CFG) showing the parsing and validation logic on the path between recv and strcpy]

Testing: intelligently generate inputs that can reach a code region for intense testing

TFTP Control Flow Graph

Basic Idea of Our Approach

Some inputs are better than others:
• They increase coverage by reaching previously unexplored areas of the CFG
• They are on a path to a basic block where some potentially vulnerable APIs are being used

Find new, improved inputs with a Genetic Algorithm (GA):
• “Mate” the best of the inputs found so far to produce a new generation of inputs
• Propose a “Dynamic Markov Model” for input measurement
• Apply “Grammatical Evolution” to shrink the input search space

Short Review ― Genetic Algorithms

A stochastic optimization algorithm that mimics evolution

Requires two things:
• A representation: what a solution should look like (binary string, ASCII string, integer…)
• A fitness function: tells how good or bad each solution is

It works like this:
1. Start out with a population (set) of random solutions
2. Find each solution’s fitness
3. Select solutions with high fitness values
4. Generate new solutions through mutation and crossover on the selected solutions
5. GOTO 2 (the next generation)
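The loop above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: a bit-string representation, one-point crossover, bit-flip mutation, a toy fitness function (the number of 1 bits), and hypothetical names and parameter values such as `evolve`, `keep`, and `mutation_rate`.

```python
import random

def evolve(fitness, genome_len, pop_size=30, generations=40,
           mutation_rate=0.05, keep=0.5, seed=0):
    """Toy genetic algorithm over bit strings (all parameters are illustrative)."""
    rng = random.Random(seed)
    # 1. Start out with a population of random solutions.
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Find each solution's fitness; 3. select the fittest fraction.
        population.sort(key=fitness, reverse=True)
        parents = population[:max(2, int(pop_size * keep))]
        # 4. Generate new solutions through crossover and mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)           # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]                   # bit-flip mutation
            children.append(child)
        # 5. Parents and children form the next generation (back to step 2).
        population = parents + children
    return max(population, key=fitness)

# Toy problem: maximize the number of 1 bits in a 20-bit string.
best = evolve(fitness=sum, genome_len=20)
print(best, sum(best))
```

Because the selected parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.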

Grammatical Evolution in Generating Inputs

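• Efficiently reduces the search space
• Flexible in utilizing partial knowledge of inputs (user-specified context-free grammar)
• Not used in any previous approaches

Example grammar (the alternatives of each rule are numbered 0, 1, 2 from left to right):

S → sAs | xBx | m
A → bBb | B
B → aAa | C | AB
C → c | d | e

The choice sequence 10011 selects the derivation:

S ⇒ xBx ⇒ xaAax ⇒ xabBbax ⇒ xabCbax ⇒ xabdbax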
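A minimal sketch of how a sequence of integer choices can be mapped to a string through the example grammar on this slide. The conventions used here (always expand the leftmost nonterminal, reduce each codon modulo the number of alternatives, wrap around when the codons run out) are common in grammatical evolution but are assumptions, not details taken from the prototype; the name `derive` is hypothetical.

```python
# Example grammar from the slide: uppercase letters are nonterminals.
GRAMMAR = {
    "S": ["sAs", "xBx", "m"],
    "A": ["bBb", "B"],
    "B": ["aAa", "C", "AB"],
    "C": ["c", "d", "e"],
}

def derive(codons, start="S", max_steps=100):
    """Return every sentential form produced while expanding `start`."""
    steps = [start]
    used = 0
    while True:
        sentence = steps[-1]
        # Position of the leftmost nonterminal, or None if the string is final.
        pos = next((i for i, ch in enumerate(sentence) if ch in GRAMMAR), None)
        if pos is None:
            return steps
        if used == max_steps:
            raise ValueError("derivation did not terminate")
        rules = GRAMMAR[sentence[pos]]
        # Assumed convention: wrap the codon list and take the value modulo
        # the number of alternatives available for this nonterminal.
        choice = codons[used % len(codons)] % len(rules)
        steps.append(sentence[:pos] + rules[choice] + sentence[pos + 1:])
        used += 1

print(" => ".join(derive([1, 0, 0, 1, 1])))
```

With the choices 1, 0, 0, 1, 1 the sketch reproduces the derivation shown on the slide, ending in xabdbax. Since every genome maps to a string the grammar can produce, the GA never spends time on inputs outside the user-specified format.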

Fitness Function ― Dynamic Markov Model

Treat the control flow graph as a Markov Chain

The probability on each conditional transition edge is updated as the search proceeds, based on previously tested inputs

Edge transition probability is calculated by:

$$P(\text{edge}) = \frac{\#\text{ of inputs that traversed the edge}}{\#\text{ of inputs that reached the conditional block}}$$

[Figure: example control flow graph (CFG) with nodes A through N, each edge labeled with its transition probability]

Fitness of An Input

Fitness of an input: the inverse of the product of the transition probabilities of all edges along its execution path

Larger fitness is better:
• Explore unobserved states
• Explore rarely observed states
• Increase coverage

[Figure: the example CFG with edge transition probabilities, repeated from the previous slide]

Execution Path = A, C, E, D, G, M

Fitness = 1/(.75 x .9 x .5 x .67 x .8) ≈ 5.53

Better than previous methods:
• Explores less-observed states
• Utilizes information from all previously searched paths
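The fitness calculation can be sketched as follows. This is an illustrative reading of the slides, not the prototype's code: the class name `DynamicMarkovModel` and its methods are hypothetical, each edge is counted at most once per input, and loops in the execution path are not modeled.

```python
from collections import Counter
from math import prod

class DynamicMarkovModel:
    """Edge probabilities of a CFG, re-estimated after every tested input."""

    def __init__(self):
        self.edge_hits = Counter()    # inputs that traversed edge (src, dst)
        self.block_hits = Counter()   # inputs that reached block src and left it

    def record(self, path):
        """Update the counts with one input's execution path."""
        edges = set(zip(path, path[1:]))       # count each edge once per input
        for edge in edges:
            self.edge_hits[edge] += 1
        for block in {src for src, _ in edges}:
            self.block_hits[block] += 1

    def probability(self, src, dst):
        return self.edge_hits[(src, dst)] / self.block_hits[src]

    def fitness(self, path):
        """Inverse of the product of edge probabilities along a recorded path."""
        return 1.0 / prod(self.probability(s, d) for s, d in zip(path, path[1:]))

# One input takes the rare branch A -> B, three take the common branch A -> C.
model = DynamicMarkovModel()
for path in [["A", "B", "D"], ["A", "C", "E"], ["A", "C", "E"], ["A", "C", "E"]]:
    model.record(path)

rare = model.fitness(["A", "B", "D"])      # 1 / (0.25 * 1.0)
common = model.fitness(["A", "C", "E"])    # 1 / (0.75 * 1.0)

# The slide's worked example, using the five edge probabilities it lists.
slide_example = 1.0 / prod([.75, .9, .5, .67, .8])
print(rare, common, slide_example)
```

The input that took the rarely observed branch scores higher than the ones on the common branch. As more inputs follow a path, its edge probabilities rise and its fitness falls, so the search is pushed toward less-explored parts of the graph.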

Prototype ― An Intelligent Fuzz Testing Tool (1)

Fuzzers – Black box analysis tools that inject randomly generated inputs into a program and then monitor it for crashes

Pros: Simple, automated, test unthinkable inputs

Cons: non-intelligent, hard to achieve good code coverage

Prototype ― An Intelligent Fuzz Testing Tool (2)

We seek to provide the following desirable qualities (many existing tools lack one or more):
• Intelligence: the ability to learn something useful from the inputs that have been tried in the past and use that knowledge to guide the selection of future inputs
• Targeted Code Coverage: the ability to focus testing upon selected regions of interest in the code
• Targeted Execution Control: the ability to drive program execution through parse code to “drill down” to a specific node in the control flow graph (which is suspected to contain a vulnerability)
• Source Code Independence: the ability to work on compiled binaries without source code availability
• Extensibility and Configurability: the ability to fuzz multiple protocols with a single tool

Prototype ― An Intelligent Fuzz Testing Tool (3)

Implementation:
• Uses the PAIMEI framework to build a prototype fuzz testing tool
  • PAIMEI is a reverse engineering framework
  • Written in the Python scripting language
  • Has been used by the security community to build various fuzzing, code coverage, and data flow tracking tools
• Uses the IDA Pro plugin SDK to construct the control flow graph
• Successfully tested on a TFTP binary program

System Overview

• Extract program control flow graph (CFG)
• Extract focusing subgraph (source, destination)
• Set breakpoints and register breakpoint handlers
• Initialize the set of random inputs
• Inject inputs one by one:
  • Record an input’s execution path via breakpoint handlers
  • Update dynamic Markov model parameters of the CFG
  • Calculate fitness
• Select a fraction of the best inputs
• Build the new set of inputs via mutation and crossover
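The steps above can be tied together in a small, self-contained sketch. Everything in it is hypothetical: `toy_target` is a stand-in for the instrumented binary (in the prototype the path is recorded by breakpoint handlers, which are not shown here), and the block names, population size, selection fraction, and mutation scheme are invented for illustration.

```python
import random
from collections import Counter
from math import prod

def toy_target(data, trace):
    """Stand-in for the instrumented binary: appends one label per basic block."""
    trace.append("recv")
    if len(data) >= 2 and data[0] == 0:
        trace.append("opcode_hi_ok")
        if data[1] in (1, 2):
            trace.append("opcode_lo_ok")
            if 0 in data[2:]:
                trace.append("strcpy")
                return
    trace.append("reject")

def run_generation(population, edge_hits, block_hits):
    """Inject every input, update the Markov counts, then score each input."""
    traces = []
    for data in population:
        trace = []
        toy_target(data, trace)                      # record the execution path
        for src, dst in zip(trace, trace[1:]):       # update the Markov model
            edge_hits[(src, dst)] += 1
            block_hits[src] += 1
        traces.append(trace)
    scored = []
    for data, trace in zip(population, traces):
        p = prod(edge_hits[(s, d)] / block_hits[s] for s, d in zip(trace, trace[1:]))
        scored.append((1.0 / p, data, trace))        # fitness = 1 / path probability
    return scored

def fuzz(target="strcpy", size=6, pop_size=40, max_generations=500, seed=1):
    rng = random.Random(seed)
    edge_hits, block_hits = Counter(), Counter()
    population = [bytes(rng.randrange(256) for _ in range(size))
                  for _ in range(pop_size)]
    for generation in range(1, max_generations + 1):
        scored = run_generation(population, edge_hits, block_hits)
        for _, data, trace in scored:
            if target in trace:
                return generation, data              # target block reached
        scored.sort(key=lambda item: item[0], reverse=True)
        best = [data for _, data, _ in scored[:pop_size // 4]]
        population = list(best)                      # keep the selected inputs
        while len(population) < pop_size:
            a, b = rng.choice(best), rng.choice(best)
            cut = rng.randrange(1, size)             # one-point crossover
            child = bytearray(a[:cut] + b[cut:])
            child[rng.randrange(size)] = rng.randrange(256)   # mutate one byte
            population.append(bytes(child))
    return None, None                                # target not reached in time

print(fuzz())
```

The call returns the generation in which the target block was first reached together with the input that reached it, or `(None, None)` if the generation budget runs out; as with any stochastic search, reaching the target within the budget is not guaranteed.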

Evaluation

Target Application: We used the tftpd.exe Windows server program for our initial experiments and validation of our approach

GA Parameters:
• Mutation Rate = 90%
• Crossover Rate = 75%
• Elitism
• Selective Breeding
• Dynamic Mutation

Context-Free Grammar:
• Hex bytes 0-255
• Strings “netascii”, “octet”, and “mail”

TFTP Control Flow Graph

Experiment # 1: Targeted Execution Control

Tested the ability of GA fuzzer to drive execution through parse logic to 2 embedded, vulnerable strcpy() functions.

Compared against fuzzing with random input:

1st strcpy() reached in: GA: 224 generations; Random: 2294 generations

2nd strcpy() reached in: GA: 224 generations; Random: 9106 generations

GA vs. Random Search

Comparison between GA driven and random search of tftp packet parsing logic. The node address corresponds to basic block virtual addresses on paths from the beginning to the end of the packet parsing logic.

Comparison of the standard deviations of the # of generations (in 50 runs) between the GA driven and random search of tftp packet parsing logic.

Random fuzzing ran for around 1 hour over 10,000 generations (and may still not reach the target node), while our approach took around 10 minutes to reach the target node

Experiment # 2: Code Coverage Selectivity

Tested the ability of our GA to achieve code coverage of the tftp parser logic

Compared against random input selection: better code coverage

Average over 3000 generations:
• GA: 84.81% coverage
• Random: 49.54% coverage

Random approach: running for an additional 7000 generations only increased its coverage to 54.51%

Achieves deeper code coverage more quickly: able to leverage what it has learned from past inputs!

Experiment # 3: CFG Penetration Depth

[Figure: time (generations, 0 to 10000) versus depth (1 to 15) for GA Search and Random Search]

Experiment #4: Learning Input Formats

Programs assume that input will comply with published standards. As a result, protocol parsing bugs abound!

We test the ability of our prototype to explore the boundaries of the TFTP packet parsing logic by attempting to have it learn a valid packet format. We set the destination node as the basic block corresponding to an accepted packet.
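For reference, the request format the GA has to discover is small. The sketch below builds and checks a TFTP read/write request as laid out in RFC 1350 (2-byte opcode, NUL-terminated filename, NUL-terminated mode). It illustrates the published format only; it is not the parsing logic of tftpd.exe, and the helper names `build_request` and `parse_request` are hypothetical.

```python
import struct

OPCODE_RRQ = 1   # read request (RFC 1350)
OPCODE_WRQ = 2   # write request (RFC 1350)
MODES = (b"netascii", b"octet", b"mail")

def build_request(opcode, filename, mode="octet"):
    """Encode | opcode (2 bytes) | filename | 0 | mode | 0 |."""
    return (struct.pack("!H", opcode)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")

def parse_request(packet):
    """Decode a request, raising ValueError if the layout is not valid."""
    if len(packet) < 2:
        raise ValueError("packet too short")
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode not in (OPCODE_RRQ, OPCODE_WRQ):
        raise ValueError("not a read or write request")
    fields = packet[2:].split(b"\x00")
    # A well-formed request splits into filename, mode, and an empty tail.
    if len(fields) != 3 or fields[2] != b"":
        raise ValueError("expected filename and mode, each NUL-terminated")
    filename, mode = fields[0], fields[1]
    if mode.lower() not in MODES:
        raise ValueError("unknown transfer mode")
    return opcode, filename.decode("ascii"), mode.decode("ascii")

packet = build_request(OPCODE_RRQ, "test.txt")
print(packet, parse_request(packet))
```

Each check in `parse_request` corresponds to a branch an evolved input must satisfy, which is why a parser of this kind gives the GA a rich control flow structure to learn from.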

Evolving A TFTP Packet

Major Contributions

• Practical implementation: finished initial prototype; analysis on binary code
• Novelty in methodology: dynamic Markov model as fitness; grammatical evolution for input generation
• Security focused: previous related work focuses on software testing
• Targeted code coverage: efficiently test mission-critical or susceptible parts

Advantages of Our Approach

• We apply knowledge gained from past experience to drive our choice of future inputs
• Well suited to parser code, which has a rich control flow structure for the GA to learn from
• Maximizes code coverage within specific portions of a program graph
• Minimal knowledge of input structure required: the GA can learn to approximate the input format during execution
• Once a target location has been reached, the algorithm continues to exploit weaknesses in the CFG to produce additional, different inputs capable of reaching it

Limitations

• Difficulty extracting some parts of the CFG statically:
  • Thread creation
  • Call tables
• Dependent upon control flow graph structure:
  • Program must have enough information embedded within its structure for the GA to be able to “learn from”
  • Assumes dependency between graph structure and user-supplied input (an example would be parser code)
  • Not useful for programs that have a ‘flat’ CFG structure
• Finding all paths has high complexity O() and takes a long time on large program graphs
• We can prove reachability by getting to a potentially vulnerable target state, but failure to get there does not mean the location is unreachable!

Conclusions

• Shows how genetic algorithms can be applied to the external input crafting process to maximize exploration of program state space and intelligently drive a program into potentially vulnerable states
• Automated approach treats the internal structure of each node in the CFG as a black box
• Needs testing on more complex programs
• Our work is theoretical and still at the prototype stage
