[TensorFuzz] Debugging Neural Networks with Coverage-Guided Fuzzing
Authors: Augustus Odena, Ian Goodfellow
Presenter: Tahseen Shabab | Facilitators: Susan Shu, Serena McDonnell
Date: 26th August, 2019
Cybersecurity AI
Tahseen Shabab - Presenter
CEO, Bibu Labs
Susan Shu - Facilitator
Data Scientist, Bell
Serena McDonnell - Facilitator
Senior Data Scientist, Delphia
Speakers
Prof. Hassan Khan - Chief Scientist, Bibu Labs
Prof. Kate Larson - Advisor (AI), Bibu Labs
Prof. Larry Smith - Advisor (Strategy), Bibu Labs
We Are Growing!
Feb, 2019
$1.4 B Acquisition
July, 2019
Cylance Hack: Enable Dynamic Debugging
Cylance Antivirus
Verbose Logging
Score: {
  -1000: Most Malicious
  +1000: Most Benign
}
Dynamic Debugging Enabled
Cylance Hack: Reverse Engineer Model
7,000-feature vector → Neural Network
Post-Processing: Added Filter
White/Black List
Cylance Hack: Exploit Model Bias
● Researchers found bias in the model
  ○ A small set of features has a significant effect on the outcome
● The "Added Filter" uses clusters with specific names to whitelist files, one of them belonging to a famous game
● Researchers added strings from the game's executable to a real malicious file
● Game Over!
Have We Seen This Before?
Lowd & Meek (2005) and Wittel & Wu (2004)
● Attacks against statistical spam filters
  ○ Add "good words" to spam: words the filter considers indicative of non-spam
● Append words that appear often in ham emails and rarely in spam to a spam email
● Spam filter fooled!
Why Are These Hard To Spot?
● Traditional Software
  ○ Devs directly specify the logic of the system
● ML System
  ○ NN learns rules automatically
  ○ Developers can only indirectly modify decision logic by manipulating
    ■ Training data
    ■ Feature selection
    ■ Model architecture
  ○ NN's underlying rules are mostly unknown to developers!
https://arxiv.org/pdf/1705.06640.pdf
Source of Blind Spots
Adversarial Attacks
Adaptive Nature of Hackers
● Hackers take the path of least resistance
● If a patch is deployed, hackers will simply take the next path of least resistance
Vulnerability 1
Vulnerability 2
Vulnerability 3
Data Distribution Actively Manipulated
● Hackers strategically insert attack data
● Model trains periodically
● Decision boundary is altered
Data Poisoning
secml.github.io
● Add noise
● Classifier misclassifies the object
● Model learns differently than humans
Attack: Induce Specific Output
“Explaining and Harnessing Adversarial Examples”, Ian Goodfellow
Submit queries, observe response
● Training Data
● Architecture
● Optimization Procedures
Attack: Expose Model Attributes
"Towards Reverse Engineering Black Box Neural Networks”, Seong Oh
Taxonomy of Attacks Against ML Systems
● Influence
  ○ Causative - influences training and test data
  ○ Exploratory - influences test data only
● Security Violation
  ○ Confidentiality - goal is to uncover training data
  ○ Integrity - goal is false negatives (FNs)
  ○ Availability - goal is false positives (FPs)
● Specificity
  ○ Targeted - influence predictions of particular test instances
  ○ Indiscriminate - influence predictions of all test instances
Adversarial Machine Learning - Joseph, Nelson, Rubinstein and Tygar, 2019
Exploratory Attacks Against Trained Classifier
● The attacker doesn't have access to the training data
● Most known detection techniques are susceptible to blind spots
● How difficult is it for an adversary to discover the blind spots that are most advantageous to them?
How Can We Find these Blind Spots?
https://www.theemotionmachine.com/listen-to-family-and-friends-how-to-protect-yourself-from-blind-spots/
● Checks erroneous corner cases
● Input: unlabeled test inputs
● Objective: generate test data that
  ○ Activates a large number of neurons
  ○ Forces DNNs to behave differently
● Joint optimization problem: maximize
  ○ Differential behaviour
  ○ Neuron coverage
DeepXplore: White Box Testing
● Performs gradient-guided local search
  ○ Starting point: a seed input
  ○ Finds new inputs that maximize the desired goal
● Similar to backpropagation, but:
  ○ Inputs: variable
  ○ Weights: constant
DeepXplore: Example
● Bayesian Neural Network
● Adding dropout before every weight layer approximates a Gaussian process
  ○ In both training and test
● Dropout during test:
  ○ Different outputs for the same input, e.g. [4, 5, 1, 2, 3, 6]
  ○ Equivalent to MC sampling
  ○ High variance = high uncertainty
Bayesian NN: Modelling Uncertainty
https://www.cs.ox.ac.uk/people/yarin.gal/website/blog_2248.html
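The MC-dropout idea above can be sketched with a toy NumPy model (an illustrative sketch, not code from the deck or the blog post): dropout is kept on at test time, the same input is passed through many times, and the variance of the outputs serves as the uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer "network"; the weights are fixed, dropout stays active at test time.
W = rng.normal(size=8)
x = rng.normal(size=8)

def forward(x, p_drop=0.5):
    """One stochastic forward pass with dropout kept on (MC dropout)."""
    mask = rng.random(x.shape) >= p_drop   # drop each unit with probability p_drop
    h = (x * mask) / (1.0 - p_drop)        # inverted-dropout scaling
    return float(h @ W)

samples = [forward(x) for _ in range(200)]   # Monte Carlo sampling at test time
mean, var = np.mean(samples), np.var(samples)
print(f"predictive mean={mean:.3f}, variance={var:.3f}")
```

Each call returns a different output for the same input; the spread of those outputs is the model's uncertainty.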
TensorFuzz
● Open-source tool
● Discovers errors that occur only for rare inputs (blind spots)
● Key techniques:
  ○ Coverage-guided fuzzing
  ○ Property-based testing
  ○ Approximate nearest neighbor
● Instrument the program for coverage
  ○ Add instructions to the code allowing the fuzzer to detect code paths
● Feed random inputs into the program
● Continue to mutate inputs that exercised new parts of the program
  ○ Genetic algorithm
● Identify bugs
Coverage Guided Fuzzing (AFL)
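The loop above can be sketched as a toy coverage-guided fuzzer (illustrative only; real AFL instruments compiled binaries rather than Python functions): the program under test reports which branches it executed, and any input that reaches a new branch is kept in the corpus and mutated further.

```python
import random

random.seed(0)

def program(data: bytes):
    """Toy program under test; returns the set of branch ids it executed."""
    branches = set()
    if len(data) > 0 and data[0] == ord('F'):
        branches.add("b1")
        if len(data) > 1 and data[1] == ord('U'):
            branches.add("b2")
            if len(data) > 2 and data[2] == ord('Z'):
                branches.add("b3")   # a "bug" hidden behind three branches
    return branches

def mutate(data: bytes) -> bytes:
    """Flip one random byte (a minimal AFL-style mutation)."""
    if not data:
        return bytes([random.randrange(256)])
    i = random.randrange(len(data))
    b = bytearray(data)
    b[i] = random.randrange(256)
    return bytes(b)

corpus = [b"AAA"]          # seed input
seen = set()               # global coverage so far
for _ in range(20000):
    parent = random.choice(corpus)
    child = mutate(parent)
    cov = program(child)
    if cov - seen:         # new branch exercised -> keep the input
        seen |= cov
        corpus.append(child)

print("branches covered:", sorted(seen))
```

Random inputs alone almost never reach the innermost branch; keeping inputs that made partial progress is what lets the fuzzer walk the branch chain step by step.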
● Aids the discovery of subtle fault conditions in the underlying code
● Security vulnerabilities are often associated with unexpected or incorrect state transitions
AFL: Branch Edge Coverage
AFL Documentation
● Identifies potentially interesting control-flow changes
  ○ Ex: a block of code being executed twice when it was normally hit only once
AFL Documentation
AFL: Hit Count
● Sequential bit flips with varying lengths and stepovers
● Sequential addition and subtraction of small integers
● Sequential insertion of known interesting integers (0, 1, INT_MAX, etc.)
AFL: Mutation Strategy
TensorFuzz
● Verifies that a function or program abides by a property
● Properties check for useful characteristics that must be seen in the output
Property Based Testing
https://medium.com/criteo-labs/introduction-to-property-based-testing-f5236229d237
● Covers the scope of all possible inputs
  ○ Does not restrict the generated inputs
● Shrinks the input in case of failure
  ○ On failure, the framework tries to reduce the input to a smaller failing input
● Reproducible and replayable
  ○ Each run of a property test produces a seed, so the test can be re-run on the same datasets
Advantage
https://medium.com/criteo-labs/introduction-to-property-based-testing-f5236229d237
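A minimal hand-rolled property-based test showing all three advantages (a sketch; real frameworks such as Hypothesis or QuickCheck do this far more thoroughly, and the function names here are invented for the example): inputs are generated without restriction, a failing input is shrunk to a minimal counterexample, and a fixed seed makes the run replayable.

```python
import random

random.seed(42)   # reproducible: the same seed replays the same test inputs

def remove_smallest(xs):
    """Function under test -- deliberately buggy when the minimum is
    duplicated (it removes *all* copies instead of one)."""
    m = min(xs)
    return [x for x in xs if x != m]

def prop(xs):
    """Property: removing the smallest element shrinks the list by exactly 1."""
    return len(remove_smallest(xs)) == len(xs) - 1

def gen():
    """Generate a random non-empty list of small ints (unrestricted inputs)."""
    return [random.randrange(5) for _ in range(random.randrange(1, 10))]

def shrink(xs):
    """On failure, drop one element at a time while the property still fails."""
    for i in range(len(xs)):
        c = xs[:i] + xs[i+1:]
        if c and not prop(c):
            return shrink(c)       # keep shrinking the failing input
    return xs                      # minimal failing input

counterexample = None
for _ in range(200):
    xs = gen()
    if not prop(xs):
        counterexample = shrink(xs)
        break

print("minimal counterexample:", counterexample)
```

The shrinker reliably reduces whatever random failure it finds to a two-element list with a duplicated minimum, which is exactly the condition that triggers the bug.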
TensorFuzz
Approximate Nearest Neighbor
http://web.stanford.edu/class/cs369g/files/lectures/lec16.pdf
● Nearest Neighbor
  ○ Given points p1, p2, ..., pn and a query point q, find the closest point to q among p1, ..., pn
● Approximate Nearest Neighbor
  ○ The condition is relaxed
  ○ Find pi such that
    ■ d(q, pi) <= c · min_j d(q, pj)
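In code, the relaxed condition looks like this (a brute-force toy sketch; real ANN libraries use index structures precisely to avoid scanning every point):

```python
import math

def dist(a, b):
    return math.dist(a, b)   # Euclidean distance

points = [(0, 0), (3, 4), (6, 8), (1, 1)]
q = (2, 2)

# Exact nearest neighbour: the point minimising d(q, p_i).
exact = min(points, key=lambda p: dist(q, p))
d_min = dist(q, exact)

# c-approximate nearest neighbours: any p_i with d(q, p_i) <= c * d_min.
c = 2.0
approx_ok = [p for p in points if dist(q, p) <= c * d_min]

print("exact:", exact, "acceptable approximate answers:", approx_ok)
```

Relaxing exactness lets an ANN index return any point within factor c of the true minimum, which is what makes sublinear lookups possible.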
TensorFuzz
Sadly, CGF Tools Don’t Work For Neural Networks
● Coverage Metrics
  ○ Lines of code executed
  ○ Which branches have been taken
Traditional Software Workflow
https://arxiv.org/pdf/1705.06640.pdf
● The software implementation may contain many branching statements
  ○ Based on the architecture
  ○ Mostly independent of the input
● Different inputs will often execute
  ○ the same lines of code
  ○ the same branches
● But will produce interesting variations in behaviour
Neural Network Workflow
https://arxiv.org/pdf/1705.06640.pdf
How Does TensorFuzz Work?
Let's Dive In!
Dio, Holy Diver
TensorFuzz
1. We interact with a TensorFlow graph instead of an instrumented computer program.
2. Valid neural network inputs are fed instead of a big array of bytes.
Ex: if the inputs are sequences of characters, only allow characters from the vocabulary extracted from the training set.
TensorFuzz
3. The Input Chooser intelligently chooses elements from the input corpus, using the following heuristic:

p(c_k, t) = e^(t_k - t) / Σ_j e^(t_j - t)

p(c_k, t): probability of choosing corpus element c_k at time t
t_k: time when c_k was added to the corpus

Intuition: recently sampled inputs are more likely to yield useful new coverage when mutated, but the advantage decays over time.
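A sketch of this heuristic (assuming the softmax-of-recency form described above; toy Python, not the TensorFuzz source): weights decay exponentially with how long ago each element was added, so sampling strongly favours recent corpus entries.

```python
import math
import random

random.seed(1)

def choose(corpus_times, now):
    """Sample a corpus index with probability proportional to exp(t_k - now):
    recently added elements are favoured; the advantage decays over time."""
    weights = [math.exp(tk - now) for tk in corpus_times]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(range(len(corpus_times)), weights=probs)[0]

# Elements added at times 0, 5 and 9; we sample at time t = 10.
times = [0.0, 5.0, 9.0]
counts = [0, 0, 0]
for _ in range(10000):
    counts[choose(times, 10.0)] += 1

print("selection counts:", counts)   # the most recent element dominates
```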
TensorFuzz
4. The Mutator modifies inputs in a controlled manner.
For text input, mutation follows this policy: uniformly at random, perform one of the following operations:
- Delete, insert, or substitute a random character at a random location
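A toy version of that text-mutation policy (illustrative; `VOCAB` stands in for the vocabulary extracted from the training set, so mutants stay valid inputs):

```python
import random

random.seed(0)

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")   # characters seen in training data

def mutate_text(s: str) -> str:
    """Uniformly pick one operation: delete, insert, or substitute a random
    in-vocabulary character at a random location."""
    op = random.choice(["delete", "insert", "substitute"])
    if op == "delete" and s:
        i = random.randrange(len(s))
        return s[:i] + s[i+1:]
    ch = random.choice(VOCAB)
    if op == "substitute" and s:
        i = random.randrange(len(s))
        return s[:i] + ch + s[i+1:]
    # insert (also the fallback when mutating an empty string)
    i = random.randrange(len(s) + 1)
    return s[:i] + ch + s[i:]

x = "hello world"
mutants = [mutate_text(x) for _ in range(5)]
print(mutants)
```

Each mutant differs from the original by at most one character, keeping mutations small and controlled.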
TensorFuzz
Diving Deeper
5. Mutated inputs are fed to the neural network, and the following are extracted from the NN:
- A set of coverage arrays
  - Enables computation of coverage
- A set of metadata arrays
  - Fed as input to the objective function
5.a Objective Function
- The desired outcome, e.g. an error or a crash
- The output metadata arrays are fed into the objective function, and inputs that cause the system to reach the objective function's goal are flagged
TensorFuzz
5.b Coverage Analyzer
- The core part of the product
- Reads arrays from the TensorFlow runtime, turns them into Python objects representing coverage, and checks whether that coverage is new
TensorFuzz
Desired Properties of Coverage Analyzer
● Check if the neural network is in a new state
  ○ Enables detection of misbehaviour
● The check has to be fast
● Should work with many different computation graphs
  ○ Remove manual intervention as much as possible
● Exercising all of the coverage should be hard
  ○ Or else we won't cover much of the possible behaviours
Use Fast Approximate Nearest Neighbour
● Determines if two sets of NN activations are meaningfully different from each other
● Provides a coverage metric that produces useful results for neural networks
  ○ Even if the underlying software implementation of the neural network does not make use of many data-dependent branches
Intuition: Coverage Analyzer
[Diagram: activation vectors for the current input and for old inputs, with the deltas between them]
New Coverage Reached If Distance Sufficiently Large
● On a new activation vector:
  a. Use the approximate nearest neighbors algorithm
  b. Look up the nearest neighbour
  c. Check the Euclidean distance between the current vector and its nearest neighbour
  d. Add the input to the corpus if the distance is greater than L
https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d
Coverage Analyzer: Details
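The check can be sketched as follows (brute-force nearest neighbour for clarity; TensorFuzz uses approximate nearest neighbours for speed, and `L` here is the distance threshold):

```python
import math

L = 1.0                     # distance threshold (a tunable hyperparameter)
coverage = []               # activation vectors seen so far

def is_new_coverage(activations):
    """New coverage iff the activation vector is farther than L (Euclidean)
    from its nearest neighbour in the corpus. Brute force stands in for the
    approximate-nearest-neighbour lookup used in practice."""
    if not coverage:
        coverage.append(activations)
        return True
    nearest = min(coverage, key=lambda v: math.dist(v, activations))
    if math.dist(nearest, activations) > L:
        coverage.append(activations)
        return True
    return False

r1 = is_new_coverage([0.0, 0.0])   # first vector: always new
r2 = is_new_coverage([0.1, 0.0])   # within L of an existing vector: not new
r3 = is_new_coverage([5.0, 5.0])   # far away: new coverage
print(r1, r2, r3)
```

Only activation vectors that land far from everything seen before count as new coverage, so the corpus grows toward genuinely new network states rather than near-duplicates.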
● Note: Often, good results are achieved only by looking at the logits or the layer before the logits
https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d
Coverage Analyzer: Details
6. The mutated input is:
- Added to the corpus if new coverage is achieved
- Added to the list of test cases if the objective function is satisfied
TensorFuzz
Break
https://www.bandt.com.au/media/facebook-manipulated-users-feeds-experiment
Experiments
Experiment: Finding NaNs
● NaNs consistently cause trouble for researchers and practitioners, but they are hard to track down
● A bad loss function is "fault injected" into a neural network
● TensorFuzz could find NaNs substantially faster than a baseline random search
● Left: Coverage over time for 10 different random restarts
● Right: An example of a random image that causes the neural network to output a NaN
Experiment: Finding NaNs
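The baseline from this experiment can be mimicked with a toy random search (the fault-injected model below is invented for illustration; the paper fuzzes a real network): sample inputs at random and flag any that drive the computation to NaN.

```python
import math
import random

random.seed(0)

def fuzzed_forward(x: float) -> float:
    """Toy model with a fault-injected op: both products overflow to inf for
    large |x|, and inf - inf evaluates to NaN."""
    a = x * 1e300
    b = x * 1e300 + 1.0
    return a - b          # finite for in-range x; NaN once both products overflow

# Baseline random search: sample inputs and flag any that produce NaN.
nan_inputs = []
for _ in range(1000):
    x = random.uniform(-2e8, 2e8)
    if math.isnan(fuzzed_forward(x)):
        nan_inputs.append(x)

print(f"found {len(nan_inputs)} NaN-producing inputs out of 1000")
```

Here the NaN region is large enough for random search to stumble into; TensorFuzz's point is that coverage guidance finds such inputs far faster when they are rare.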
Experiment: Quantization Errors
● We often want to quantize neural networks
● How do we test for accuracy?
● We can look at differences on test sets, but often few show up
● Instead, we can fuzz for inputs that surface differences
● Left: Coverage over time for 10 different random restarts. Note that 3 runs fail
● Right: An example of an image correctly classified by the original neural network but incorrectly classified by the quantized network
Experiment: Quantization Errors
Discussion
Discussion Points
● How do we embed security testing into the ML solution development lifecycle?
● Can explainable inference help detect blind spots?
● Can we use multiple classifiers in parallel to reduce the impact of an attack on a specific model?