Verifiably Robust Machine Learning For Security
Yizheng Chen, Postdoctoral Researcher
Columbia University
* Joint work with Shiqi Wang, Dongdong She and Suman Jana
Security Classifiers
• Very easy to evade: a 7 MB PDF evades Gmail's PDF malware classifier [1]

[1] Xu et al., NDSS Talk: Automatically Evading Classifiers (including Gmail's).
Robust Accuracy
• Unbounded attackers can always evade the classifier
• Goal: be robust against reasonably bounded attackers
• Generalize to robustness against unbounded attackers
[Venn diagram: within All Samples, the Accurate set contains the Robust and Accurate set]
Measure Robust Accuracy
• Formal verification on image classification
  • Given an image, no adversarial examples exist within a small Lp distance.
• Verified Robust Accuracy (VRA)
  • The percentage of test samples that are verified to be correctly classified within an Lp distance bound.
  • Holds against any Lp norm bounded attacker.
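The VRA definition above can be sketched as a small harness. This is a minimal, hypothetical sketch: `model`, `test_set`, and `verify` are placeholders for a real classifier, dataset, and sound verifier, not any particular tool's API.

```python
# Sketch: computing Verified Robust Accuracy (VRA) over a test set.
# `verify` stands in for a sound verifier that returns True only when
# it can prove the prediction stays correct for every perturbation
# within the Lp bound; `model` and `test_set` are placeholders too.

def verified_robust_accuracy(model, test_set, epsilon, verify):
    verified = 0
    for x, y in test_set:
        # A sample counts toward VRA only if it is correctly classified
        # AND the verifier proves robustness within the epsilon ball.
        if model(x) == y and verify(model, x, y, epsilon):
            verified += 1
    return verified / len(test_set)
```

Note that VRA is a lower bound on true robust accuracy: a sound verifier may fail to verify a sample that is in fact robust, but never the reverse.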
Sound Over-approximation
• 3 input dimensions, 2 output classes; f(x) is class 0
• For all x' with Lp(x, x') ≤ ε, is f0(x') always the largest output?
• Sound transformation T: f(x') ⊆ Tf(x')
  • Tf over-approximates f
  • Robustness analysis is based on Tf(x')
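The simplest such sound transformation can be illustrated with naive interval bound propagation. This is a minimal sketch: the weights are illustrative, not from any trained model.

```python
import numpy as np

# A minimal sketch of one sound transformation Tf: naive interval bound
# propagation through a ReLU network. Every concrete output f(x') for x'
# in the input box is contained in the propagated interval, so the
# robustness question can be answered on the over-approximated bounds.

def interval_forward(layers, lo, hi):
    for i, (W, b) in enumerate(layers):
        # Split W by sign so the box [lo, hi] maps to sound output bounds.
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        new_lo = Wp @ lo + Wn @ hi + b
        new_hi = Wp @ hi + Wn @ lo + b
        if i < len(layers) - 1:
            # ReLU is monotone, so clamping the bounds stays sound.
            new_lo, new_hi = np.maximum(new_lo, 0), np.maximum(new_hi, 0)
        lo, hi = new_lo, new_hi
    return lo, hi

# 3 input dimensions, 2 output classes, as on the slide.
layers = [
    (np.array([[1., -1., 0.], [0., 1., 1.]]), np.zeros(2)),  # hidden layer
    (np.array([[2., 0.], [0., 1.]]), np.array([0.1, 0.])),   # output layer
]
x = np.array([0.5, 0.2, 0.1])
lo, hi = interval_forward(layers, x - 0.01, x + 0.01)
# f is verified to output class 0 if f0's lower bound beats f1's upper bound.
verified_class0 = lo[0] > hi[1]
```

Naive intervals are sound but loose; the methods on the next slides tighten exactly this over-approximation.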
New Sound Over-approximation Methods
Convex Outer BoundWong et al. ICML 2018, NIPS 2018.
SMT Solvers (Katz et al. 2017; Ehlers 2017)
Mixed-Integer Programming (Bunel et al., 2017; Tjeng et al., 2017; Cheng et al., 2017)
Semidefinite Programming (Raghunathan et al., 2018; Fazlyab et al. 2019)
Convex Relaxation (Weng et al., 2018; Mirman et al., 2018; Gehr et al., 2018; Dvijotham et al., 2018; Wong & Kolter, 2018; Wong et al., 2018)
Abstract InterpretationGehr et al. Oakland 2018, Mirman et al. ICML 2018.
Symbolic Linear RelaxationWang et al. USENIX Security 2018, NIPS 2018.
10
• Convex Outer Bound: solves the dual of the linear program
• Abstract Interpretation: abstract transformers for NN layers
• Symbolic Linear Relaxation: propagates relaxed linear bounds
• Tradeoff: over-approximation error vs. computation overhead
Symbolic Linear Relaxation (Wang et al., USENIX Security 2018, NIPS 2018)
• Propagate symbolic intervals
• ReLU relaxation
Code: https://github.com/tcwangshiqi-columbia/symbolic_interval
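The ReLU relaxation step can be sketched as follows. This is illustrative only: the lower-bound slope used here is one common choice, and real tools (including the linked repository) make such choices per neuron.

```python
# Sketch of the linear relaxation of a ReLU with pre-activation bounds
# [l, u]. When the bounds straddle zero, relu(z) is bounded above by
# the chord u*(z - l)/(u - l) and below by a parallel line through the
# origin (one common slope choice; tools differ here).
# Lines are returned as (slope, intercept) pairs.

def relu_relaxation(l, u):
    if l >= 0:                   # always active: ReLU acts as the identity
        return (1.0, 0.0), (1.0, 0.0)
    if u <= 0:                   # always inactive: ReLU is constant zero
        return (0.0, 0.0), (0.0, 0.0)
    slope = u / (u - l)          # chord slope for the unstable case
    lower = (slope, 0.0)         # lower line through the origin
    upper = (slope, -slope * l)  # upper chord: slope * (z - l)
    return lower, upper

lower, upper = relu_relaxation(-1.0, 3.0)
# Soundness at a sample point z in [l, u]:
z = 2.0
assert lower[0] * z + lower[1] <= max(z, 0.0) <= upper[0] * z + upper[1]
```

Keeping both bounds linear in the symbolic input variables is what lets these relaxed bounds propagate through many layers cheaply.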
Measure Verified Robust Accuracy (VRA)
• MNIST model, regular training, 98.28% test accuracy
  • L∞ ≤ 0.01: 80% VRA
  • L∞ ≤ 0.3: 0% VRA
• Verifiably robust training increases VRA
What is Verifiably Robust Training?
• Regular training: min( errors )
• Robust training: min( max( errors by successful evasions ) )
• The inner max (the robust loss) cannot be computed exactly:
  • Adversarial training underestimates the robust loss
  • Verifiable training overestimates it using a sound over-approximation
Verifiably Robust Training
• Training data → sound over-approximation (L∞ ≤ ε) → robust loss
• Training only for robustness decreases accuracy, so the regular loss is kept in the objective
• Ongoing research:
  • Obtain high accuracy and high VRA at the same time
  • Scale to larger datasets and networks
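One way to pursue both goals is to mix the regular loss with the verified robust loss. This is a hedged sketch, not the exact training objective from any paper: `worst_case_logits` is assumed to come from a sound over-approximation, and the simple `alpha` weighting is illustrative.

```python
import numpy as np

# Sketch: combining the regular loss with a verifiable robust loss.
# `worst_case_logits` would come from a sound over-approximation
# (true class at its lower bound, every other class at its upper
# bound); `alpha` trades off accuracy against VRA.

def softmax_xent(logits, label):
    z = logits - logits.max()                # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def mixed_loss(logits, worst_case_logits, label, alpha=0.5):
    regular = softmax_xent(logits, label)
    # Cross-entropy on the worst case overestimates the robust loss.
    robust = softmax_xent(worst_case_logits, label)
    return alpha * regular + (1.0 - alpha) * robust

clean = np.array([2.0, 0.0])
worst = np.array([1.0, 1.0])   # perturbations shrink the class margin
loss = mixed_loss(clean, worst, label=0)
```

Because the robust term is an overestimate, driving this loss down pushes the verified bounds apart, which is exactly what raises VRA.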
• If we sample enough training data for robustness, VRA generalizes to test data (VRA and accuracy are then measured on the test set).
MixTrain Results
Our paper "MixTrain": https://arxiv.org/abs/1811.02625
• Standard networks used in verifiably robust training research
• Acceptable accuracy drop compared to regular training

Dataset | Network | Accuracy | VRA | L∞ Bound
MNIST   | Small   | 99%      | 97% | 0.1
MNIST   | Large   | 96%      | 58% | 0.3
CIFAR10 | Resnet  | 71%      | 50% | 2/255 (VRA increased from 0)
Robustness Properties
Why is it useful for Security?
• Robustness property for a security classifier: reasonably bounded attackers cannot evade the classifier
• A PDF malware classifier cannot be evaded by:
  • Inserting benign pages
  • Deleting non-functional objects
• A Twitter spam detector cannot be evaded by:
  • URLs with no redirection chain
Challenges
• How to train a single model to be robust against different attackers?
• How to maintain low false positive rate?
• Does verifiable robustness generalize to unrestricted attackers?
Robust Against Different Attackers
• Obtain VRA for multiple robustness properties as well as regular accuracy
  • The underlying optimization problem is harder
• Mixed Training (our paper "MixTrain": https://arxiv.org/abs/1811.02625)
  • Combined and dynamic training objective
  • Mix the batches
[Figure: accuracy alongside VRA for properties 1, 2, and 3]
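Batch mixing can be sketched as a per-batch objective schedule. This is a simplified illustration, not the paper's exact algorithm: the property names, the fixed `robust_fraction`, and the uniform sampling are all placeholder assumptions.

```python
import random

# Sketch: per-batch objective mixing. Each training batch is assigned
# either the regular loss or the verified robust loss of one property,
# so a single model is pushed toward accuracy and several VRAs at once.

def pick_objectives(num_batches, properties, robust_fraction=0.5, seed=0):
    rng = random.Random(seed)
    schedule = []
    for _ in range(num_batches):
        if rng.random() < robust_fraction:
            # This batch trains on one robustness property's verified loss.
            schedule.append(rng.choice(properties))
        else:
            schedule.append("regular")
    return schedule

schedule = pick_objectives(8, ["property_1", "property_2", "property_3"])
```

A dynamic variant would adjust `robust_fraction` or the property weights during training as the different VRAs converge at different rates.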
New Distance Metric
• To bound attackers that reasonably mimic real attackers
• Must not affect the false positive rate
• Adversarial malware examples
  • x -> x', s.t. f(x') = benign and O(x') is malicious, imperceptible by machine
• Mimicry operations
  • Spam: good word injection
  • PDF malware: merge benign and malicious feature vectors
  • Phishing: name squatting
  • High FPR (67% ~ 85%)
[Figure: adversarial malware examples lie between the malicious and benign regions]
• Lp norm is not suitable:
  • Images: no single pixel influences the final class too much.
  • Security classifiers: some features are more malicious or benign than others.
  • Therefore, attackers' operations are not evenly distributed across features.
Searching for Evasive PDF Malware
• Attacks can be decomposed into building-block operations
  • Feature insertion-only attacks (Grosse et al.; Hu et al.)
  • Mimicry, merging with benign features (Šrndić et al.)
  • Mutation operations: insert, replace, delete (Xu et al.; Dang et al.)
• Optimization
  • Greedy (gradient descent)
  • Genetic evolution
  • Hill climbing
How to choose the distance metric?
• Distance captures the building block attack operations
• A larger distance means more building block attack operations
Subtree Distance
• A PDF malware variant needs correct syntax and correct semantics.
• A PDF file is parsed into a tree structure.
[Figure: parsed PDF tree under /Root, with subtrees /Type /Catalog, /Pages (/Type, /Kids, /Count, /Page), and /OpenAction (/S /Javascript, /JS exploit, /Filter /FlateDecode, /Length 1573)]
• Malware variants differ in subtrees.
[Figure: two parsed PDF trees that differ in one subtree under the root]
• Subtree distance one: arbitrary changes in 1 out of N subtrees under the root.
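A simplified version of this distance can be written down directly, with PDF trees represented as nested dicts (a stand-in for a real PDF parse; the object names below are illustrative).

```python
# Sketch: subtree distance between two parsed PDF trees, represented as
# nested dicts keyed by object names (a simplification of a real PDF
# parse). The distance counts how many subtrees under the root differ;
# "subtree distance one" means exactly one subtree changed, however
# extensively.

def subtree_distance(tree_a, tree_b):
    keys = set(tree_a) | set(tree_b)
    # A subtree counts once whether it was inserted, deleted, or modified.
    return sum(1 for k in keys if tree_a.get(k) != tree_b.get(k))

original = {"/Type": "/Catalog",
            "/Pages": {"/Count": 1},
            "/OpenAction": {"/S": "/Javascript"}}
variant = {"/Type": "/Catalog",
           "/Pages": {"/Count": 1},
           "/OpenAction": {"/S": "/Javascript", "/JS": "exploit"}}
```

Counting a whole subtree once, no matter how much changes inside it, is what makes the metric match building-block attack operations rather than per-feature edits.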
Building Block Robustness Properties
• PDF variants with correct syntax
  • Subtree insertion property (subtree distance one)
  • Subtree deletion property (subtree distance one)
• Overapproximate attackers within the bound
  • A small subtree distance maintains a low FPR
  • Binary path features (Hidost, Šrndić et al., NDSS 2013)

Accuracy              99.74%
False positive rate   0.56%
Subtree insertion VRA 91.86%
Subtree deletion VRA  99.68%
Genetic Evolutionary Attack: EvadeML
• Mutation operations: insert, replace, delete (Xu et al., NDSS 2016)
• Cuckoo oracle to check that variants are malicious
• Classifier's score as search feedback
• Genetic evolution algorithm to optimize the search
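The search loop can be sketched as follows. `mutate`, `oracle`, and `score` are hypothetical stand-ins for the real mutation operators, the Cuckoo sandbox oracle, and the target classifier's score; the selection scheme is a simplification of the actual genetic algorithm.

```python
import random

# Sketch of an EvadeML-style genetic evolutionary search: mutate
# variants, keep only those the oracle still finds malicious, and use
# the classifier's maliciousness score as fitness (lower = more benign).

def genetic_attack(seed, mutate, oracle, score,
                   pop_size=20, generations=10, threshold=0.5):
    rng = random.Random(0)
    population = [seed]
    for _ in range(generations):
        # Breed new variants by mutating current survivors.
        children = [mutate(rng.choice(population), rng)
                    for _ in range(pop_size)]
        # Discard variants that lost their malicious behavior.
        children = [c for c in children if oracle(c)]
        # Keep the pop_size variants the classifier likes best.
        population = sorted(population + children, key=score)[:pop_size]
        if score(population[0]) < threshold:   # classified benign: evasion
            return population[0]
    return None
```

Robust training degrades exactly the `score` feedback this loop depends on, which is why the attack then needs longer mutation traces.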
Success Rate of Unbounded Attackers
• Generate evasive variants for 500 PDFs; against the robust model the attack needs:
  • A longer mutation trace
  • A larger feature difference
  • And gets worse search feedback
Conclusion
• Verifiably robust training has a lot of potential to build truly robust models against different attackers.
• We need to define meaningful distance metrics to train robustness properties for security classifiers.
• Training verifiable robustness within a bound tends to raise the bar for unrestricted attackers to evade the classifier.