Michel Aubizzierre
INFILTRATE
Jan 12th 2012
Seagulls are the security researchers of the sea
Unearthing the world’s greatest bugs
Automated testing
When I say:
<div id="box" style="width: 1000px; padding-left: 337px;"></div>
I want:
document.getElementById("box").offsetWidth == 1337;
PASS / FAIL
Merging
master release
security patch
WebKit
• It’s different everywhere:
• Browsers, applications, embedded devices
• Architectures: x86, x64, ARM, MIPS
• Features: SVG, AudioContext, CSS Regions
• Heap allocator
• Performance: set-top-box, phone, laptop
• WebKit had about 300 security bugs in 2011
• Enough data for meaningful machine learning exercise
Also see
• How Open Should Open Source Be?
• http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-98.pdf
• They used libsvm on Firefox version control and:
• committer
• invisible bug
Bug half-life
• Mean time between fix published in trunk and stable browser released:
• Chrome: 22 days
• Safari: 92 days
• iOS: 107 days
• Blackberry: Unknown, slower than iOS
• iTunes: Similar to iOS and Safari
• webkitgtk: depends on vendor
• Other platforms don’t have a half-life
Fastest fix
• Chrome: 1 day (Pwn2Own), 7 days for regular bugs
• Safari: 16 days
• iOS: 34 days
Bugs remaining
Exploit TCO
Crash
Get PC
Shellcode
Stage 2
./win
Not to scale
Minimized { today we are optimizing this
Is that browser vulnerable to this?
• Is there a stable browser out there vulnerable to this bug?
• Does Safari have the vulnerable code?
• Is it reachable in Safari?
• Is it exploitable in build 5.1A123 of Safari?
Is that browser vulnerable to this?
• For Chrome and Safari, there is some data available
• Relevant data is not present in version control
• This method will therefore not find it
Artificial Intelligence
SMT solvers on a moonlit beach
What is beauty?
Machine Learning
one
If 1,1,0→1 and 1,0,1→1 and 0,1,1→0
1,0,0→?
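One way to answer the puzzle above, as a rough sketch: copy the label of the closest known example (one-nearest-neighbour, not the SVM used later in this talk). The names `EXAMPLES`, `distance` and `predict` are made up for this illustration.

```ruby
# Known examples from the slide: three binary inputs and one label each.
EXAMPLES = {
  [1, 1, 0] => 1,
  [1, 0, 1] => 1,
  [0, 1, 1] => 0
}.freeze

# Hamming distance: number of positions where two vectors differ.
def distance(a, b)
  a.zip(b).count { |x, y| x != y }
end

# Predict by copying the label of the nearest known example (1-NN).
def predict(input)
  _known, label = EXAMPLES.min_by { |known, _label| distance(known, input) }
  label
end

puts predict([1, 0, 0]) # prints 1
```

Both positive examples differ from 1,0,0 in one position, the negative one in three, so this simple rule guesses 1.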
Plan
0x00 Tell machine learning software what to do
0x04 Execute magical machine learning
0x08 Check results
0x0c Improve inputs
0x10 jmp 0x04
SVM
• Support Vector Machines
• you can consider it a black box
• expects inputs to be lists of numbers
• gives back numbers
• (almost) no parameters
SVM
• My question: Is commit 12345 security related?
• Expected answer: 1 (or 0)
• Commit must be modeled as list of numbers
Features
• Single attribute of an entity:
• References invisible bug?
• Message contains the word crash?
Enumerations
• Committer is inferno
• Committer is cevans
• Committer is abarth
• Split into three attributes, ‘is inferno’, ‘is cevans’, ‘is abarth’
• [1, 0, 0], [0, 1, 0], [0, 0, 1]
• Expressed as sparse matrix:
• Only list attributes which aren’t 0, e.g. 2:1
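A minimal sketch of the enumeration split and the sparse notation described above. The constant `COMMITTERS` and the helpers `one_hot` and `sparse` are hypothetical names, and the committer order is an assumption taken from the slide.

```ruby
# Hypothetical list of known committers; the position in this list decides
# which attribute is set to 1.
COMMITTERS = %w[inferno cevans abarth].freeze

# One attribute per possible value, exactly one of them set to 1.
def one_hot(committer)
  COMMITTERS.map { |name| name == committer ? 1 : 0 }
end

# Sparse form: list only the non-zero attributes as "index:value".
# Indices are 1-based, as in the libsvm file format.
def sparse(vector)
  non_zero = vector.each_with_index.reject { |value, _index| value.zero? }
  non_zero.map { |value, index| "#{index + 1}:#{value}" }.join(' ')
end

puts one_hot('cevans').inspect # prints [0, 1, 0]
puts sparse(one_hot('cevans')) # prints 2:1
```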
Training data
• Known correct answers
• Both positive and negative
• Commits 123, 456 are security fixes
• Commits abc, def are not security fixes
Cross validation
• Split training data n-ways
• For every set of (n-1) groups, do they correctly predict the remaining group?
Cross validation
• Training data: A B C D E F
• Does A B C D predict E F
• Does A B E F predict C D
• Does C D E F predict A B
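The splits above can be generated mechanically. This is a sketch only: `folds` is a hypothetical helper, and it assumes the number of samples divides evenly by n.

```ruby
# Split the samples into n groups and return every [training, held_out] pair.
# Assumes samples.size is a multiple of n.
def folds(samples, n)
  groups = samples.each_slice(samples.size / n).to_a
  groups.each_with_index.map do |held_out, i|
    # Training data is every group except the one being held out.
    others = groups.each_with_index.reject { |_group, j| j == i }
    [others.flat_map(&:first), held_out]
  end
end

folds(%w[A B C D E F], 3).each do |training, held_out|
  puts "Does #{training.join(' ')} predict #{held_out.join(' ')}?"
end
```

With libsvm itself, the `-v n` option of `svm-train` runs n-fold cross validation without any manual splitting.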
Features of security related commits
• Authored by member of the security team
• Reviewed by member of the security team
• Mentions a member of the security team
• Mentions a restricted bug
• The patch contains the word ‘crash’
Features of security related commits 2
• Merged to a branch
• Merged by a member of the security team
• Merge reviewed by a member of the security team
• Message mentions keyword: crash, CVE, out of bounds, use after free, security
Features of boring commits
• Mentions keyword: build, flakiness, rebaseline, unreviewed, rolling out, null
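A sketch of how the keyword lists above could become features. The matching rule (case-insensitive substring) and the sample commit message are assumptions for illustration, not taken from the real tool.

```ruby
# Keywords taken from the slides.
SECURITY_KEYWORDS = ['crash', 'CVE', 'out of bounds', 'use after free', 'security'].freeze
BORING_KEYWORDS = ['build', 'flakiness', 'rebaseline', 'unreviewed', 'rolling out', 'null'].freeze

# Return the keywords from the list that occur in the commit message.
# Assumption: a case-insensitive substring match is good enough.
def keywords_in(message, keywords)
  text = message.downcase
  keywords.select { |keyword| text.include?(keyword.downcase) }
end

# Made-up commit message.
message = 'Unreviewed, rolling out r12345: crash in layout tests'
puts keywords_in(message, SECURITY_KEYWORDS).inspect # prints ["crash"]
puts keywords_in(message, BORING_KEYWORDS).inspect   # prints ["unreviewed", "rolling out"]
```

Each keyword then becomes one 0/1 attribute, in the same way as the committer enumeration.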
Restricted bugs
• There are about 76,000 bugs
• Curl them all
• Check for /You are not authorized/
• Takes about a day
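The check itself is one regular expression per fetched page. In this sketch `restricted_bugs` and the `fetch` callable are hypothetical, and canned strings stand in for the real curl output.

```ruby
# Pattern from the slide: the tracker answers with this text when a bug is
# not publicly visible.
RESTRICTED = /You are not authorized/

# True when the fetched bug page says access is denied.
def restricted?(body)
  !!(RESTRICTED =~ body)
end

# Return the ids whose page is restricted. `fetch` is a hypothetical callable
# that returns the page body for a bug id (for example by shelling out to curl).
def restricted_bugs(ids, fetch)
  ids.select { |id| restricted?(fetch.call(id)) }
end

# Canned responses stand in for real HTTP requests here.
pages = {
  1 => '<html>Bug 1: layout test is flaky</html>',
  2 => '<html>You are not authorized to access bug #2.</html>'
}
puts restricted_bugs(pages.keys, pages.method(:fetch)).inspect # prints [2]
```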
Going through the repo
• Git
• Master branch available on WebKit git
• Chromium branches through git-svn
• see http://www.dmo.ca/blog/20070608113513/
Going through the repo
• Grit ruby gem by GitHub
• Monkey patched:
def by_security_team?
  WebKit::SECURITY_TEAM.include?(committer.to_s)
end

def reviewed_by_security_team?
  !!(REVIEW =~ message)
end
JSONize it
{
  "svn_rev": "95749",
  "committer": "[email protected]",
  "by_security_team": false,
  "reviewed_by_security_team": false,
  "mentions_security_team": false,
  "restricted_bug": false,
  "keywords": ["origin", "crash", "broke"],
  "crash_in_patch": true,
  "bug": 68570
}
libsvmize it
1 1:1 2:0 3:0 4:1 5:0 6:1 7:1 8:0 18:1 71:1 #94857 merged
0 1:0 2:0 3:0 4:1 5:1 6:1 7:1 8:0 17:1 71:1 72:1 80:1 #94864 crash merged
1 1:0 2:0 3:0 4:1 5:1 6:1 7:1 8:0 17:1 71:1 72:1 79:1 80:1 #94905 crash, build merged
-1 1:0 2:0 3:0 4:0 5:1 6:0 7:0 8:0 64:1 80:1 #94955 crash
-1 1:0 2:0 3:0 4:0 5:1 6:1 7:0 8:0 17:1 73:1 80:1 #94982 crash merged
1 1:0 2:0 3:0 4:1 5:0 6:1 7:0 8:0 65:1 66:1 88:1 #95010 out-of-bounds merged
-1 1:0 2:0 3:0 4:0 5:1 6:0 7:0 8:0 17:1 80:1 #95017 crash
-1 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 39:1 94:1 #95785 unreviewed
-1 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 60:1 #95786
-1 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 42:1 94:1 #95787 unreviewed
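A sketch of the step from a JSON record to a libsvm line. The feature numbering in `BOOLEAN_FEATURES` and `KEYWORD_FEATURES` is invented for this example (the talk does not spell out its own mapping), and `libsvm_line` is a hypothetical helper.

```ruby
require 'json'

# Hypothetical feature numbering, not the numbering used on the slides.
BOOLEAN_FEATURES = {
  'by_security_team' => 1,
  'reviewed_by_security_team' => 2,
  'mentions_security_team' => 3,
  'restricted_bug' => 4,
  'crash_in_patch' => 5
}.freeze
KEYWORD_FEATURES = { 'crash' => 6, 'origin' => 7, 'broke' => 8 }.freeze

# Build one line: "<label> <index>:<value> ... #<revision>".
# Booleans are always written as 0 or 1; keywords only when present.
def libsvm_line(record, label)
  features = BOOLEAN_FEATURES.map { |key, index| [index, record[key] ? 1 : 0] }
  keyword_indices = record.fetch('keywords', []).map { |word| KEYWORD_FEATURES[word] }.compact
  features += keyword_indices.map { |index| [index, 1] }
  pairs = features.sort.map { |index, value| "#{index}:#{value}" }
  "#{label} #{pairs.join(' ')} ##{record['svn_rev']}"
end

record = JSON.parse('{"svn_rev":"95749","by_security_team":false,' \
                    '"reviewed_by_security_team":false,"mentions_security_team":false,' \
                    '"restricted_bug":false,"keywords":["origin","crash","broke"],' \
                    '"crash_in_patch":true}')
puts libsvm_line(record, 0) # prints 0 1:0 2:0 3:0 4:0 5:1 6:1 7:1 8:1 #95749
```

The label is 1 or -1 for training data and a placeholder such as 0 for commits that still need a prediction.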
???
• despite the name, libsvm is a set of command line tools for me:
./svm-train -c 64 -nu 0.00048828125 inf.t inf.model &&
./svm-predict inf inf.model inf.out &&
paste inf inf.out | grep -v "\-1$"
Check results
0 1:0 2:0 3:0 4:0 5:1 6:0 7:0 8:0 26:1 80:1 86:1 #95673 origin, crash 1
0 1:0 2:0 3:0 4:1 5:0 6:0 7:0 8:0 20:1 89:1 #95679 policy 1
0 1:0 2:0 3:0 4:1 5:1 6:1 7:1 8:0 17:1 71:1 #95689 merged 1
0 1:0 2:0 3:0 4:1 5:1 6:0 7:0 8:0 20:1 85:1 94:1 #95690 unreviewed, null 1
0 1:0 2:0 3:0 4:1 5:1 6:1 7:1 8:0 17:1 71:1 #95728 merged 1
0 1:0 2:0 3:0 4:0 5:1 6:0 7:0 8:0 86:1 #95729 origin 1
0 1:0 2:0 3:0 4:1 5:0 6:0 7:0 8:0 20:1 #95747 1
0 1:0 2:0 3:0 4:1 5:1 6:1 7:1 8:0 17:1 67:1 71:1 80:1 95:1 #95791 use after free, crash merged 1
0 1:0 2:0 3:0 4:0 5:1 6:0 7:0 8:0 86:1 92:1 #95845 origin, security 1
0 1:1 2:0 3:0 4:1 5:0 6:1 7:1 8:0 18:1 71:1 #95857 merged 1
0 1:0 2:0 3:0 4:1 5:0 6:0 7:0 8:0 17:1 92:1 #95880 security 1
0 1:0 2:0 3:0 4:0 5:0 6:1 7:1 8:0 71:1 #95924 merged 1
0 1:0 2:0 3:0 4:0 5:0 6:1 7:1 8:0 71:1 #95959 merged 1
0 1:0 2:0 3:0 4:0 5:0 6:1 7:0 8:0 67:1 #96020 merged 1
It works
• improving the training set improves results
• commit messages of false negatives and false positives give hints for new keyword features
• 80-90% success rate during cross validation
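A sketch of how a success rate and the false positives and negatives can be counted once predictions exist. `score` is a hypothetical helper and the ten labels are made up purely for illustration; they are not results from the talk.

```ruby
# Compare predicted labels against known labels (1 = security fix, -1 = not).
def score(expected, predicted)
  pairs = expected.zip(predicted)
  correct = pairs.count { |want, got| want == got }
  {
    correct: correct,
    false_positives: pairs.count { |want, got| want == -1 && got == 1 },
    false_negatives: pairs.count { |want, got| want == 1 && got == -1 },
    success_rate: correct.fdiv(pairs.size)
  }
end

# Made-up labels for ten commits.
expected  = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
predicted = [1, 1, -1, -1, -1, -1, -1, 1, -1, -1]
result = score(expected, predicted)
puts result
```

The commits counted as false positives or false negatives are the ones whose messages are worth reading for new keyword ideas.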
Bugs found through
• fuzzing
• source code review
• insider expertise
Types of bugs
• JIT bugs
• crypto bugs
• policy errors
• memory corruption
Photo thanks
CC attribution
• http://www.flickr.com/photos/25949441@N02/6114769105/
• http://www.flickr.com/photos/sis/465574712/
• http://www.flickr.com/photos/jiazi/4790956216/
Thank you,
@miaubiz
https://github.com/miaubiz/let-me-see-you-scrape