Zhen Li¹, Deqing Zou¹, Shouhuai Xu², Xinyu Ou¹, Hai Jin¹, Sujuan Wang¹, Zhijun Deng¹, Yuyi Zhong¹
¹ Huazhong University of Science and Technology (HUST), Wuhan, China
² University of Texas at San Antonio (UTSA), San Antonio, USA
Automatic Software Vulnerability Detection
• Automatic detection of software vulnerabilities is an important research problem
• Static vulnerability detection tools and studies
RATS, VUDDY (SP’17), ReDeBug, …, VulPecker (ACSAC’16)
Drawbacks of Existing Approaches
• First, imposing intense labor of human experts
  ✓ Define features
• Second, incurring high false negative rates
  ✓ Two most recent vulnerability detection systems
    - VUDDY (SP’17): false negative rate = 18.2% for Apache HTTPD 2.4.23
    - VulPecker (ACSAC’16): false negative rate = 38% with respect to 455 vulnerability samples
Research Problem
• Given the source code of a target program, how can we determine whether or not the target program is vulnerable and, if so, where are the vulnerabilities?
  ✓ Without asking human experts to manually define features
  ✓ Without incurring a high false negative rate or false positive rate
Our Main Contribution
Vulnerability Deep Pecker (VulDeePecker):
A deep learning-based system for automatically detecting vulnerabilities in programs (source code)
Outline
• Guiding Principles
• Design of VulDeePecker
• Experiments and Results
• Limitations
• Conclusion
Guiding Principles: three questions
Q1: How to represent software programs for deep learning-based vulnerability detection?
Q2: What is the appropriate granularity for deep learning-based vulnerability detection?
Q3: How to select a specific neural network for vulnerability detection?
Guiding Principles
Q1: How to represent software programs for deep learning-based vulnerability detection?
Preserve the semantic relationships between the programs’ elements (e.g., data-flow and control-flow information).
Guiding Principles
Q2: What is the appropriate granularity for deep learning-based vulnerability detection?
Programs should be represented at a finer granularity than treating a whole program or a function as a unit.
Guiding Principles
Q3: How to select a specific neural network for vulnerability detection?
Neural networks that can cope with contexts may be suitable for vulnerability detection.
[Figure: candidate neural networks. CNN, DBN, DNN, … versus RNN; within RNN: traditional RNN, LSTM, GRU, …; within LSTM: unidirectional LSTM and bidirectional LSTM (this paper).]
Overview of VulDeePecker
The Concept of Code Gadget
• A unit for vulnerability detection
• A number of program statements that are semantically related to each other in terms of data dependency or control dependency
• Example: vulnerabilities related to library/API function calls
Step I: Generating Code Gadgets
[Figure: a code gadget corresponding to strcpy()]
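As a rough illustration of how a gadget might be assembled around a library/API call, the sketch below follows data dependencies backwards from the call's arguments. The `Stmt` record, the `backward_slice` helper, and the five-line toy program are all hypothetical and assume straight-line code with pre-computed def/use sets; a real system would obtain this information from a program-analysis tool.

```python
from typing import List, NamedTuple, Optional, Tuple

class Stmt(NamedTuple):
    line: int                # line number in the (hypothetical) source file
    defines: Optional[str]   # variable written by the statement, if any
    uses: Tuple[str, ...]    # variables read by the statement
    text: str                # source text, kept for display

def backward_slice(program: List[Stmt], call_line: int) -> List[Stmt]:
    """Collect the call plus earlier statements it depends on via data flow."""
    call = next(s for s in program if s.line == call_line)
    wanted = set(call.uses)  # variables whose definitions are still needed
    gadget = [call]
    # Walk backwards from the call; conservatively keep every earlier definition.
    for stmt in sorted(program, key=lambda s: s.line, reverse=True):
        if stmt.line >= call_line:
            continue
        if stmt.defines in wanted:
            gadget.append(stmt)
            wanted.update(stmt.uses)
    return sorted(gadget, key=lambda s: s.line)

# Toy program (assumed for illustration): only lines 1, 3, and 5 feed strcpy().
program = [
    Stmt(1, "buf", (), "char buf[16];"),
    Stmt(2, "n", (), "int n = 0;"),
    Stmt(3, "src", ("argv",), "char *src = argv[1];"),
    Stmt(4, None, ("n",), 'printf("%d", n);'),
    Stmt(5, None, ("buf", "src"), "strcpy(buf, src);"),
]
gadget = backward_slice(program, call_line=5)
gadget_lines = [s.line for s in gadget]
```

The statements unrelated to the arguments of strcpy() (lines 2 and 4) are left out, which is what makes the unit finer-grained than a whole function.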
Step II: Generating Ground Truth Labels
• Each code gadget is labeled as “1” (i.e., vulnerable) or “0” (i.e., not vulnerable).
  ✓ According to the diff files
  ✓ According to the vulnerable statements
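One plausible reading of the diff-based labeling rule, sketched under assumptions: a gadget is labeled 1 if it contains at least one line that the patch deletes or modifies, and 0 otherwise. The helpers `deleted_lines` and `label_gadget`, the single-file unified diff, and the line numbers are made up for illustration and are not the authors' implementation.

```python
import re

# Matches a unified-diff hunk header such as "@@ -3,4 +3,4 @@".
HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")

def deleted_lines(diff_text):
    """Return old-file line numbers removed by a single-file unified diff."""
    removed = set()
    old_line = None
    for line in diff_text.splitlines():
        match = HUNK.match(line)
        if match:
            old_line = int(match.group(1))
            continue
        if old_line is None:
            continue              # file headers before the first hunk
        if line.startswith("-"):
            removed.add(old_line)
            old_line += 1
        elif line.startswith("+") or line.startswith("\\"):
            continue              # added lines and "\ No newline" markers
        else:
            old_line += 1         # context line, present in the old file
    return removed

def label_gadget(gadget_lines, vulnerable_lines):
    """Label 1 if the gadget touches any known-vulnerable line, else 0."""
    return 1 if set(gadget_lines) & set(vulnerable_lines) else 0

diff = """--- a/copy.c
+++ b/copy.c
@@ -3,4 +3,4 @@
 char buf[16];
 char *src = argv[1];
-strcpy(buf, src);
+strncpy(buf, src, sizeof(buf) - 1);
 return 0;
"""
vulnerable = deleted_lines(diff)
label_a = label_gadget([3, 4, 5], vulnerable)  # gadget containing the patched call
label_b = label_gadget([1, 2], vulnerable)     # gadget that misses the patched line
```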
Step III: Transforming Code Gadgets into Vectors
• Transform code gadgets into their symbolic representations
• Encode the symbolic representations into vectors
[Figure: a statement in symbolic representation split into 7 tokens]
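The two sub-steps can be sketched as follows, assuming a regex-based lexer and a plain integer vocabulary. Both are stand-ins chosen for brevity (a learned embedding would be a more likely choice in practice), and the names `tokenize`, `symbolize`, `encode`, and the `KEEP` set are hypothetical. User-defined identifiers are renamed to VAR1, VAR2, … in order of first appearance; this simplification treats every non-kept identifier as a variable.

```python
import re

# Identifiers, then integer literals, then any other single non-space symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")
# Assumed set of keywords and library/API names that must keep their spelling.
KEEP = {"strcpy", "char", "int", "sizeof", "return"}

def tokenize(code):
    """Split a statement into lexical tokens."""
    return TOKEN.findall(code)

def symbolize(code):
    """Rename user-defined identifiers to VAR1, VAR2, ... and return tokens."""
    mapping = {}
    tokens = []
    for tok in tokenize(code):
        if IDENT.fullmatch(tok) and tok not in KEEP:
            mapping.setdefault(tok, "VAR%d" % (len(mapping) + 1))
            tok = mapping[tok]
        tokens.append(tok)
    return tokens

def encode(tokens, vocab):
    """Map each token to an integer index; 0 is left free for padding."""
    return [vocab.setdefault(tok, len(vocab) + 1) for tok in tokens]

tokens = symbolize("strcpy(dest, src);")
vocab = {}
vector = encode(tokens, vocab)
```

Here `tokens` is `['strcpy', '(', 'VAR1', ',', 'VAR2', ')', ';']` and `vector` is `[1, 2, 3, 4, 5, 6, 7]`.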
Step IV: Training the BLSTM Neural Network
• The training process for learning the BLSTM neural network is standard
Steps V-VII: Detection Phase
Research Questions
RQ1: Can VulDeePecker deal with multiple types of vulnerabilities at the same time?
RQ2: Can human intelligence (other than defining features) improve the effectiveness of VulDeePecker?
RQ3: How effective is VulDeePecker when compared with other approaches?
• Metrics for evaluation
  ✓ False positive rate (FPR), false negative rate (FNR), recall, precision, F-measure
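For reference, the five metrics can be computed from confusion-matrix counts as sketched below. The definitions are the standard ones (recall equals 1 − FNR, and the F-measure is taken here as the harmonic mean of precision and recall); the function name `metrics` and the example counts are made up for illustration and are not results from the evaluation.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def metrics(tp, fp, tn, fn):
    """Compute FPR, FNR, recall, precision, and F-measure from raw counts."""
    fpr = _ratio(fp, fp + tn)        # non-vulnerable samples flagged as vulnerable
    fnr = _ratio(fn, tp + fn)        # vulnerable samples that were missed
    recall = _ratio(tp, tp + fn)     # equals 1 - FNR
    precision = _ratio(tp, tp + fp)  # flagged samples that are truly vulnerable
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {
        "FPR": fpr,
        "FNR": fnr,
        "recall": recall,
        "precision": precision,
        "F-measure": f_measure,
    }

# Hypothetical counts: 100 vulnerable and 900 non-vulnerable code gadgets.
m = metrics(tp=80, fp=20, tn=880, fn=20)
```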
Preparing Input to VulDeePecker
• Programs collection for answering the RQs
  ✓ Two sources of vulnerability data
    - 19 C/C++ open source products whose vulnerabilities are described in the NVD, and C/C++ test cases in the SARD
  ✓ Collect 520 open source software program files and 8,122 test cases for the buffer error vulnerability (i.e., CWE-119), and 320 open source software program files and 1,729 test cases for the resource management error vulnerability (i.e., CWE-399)
• Training programs vs. target programs
  ✓ Randomly choose 80% of the programs we collect as training programs and the remaining 20% as target programs
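A small sketch of a random 80/20 split made at the program level, as described above. The helper `split_programs`, the fixed seed, and the file names are illustrative assumptions only.

```python
import random

def split_programs(programs, train_fraction=0.8, seed=0):
    """Shuffle program identifiers and split them into training and target sets."""
    shuffled = list(programs)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical program identifiers.
programs = ["program_%d.c" % i for i in range(10)]
train, target = split_programs(programs)
```

Splitting by program rather than by code gadget keeps all gadgets extracted from one program on the same side of the split.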
Learning BLSTM Neural Networks
• Datasets for answering the RQs
  ✓ Code Gadget Database (CGD): 61,638 code gadgets
  ✓ Six datasets of CGD
BE: Buffer error vulnerabilities
RM: Resource management vulnerabilities
HY: Hybrid of the above two types of vulnerabilities
ALL: All library/API function calls
SEL: Manually selected library/API function calls
RQ1
RQ1: Can VulDeePecker deal with multiple types of vulnerabilities at the same time?
RM: 16 function calls related to vulnerabilities
BE: 124 function calls related to vulnerabilities
• Insight: VulDeePecker can detect multiple types of vulnerabilities, but the effectiveness is sensitive to the amount of data (which is common to deep learning).
RQ2
RQ2: Can human intelligence (other than defining features) improve the effectiveness of VulDeePecker?
• Insight: Human expertise can be used to select function calls to improve the effectiveness of VulDeePecker.
RQ3: VulDeePecker vs. Static Analysis Tools
RQ3: How effective is VulDeePecker when compared with other approaches?
• Insight: A deep learning-based vulnerability detection system can be more effective by taking advantage of the data-flow information.
RQ3: VulDeePecker vs. Code Similarity-Based Approaches
RQ3: How effective is VulDeePecker when compared with other approaches?
• Insight: VulDeePecker is more effective than code similarity-based approaches
Using VulDeePecker in Practice
• VulDeePecker detected 4 vulnerabilities, which were not reported in the NVD, but were “silently” patched by the vendors.
• These vulnerabilities are missed by most of the other vulnerability detection systems mentioned above
Limitations and Open Problems
• Present design
  ✓ Assuming source code is available
  ✓ Only dealing with C/C++ programs
  ✓ Only dealing with vulnerabilities related to library/API function calls
  ✓ Only accommodating data-flow information, but not control-flow information
  ✓ Using some heuristics
• Present implementation
  ✓ Limited to the BLSTM neural network
• Present evaluation
  ✓ The dataset only contains vulnerabilities about buffer errors and resource management errors
Conclusion
• We initiate the study of using deep learning for vulnerability detection, and discuss some preliminary guiding principles
• We present VulDeePecker, and evaluate it from 3 perspectives
• We present the first dataset for evaluating deep learning-based vulnerability detection systems
• https://github.com/CGCL-codes/VulDeePecker
New Results (after finishing the paper; in submission)
• Cope with all kinds of vulnerabilities (including those related to library/API function calls)
• Accommodate both data dependency and control dependency
• Detect 7 (potential) 0-day vulnerabilities and 8 silently patched vulnerabilities from 4 software products
• Some deep neural networks are more powerful than others
Takeaways
• The first deep learning-based vulnerability detection system using a finer-granularity unit: code gadget
• Guiding principles for deep learning-based vulnerability detection
• The first dataset for evaluating deep learning-based vulnerability detection systems