Fighting Software Inefficiency Through Automated Bug Detection
Shan Lu, University of Chicago
A little bit about myself
My life :)
I learned a lot from …
Bug Detection Background
Fighting software bugs is crucial
• Software is everywhere
– http://en.wikipedia.org/wiki/List_of_software_bugs
• Software bugs are widespread and costly
– Lead to 40% of system downtime [Blueprints 2000]
– Cost $312 billion per year [Cambridge 2013]
What is your favorite bug?
• How many of you have been bothered by bugs?
How to detect bugs?
• Study & understand real-world bugs
• Discover patterns of common bugs
– Source code level
– Binary code level
– …
• Design pattern-matching program analysis
– Static analysis
– Dynamic analysis
– …
Bug detection examples
– Memory bug detection
• Pattern: over-bound writes, …
• Detection: check memory accesses & …
– Concurrency bug detection
• Pattern: data races, atomicity violations, …
• Detection: check memory accesses & …
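P = malloc(10); P[100] = 'a';   /* memory bug: over-bound write */
if (P) *P = 'a';                /* concurrency bug: one thread checks P, then uses it */
P = NULL;                       /* ... while another thread may clear P in between */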
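As a rough illustration of "check memory accesses", the sketch below models a tiny checker in Java: it records the size of each allocation and flags any write whose offset falls outside it. The names `BoundsChecker`, `malloc`, and `checkWrite` are hypothetical and only mirror the C snippet above; real memory-bug detectors instrument actual loads and stores rather than explicit calls.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: tracks allocation sizes and checks writes against them. */
class BoundsChecker {
    private final Map<Integer, Integer> sizes = new HashMap<>();
    private int nextId = 0;

    /** Records an allocation of the given size and returns a handle for it. */
    int malloc(int size) {
        int id = nextId++;
        sizes.put(id, size);
        return id;
    }

    /** Returns true if a write at this offset stays inside the allocation. */
    boolean checkWrite(int id, int offset) {
        Integer size = sizes.get(id);
        return size != null && offset >= 0 && offset < size;
    }

    public static void main(String[] args) {
        BoundsChecker checker = new BoundsChecker();
        int p = checker.malloc(10);
        System.out.println("P[5] in bounds:   " + checker.checkWrite(p, 5));
        System.out.println("P[100] in bounds: " + checker.checkWrite(p, 100));
    }
}
```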
Why Performance-Bug Detection?
How did this start?
• One of our bug detectors is strangely slow
– Why not profiling?
• Lots of noise in profiling
• Profiling measures cost, not inefficiency
• Am I able to do this?
• Where should I start?
An Empirical Study of Real-World Performance Bugs
Are there performance bugs? Are they important?
What types of performance bugs are there?
Understanding and detecting real-world performance bugs [PLDI '12]
Methodology
Application  Software Type                            Language  MLOC  Bug DB History  Tags              # Bugs
Apache       Command-line Utility + Server + Library  C/Java    0.45  13y             N/A               25
Chrome       GUI Application                          C/C++     14.0  4y              N/A               10
GCC          Compiler                                 C/C++     5.7   13y             Compile-time-hog  10
Mozilla      GUI Application                          C++/JS    4.7   14y             perf              36
MySQL        Server Software                          C/C++/C#  1.3   10y             S5                28
Total: 109
Findings
• Are there performance bugs?
– Yes
• Are they important?
– Some are
• What types of performance bugs are there?
– What are their root causes?
– Where are they typically located?
– How are they usually fixed?
Bug Examples
+ int i = -k.length();
- while (s.indexOf(k) == -1) {
+ while (i++ < 0 ||
+        s.substring(i).indexOf(k) == -1) {
    s.append(nextchar());
  }
Patch for Apache-Ant Bug 34464
  for (i = 0; i < tabs.length; i++) {
    …
    tabs[i].doTransact();
  }
+ doAggregateTransact(tabs);
Mozilla Bug 490742 & Patch
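The idea behind the Apache-Ant patch can be tried out with a small, self-contained sketch. The slow version rescans the whole buffer after every appended character; the fast version only looks at the last `k.length()` characters. The class and method names here are hypothetical, the input is a plain `String` rather than a stream, and the window index is advanced after each append, so this is an approximation of the patch rather than the Ant code itself.

```java
/** Sketch of the inefficiency behind Apache-Ant Bug 34464 and a windowed fix. */
class WindowSearch {
    /** Slow: rescans the entire buffer on every iteration. Returns chars consumed, or -1. */
    static int readUntilSlow(String input, String k) {
        StringBuilder s = new StringBuilder();
        int pos = 0;
        while (s.indexOf(k) == -1 && pos < input.length()) {
            s.append(input.charAt(pos++));
        }
        return s.indexOf(k) == -1 ? -1 : pos;
    }

    /** Fast: only inspects the trailing window that could contain a new match. */
    static int readUntilFast(String input, String k) {
        StringBuilder s = new StringBuilder();
        int pos = 0;
        int windowStart = -k.length(); // stays negative until s holds k.length() chars
        while ((windowStart < 0 || s.substring(windowStart).indexOf(k) == -1)
                && pos < input.length()) {
            s.append(input.charAt(pos++));
            windowStart++;
        }
        boolean found = windowStart >= 0 && s.substring(windowStart).indexOf(k) != -1;
        return found ? pos : -1;
    }

    public static void main(String[] args) {
        String input = "login: guest Password: secret";
        System.out.println(readUntilSlow(input, "Password:"));
        System.out.println(readUntilFast(input, "Password:"));
    }
}
```

Both methods return the same answer; the difference is that the slow one does work proportional to the buffer length on every iteration, while the fast one does work proportional to the length of `k`.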
What is next?
Can we detect performance bugs?
What “pattern” did we find?
A Patch-Based Inefficiency Detector
Static inefficiency patterns exist
• Statically checkable inefficiency patterns exist
+ int i = -k.length();
- while (s.indexOf(k) == -1) {
+ while (i++ < 0 ||
+        s.substring(i).indexOf(k) == -1) {
    s.append(nextchar());
  }
Patch for Apache-Ant Bug 34464
  for (i = 0; i < tabs.length; i++) {
    …
    tabs[i].doTransact();
  }
+ doAggregateTransact(tabs);
Mozilla Bug 490742 & Patch
How to get these patterns?
• Manually extract from patches
Not Contain Rules
Dynamic Rules
LLVM Checkers
Python Checkers
Detection Results
• 17 checkers find PPPs in original buggy versions
• 13 checkers find 332 PPPs in latest versions
[Chart: PPPs found by cross-application checking, inherited from buggy versions, or introduced later]
* PPP: Potential Performance Problem
Inefficiency-pattern-based bug detection is promising!
What is next?
Do we have to manually specify rules?
Can we build generic detectors?
Toddler
A dynamic and generic detector targeting inefficient nested loops
Toddler: Detecting Performance Problems via Similar Memory-Access Patterns [ICSE '13]
What are generic inefficiency patterns?
Previous example
while (s.indexOf(k) == -1) {
  s.append(nextchar());
}
Apache-Ant Bug 34464
Password: abcdefg
Password: abcdefgh
Password: abcdefghi
What is the pattern?
• What type of nested loops are likely inefficient?
– Many inner loops are similar to each other
• Some instructions keep reading similar sequences of values
abcdefg abcdefgh abcdefghi
How to detect?
• Input: Test code + system under test
• Steps:
1. Instrument the system under test
   Monitor loop starts/ends & memory reads inside loops
2. Analyze the trace produced by instrumentation
   Identify repetitive memory-read sequences
• Output: Loops that are likely performance bugs
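The analysis step can be sketched in a few lines, under strong simplifying assumptions. Here a trace is just a list of strings, one per inner-loop execution, holding the values one instruction read; two executions count as similar when they share a long common prefix. The prefix-based similarity measure, the thresholds, and all names (`SimilarReads`, `similarity`, `looksInefficient`) are assumptions made for illustration and are not Toddler's actual algorithm.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: flags a nested loop whose inner-loop read sequences repeat. */
class SimilarReads {
    /** Fraction of the longer sequence covered by the common prefix of a and b. */
    static double similarity(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int common = 0;
        while (common < n && a.charAt(common) == b.charAt(common)) {
            common++;
        }
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : (double) common / longer;
    }

    /**
     * trace.get(i) holds the values read during the i-th inner-loop execution.
     * Reports true when at least minRepeats consecutive pairs are similar.
     */
    static boolean looksInefficient(List<String> trace, double minSimilarity, int minRepeats) {
        int similarPairs = 0;
        for (int i = 1; i < trace.size(); i++) {
            if (similarity(trace.get(i - 1), trace.get(i)) >= minSimilarity) {
                similarPairs++;
            }
        }
        return similarPairs >= minRepeats;
    }

    public static void main(String[] args) {
        // Read sequences from the growing-buffer example: each scan rereads the previous one.
        List<String> trace = Arrays.asList("abcdefg", "abcdefgh", "abcdefghi");
        System.out.println(looksInefficient(trace, 0.8, 2));
    }
}
```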
Evaluation Subjects and New Bugs
Application            Description          LOC             Known Bugs  New Bugs  Fixed  Confirmed
Ant                    Build tool           109,765         1           8         1      0
Apache Collections     Collections library  51,516          1           20        10     4
Groovy                 Dynamic language     136,994         1           2         2      0
Google Core Libraries  Collections library  156,004         2           10        1      2
JFreeChart             Chart framework      64,184          1           1         0      0
JMeter                 Load testing tool    86,549          1           0         0      0
Lucene                 Text search engine   320,899         2           0         0      0
PDFBox                 PDF framework        78,578          1           0         0      0
Solr                   Search server        373,138         1           0         0      0
JDK                    Standard library     -               -           2         0      0
JUnit                  Testing framework    -               -           1         1      0
9 Apps + 2 Libs        -                    50,000–320,000  11          44        15     6
Toddler vs. HProf
Known Bug                 Detected?     False P.  Rank   Slowdown
                          TODD.  PROF   TODD.     PROF   TODD.  PROF
Ant                       yes    no     0         19.3   13.7   4.2
Apache Collections        yes    yes    0         1.0    10.0   2.1
Groovy                    yes    yes    0         3.7    15.5   3.7
Google Core Libraries #1  yes    yes    0         1.8    9.0    3.8
Google Core Libraries #2  yes    no     0         5.3    7.5    3.2
JFreeChart                yes    no     0         53.7   13.4   8.8
JMeter                    yes    no     0         10.3   8.5    1.9
Lucene #1                 yes    no     0         7.7    6.8    2.5
Lucene #2                 yes    yes    0         3.1    25.4   3.1
PDFBox                    yes    no     1         18.8   51.8   12.1
Solr                      yes    no     0         178.3  114.2  7.1
11                        11     4      1         n/a    15.9X  4.0X
What is next?
Why are so many bugs not fixed by developers?
Caramel
A static and generic detector targeting inefficient loops with simple patches
CARAMEL: Detecting and Fixing Performance Problems That Have Non-Intrusive Fixes [ICSE'15] Won SIGSOFT Distinguished Paper Award
Why are perf. bugs not fixed?
Correctness Maintainability Manual effort
Potential speedup under certain workload
Can we detect bugs with simple fixes?
What is the pattern?
• What is a typical simple fix for an inefficient loop?
  for (…)
+   if (cond) break;
• What types of bugs have the above type of fix?
– We thought for a long time …
Bug example
boolean alreadyPresent = false;
while (isActualEmbeddedProperty.hasNext()) {
  if (alreadyPresent) break;                     // CondBreak FIX
  if (oldVal.getStr().equals(newVal.getStr()))
    alreadyPresent = true;
  if (!alreadyPresent)
    prop.container().addProp(newVal);            // side effect
}
• New bug in PDFBox found by us, fixed by developers
• Developers fix bugs that have CondBreak fixes:
– Waste computation in loops
– Fix is non-intrusive
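To make the wasted computation concrete, the following sketch counts loop iterations with and without a CondBreak-style fix. It is a simplified stand-in for the PDFBox loop: the list of strings, the `condBreak` flag, and the names `CondBreakDemo`, `Scan`, and `scan` are all hypothetical, and the side-effecting `addProp` call is left out so that only the iteration count differs.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: the same loop with and without a CondBreak fix. */
final class CondBreakDemo {
    /** Outcome of one scan: whether the value was found and how many iterations ran. */
    static final class Scan {
        final boolean alreadyPresent;
        final int iterations;

        Scan(boolean alreadyPresent, int iterations) {
            this.alreadyPresent = alreadyPresent;
            this.iterations = iterations;
        }
    }

    static Scan scan(List<String> oldVals, String newVal, boolean condBreak) {
        boolean alreadyPresent = false;
        int iterations = 0;
        for (String oldVal : oldVals) {
            if (condBreak && alreadyPresent) {
                break; // the CondBreak fix: later iterations cannot change the result
            }
            iterations++;
            if (oldVal.equals(newVal)) {
                alreadyPresent = true;
            }
        }
        return new Scan(alreadyPresent, iterations);
    }

    public static void main(String[] args) {
        List<String> oldVals = Arrays.asList("a", "b", "c", "d");
        System.out.println("without fix: " + scan(oldVals, "b", false).iterations + " iterations");
        System.out.println("with fix:    " + scan(oldVals, "b", true).iterations + " iterations");
    }
}
```

The result (`alreadyPresent`) is identical in both runs, which is what makes the fix non-intrusive: it removes iterations without changing what the loop computes.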
What Bugs Have CondBreak Fixes?
Rows: How Is Computation Wasted? Columns: Where Is Computation Wasted?
                Every Iteration  Late Iterations  Early Iterations
No-Result       Type 1           Type 2           Type Y
Useless-Result  Type X           Type 3           Type 4
Ingredient 1: Result Instruction
boolean alreadyPresent = false;
while (isActualEmbeddedProperty.hasNext()) {
  if (alreadyPresent) break;                     // CondBreak FIX
  if (oldVal.getStr().equals(newVal.getStr()))
    alreadyPresent = true;
  if (!alreadyPresent)
    prop.container().addProp(newVal);            // side effect
}
Result Instruction
Ingredient 2: Instruction-Condition
boolean alreadyPresent = false;
while (isActualEmbeddedProperty.hasNext()) {
  if (alreadyPresent) break;                     // CondBreak FIX
  if (oldVal.getStr().equals(newVal.getStr()))
    alreadyPresent = true;
  if (!alreadyPresent)
    prop.container().addProp(newVal);            // Result Ins.
}
Instruction-Condition
Ingredient 3: Loop-Condition
boolean alreadyPresent = false;
while (isActualEmbeddedProperty.hasNext()) {
  if (alreadyPresent) break;                     // CondBreak FIX
  if (oldVal.getStr().equals(newVal.getStr()))
    alreadyPresent = true;
  if (!alreadyPresent)
    prop.container().addProp(newVal);            // Result Ins.
}
Instruction-Condition
Also Loop-Condition
Evaluation Subjects and New Bugs
• 15 applications
– 11 Java, 4 C/C++
• 150 new bugs
• 116 bugs fixed
– 51 in Java
– 65 in C/C++
• Only 4 rejected
• 22 bugs in GCC fixed
• 149/150 fixed automatically
• Only 23 false positives
Application    Description         LOC         Bugs
Ant            Build tool          140,674     1
Groovy         Dynamic language    161,487     9
JMeter         Load testing tool   114,645     4
Log4J          Logging framework   51,936      6
Lucene         Text search engine  441,649     14
PDFBox         PDF framework       108,796     10
Sling          Web app. framework  202,171     6
Solr           Search server       176,937     2
Struts         Web app. framework  175,026     4
Tika           Content extraction  50,503      1
Tomcat         Web server          295,223     4
Google Chrome  Web browser         13,371,208  22
GCC            Compiler            1,445,425   22
Mozilla        Web browser         5,893,397   27
MySQL          Database server     1,774,926   18
Different aspects of fighting bugs
In-house bug detection
In-field failure recovery
In-field failure diagnosis
In-house bug fixing
Low overhead
High accuracy High accuracy
Work from my group
In-house bug detection
– Concurrency bugs: [ASPLOS06]; [SOSP07]; [ASPLOS09]; [ASPLOS10]; [ASPLOS11]; [OOPSLA13]
– Performance bugs: [PLDI12]; [ICSE13]; [ICSE15]
In-field failure recovery
– Concurrency bugs: [ASPLOS13.A]; [FSE14]
– Performance bugs: Not yet
In-field failure diagnosis
– Concurrency bugs: [OOPSLA10]; [ASPLOS13.B]; [ASPLOS14]; [OOPSLA16*]
– Performance bugs: [OOPSLA14]
In-house bug fixing
– Concurrency bugs: [PLDI11]; [OSDI12]; [FSE16]
– Performance bugs: [CAV13]
Conclusions & Future Work
Constraints/Requirements
Techniques
Bugs
Thanks! Questions?
My collaborators
• Prof. Darko Marinov
• Adrian Nistor
• Linhai Song