David Šišlák david.sislak@fel.cvut.cz
Effective Software
Lecture 3: Virtual machine, byte-code, (de-)compilers, disassembler, profiling
6th March 2017, ESW – Lecture 3

Introduction – Virtual Machine
» Virtual machine execution model
• source code
• compiled VM byte-code
• hybrid run-time environment (platform dependent VM impl.)
  – interpreted byte-code
  – compiled assembly-code (native CPU code)
  – automated platform capability optimizations (e.g. use of SIMD)
» byte-code vs. assembly-code
• (+) platform independence (portable) – architecture (RISC/CISC, bits), OS
• (+) reflection – observe, modify own structure at run-time
• (+) small size
• (-) slower execution – interpreted mode, compilation latencies

JAVA Versions
» first release 1996 by Sun Microsystems (later Oracle)
» many different implementations (GNU, IBM, etc.)
» language changes and improvements
• 1.4 (2002) – assert, NIO
• 5.0 (2004) – generics, annotations, auto-boxing, enum, concurrency utils, varargs, foreach, profiling API
• 6 (2006) – basic javascript support, performance and GC improvements (G1, compressed pointers), compiler API
• 7 (2011) – invokedynamic, switch on strings, auto-closeable, GPU pipeline API
• 8 (2014) – lambda, streams, improved javascript support (based on invokedynamic), removed permgen (metaspace / native mem. is used)
• 9 (2017?) – Ahead-of-Time Compilation (non-tiered vs. tiered AOT); non-tiered AOT provides predictable performance

Execution Time Comparison
» The Computer Language Benchmarks Game (http://benchmarksgame.alioth.debian.org/)
• 10 different algs. (e.g. DNA manipulation)

JAVA Virtual Machine – Memory Layout

JAVA Virtual Machine – Method Area
» method area shared among all threads
» class definitions
  » run-time constant pool
  » field and method data
  » byte-code for methods and constructors
  » initialization methods (<clinit>, <init>)
» native method
  » implementation of native methods

JAVA Virtual Machine – Frame
» frame
» each thread has a stack with frames (outside of heap, fixed length): StackOverflowError vs. OutOfMemoryError
» frame is created each time a method is invoked (destroyed after return)
» frame size determined at compile-time (in class file)
» variables (long and double occupy two slots)
  » {this} – instance call only!
  » {method parameters}
  » {local variables}
» operand stack (any type)
  » LIFO
» reference to run-time constant pool (class def)
  » method + class is associated
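
The relation between frames and StackOverflowError can be observed directly. The following is a minimal sketch, not part of the original slides: the class name `FrameDepthDemo` and its methods are hypothetical, and the depth it reports depends on the thread stack size and on the frame size of `recurse`.

```java
// Sketch: every invocation pushes a new frame onto the thread's stack.
// Unbounded recursion exhausts that stack and the JVM throws StackOverflowError.
final class FrameDepthDemo {
    private static int depth = 0;

    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the stack ran out.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // All frames of recurse() have been unwound; the thread continues normally.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measureDepth());
    }
}
```

An OutOfMemoryError, in contrast, is raised when memory cannot be allocated, for example when the heap is exhausted or a new thread stack cannot be created.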

JAVA Virtual Machine – Stack-oriented Bytecode
» stack-oriented – stack machine model for passing parameters and output
(2+3)×11+1
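
The expression above can be evaluated the way a stack machine would do it: operands are pushed, and each operation pops its inputs and pushes its result. The following is a minimal sketch of that model, not JVM code; `StackMachineDemo` and its string-based instruction set are made up for illustration. On the JVM the same steps would use instructions such as `iadd` and `imul` on the operand stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackMachineDemo {
    // Runs a tiny postfix program: integer literals are pushed,
    // "+" and "*" pop two operands and push the result.
    static int run(String... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String op : program) {
            switch (op) {
                case "+":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "*":
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    stack.push(Integer.parseInt(op));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2+3)×11+1 in postfix order
        System.out.println(run("2", "3", "+", "11", "*", "1", "+")); // prints 56
    }
}
```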

JAVA Virtual Machine – Opcodes
» opcode (1 byte + various parameters):
  » load and store (aload_0, istore, aconst_null, …)
  » arithmetic and logic (ladd, fcmpl, …)
  » type conversion (i2b, d2i, …)
  » object manipulation (new, putfield, getfield, …)
  » stack management (swap, dup2, …)
  » control transfer (ifeq, goto, …)
  » method invocation (invokespecial, areturn, …) – frame manipulation
  » exceptions and monitor concurrency (athrow, monitorenter, …)
» prefix/suffix – i, l, s, b, c, f, d and a (reference)
» variables as registers – e.g. istore_1 (variable 0 is this for instance method)
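
The type prefixes and the numbering of local variables are easiest to see on trivial methods. The sketch below is an illustration, not taken from the slides: `OpcodeDemo` is a hypothetical class, and the instruction sequences in the comments are what `javap -c` is expected to print for such methods.

```java
final class OpcodeDemo {
    // Static method: a and b live in local variable slots 0 and 1.
    //   iload_0   // push a
    //   iload_1   // push b
    //   iadd      // pop two ints, push their sum
    //   ireturn   // return the int on top of the operand stack
    static int add(int a, int b) {
        return a + b;
    }

    // Instance method: slot 0 holds `this`, so a and b move to slots 1 and 2.
    //   iload_1, iload_2, iadd, ireturn
    int addInstance(int a, int b) {
        return a + b;
    }

    // long values use the `l` prefix and occupy two slots each.
    //   lload_0, lload_2, ladd, lreturn
    static long addLong(long a, long b) {
        return a + b;
    }
}
```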

JAVA Virtual Machine
» used to implement also other languages than JAVA
  » Erlang -> Erjang
  » JavaScript -> Rhino
  » Python -> Jython
  » Ruby -> JRuby
  » Scala, Clojure – functional programming
  » others
» byte-code is verified before it is executed:
  » branches (jumps) are always to valid locations – only within method
  » any instruction operates on a fixed stack location (helps JIT for registers mapping)
  » data is always initialized and references are always type-safe
  » access to private, package is controlled

JAVA Virtual Machine – Example 1 – Source Code

JAVA Virtual Machine – Class File Structure

JAVA Virtual Machine – Example 1 – Class File

JAVA Virtual Machine – Example 1 – Disassembled Constants
» javap – JAVA disassembler included in JDK

JAVA Virtual Machine – Example 1 – Disassembled Fields

JAVA Virtual Machine – Example 1 – Disassembled Method
» getfield
• takes 1 ref from stack
• build an index into runtime pool of class instance by reference this
» areturn
• takes 1 ref from stack
• push onto the stack of calling method
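
A plain getter is enough to produce the getfield / areturn pair described above. The class below is a hypothetical stand-in (the source code of Example 1 is not part of this text); the comments describe the instructions `javap -c` is expected to show, with the constant-pool index left out because it depends on the class file.

```java
final class Person {
    private final String name;

    Person(String name) {
        this.name = name;
    }

    // Expected bytecode of the getter:
    //   aload_0    // push `this` (local variable 0)
    //   getfield   // pop the reference, push the value of field `name`
    //   areturn    // pop the reference and push it onto the caller's operand stack
    String getName() {
        return name;
    }
}
```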

JAVA Virtual Machine – Example 1 – Disassembled Constructor

JAVA Virtual Machine – Example 1 – Decompiler
» procyon – open-source JAVA decompiler, supports JAVA 8

JAVA Virtual Machine – Example 2 – Switch Source Code

JAVA Virtual Machine – Example 2 – Switch Bytecode

JAVA Virtual Machine – Example 2 – Cycle Bytecode

JAVA Virtual Machine – Source Code Compilation
» source code compilation (Source -> Bytecode)
  » bytecode is not better than your source code
    » invariants in loop are not removed
    » no optimizations like
      » loop unrolling
      » algebraic simplification
      » strength reduction
  » optional external obfuscator bytecode optimizations – ProGuard
    • shrinker – compact code, remove dead code
    • optimizer
      – modify access pattern (private, static, final)
      – inline bytecode
    • obfuscator – renaming, layout changes
    • preverifier – ensure class loading
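
Since the source compiler translates loops as they are written, the two methods below end up as different bytecode although they compute the same value. This is an illustrative sketch with hypothetical names; whether the JIT later performs the same transformations on the first version at run time depends on the JVM and on how hot the code is.

```java
final class SourceLevelOptimization {
    // As written: a * b is recomputed on every iteration and
    // i * 8 is a multiplication by a power of two.
    static long naive(int n, int a, int b) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) a * b + i * 8L;
        }
        return sum;
    }

    // Hand-optimized: the loop invariant is hoisted out of the loop and
    // the multiplication is replaced by a shift (strength reduction).
    static long optimized(int n, int a, int b) {
        long invariant = (long) a * b;
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += invariant + ((long) i << 3);
        }
        return sum;
    }
}
```

Comparing the `javap -c` output of both methods shows that the multiplication stays inside the loop of `naive`.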

JAVA Virtual Machine – Bytecode Compilation
» Just-in-time (JIT)
  » converts bytecode into assembly code in run-time
  » look into OpenJDK sources for very detailed information
» adaptive optimization (tiered compilation)
  » balance trade-off between JIT and interpreting instructions
  » monitors frequently executed parts ("hot spots") including data on caller-callee relationship for virtual method invocation
  » triggers dynamic re-compilation based on current execution profile
  » inline expansion to remove context switching
  » optimize branches
  » can make risky assumption (e.g. skip code) ->
    » unwind to valid state
    » deoptimize previously JITed code even if code is already executed

JAVA Virtual Machine – JIT Compilation
» Just-in-time (JIT) – usually asynchronous (3 C1, 7 C2 threads for 32 cores)
» C1 (client) – compiles much faster than C2
  » simplified inlining, using CPU registers
  » window-based optimization over small set of instructions
  » intrinsic functions with vector operations (Math, arraycopy, …)
» C2 (server, d64) – high-end fully optimizing compiler
  » dead code elimination, loop unrolling, loop invariant hoisting, common sub-expression elimination, constant propagation
  » full inlining, full deoptimization (back to level 0)
  » escape analysis, null check elimination
  » pattern-based loop vectorization and super word packing (SIMD)
» tiered compilation – hybrid adapting (since JVM 7, default in JVM 8)
» on-stack replacement (OSR) – optimization during execution
  » start at bytecode jump targets (goto, if_)

Assembly Code
» reasons to study assembly code (both Java and C/C++)
• educational reasons
  – predict efficient coding techniques
• debugging and verification
  – how good the generated code looks
• optimize code
  – for speed
    • avoid poorly compiled patterns
    • data fits into cache
    • predictable branches or no branches
    • use vector programming if possible (SIMD)
      » 256-bit registers with AVX since Intel Sandy Bridge (AVX2 since Haswell)
      » 512-bit AVX-512 since Intel Knights Landing (Xeon Phi)
  – for size
    • primarily code cache efficiency

JAVA Virtual Machine – Example 2 – Tiered Compilation
» -XX:+PrintCompilation (-XX:+PrintInlining)
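
To watch tiering with -XX:+PrintCompilation, a program needs a hot method and a hot loop. The sketch below is a reconstruction under assumptions, not the original Example 2, whose source is not reproduced in this text. Only the method names `daysInMonth` and `compute`, the Integer return type, the IllegalArgumentException in the default branch, the 1_000_000 iterations and the result 30_000_000 are taken from the annotations in this lecture; the class name, the year argument and the method bodies are made up.

```java
final class Example2Sketch {
    // Assumed shape: a switch over the month that returns a boxed Integer.
    // Autoboxing is used here; the annotated listings mention an inlined
    // Integer constructor, so the original probably allocated explicitly.
    static Integer daysInMonth(int month, int year) {
        switch (month) {
            case 1: case 3: case 5: case 7: case 8: case 10: case 12:
                return 31;
            case 4: case 6: case 9: case 11:
                return 30;
            case 2:
                boolean leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
                return leap ? 29 : 28;
            default:
                throw new IllegalArgumentException("invalid month: " + month);
        }
    }

    // Hot loop: enough back edges to make OSR compilation of the loop likely.
    static int compute() {
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += daysInMonth(4, 2016);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(compute());
    }
}
```

Running it as `java -XX:+PrintCompilation Example2Sketch` lists the methods as they are compiled; the exact tiers and counts vary between JVM versions and runs.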

JVM – Example 2 – daysInMonth Assembly Code – Tier 3
» -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly
» all examples are in JVM 8 64-bit Server, Intel Haswell CPU, AT&T syntax
tier 3 – C1 with invocation & backedge counters + MethodDataOop counter
because: count="256" iicount="256" hot_count="256"
stack initialization, invocation counter in MDO (0xDC) + trigger C2
counter mask 0x1ff8 >> 3 = 0x3ff: the check fires every 1024 invocations and can trigger tier 4 (C2)
month, year
stack banging technique, stack allocation, saving registers

JVM – Example 2 – daysInMonth Assembly Code – Tier 3
ESI is month input
default jump

JVM – Example 2 – daysInMonth Assembly Code – Tier 3
target for month=4, backedge counter tracking in MDO (0x290): jump target, inlined TLAB allocation of Integer object
no space in TLAB -> new TLAB + external allocation with header init, returns after the inlined allocation
EBX=30 is retVal
RAX Integer instance address
0x10 Integer instance size
object initialization, header filled with prototype mark

Object structure (64-bit JVM):
- header 12 or 16 Bytes
- object data: super class first, type grouped
  8B – mark word
  4B / 8B – Klass ref.
  … object data

Array object structure (64-bit JVM):
- header 16 or 20 Bytes
- sequence of array values
  8B – mark word
  4B / 8B – Klass ref.
  4B – array length
  sequence of values
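
The sizes quoted above follow from simple arithmetic. The sketch below is a simplified model, not a measurement: it assumes an 8-byte mark word, a 4-byte (compressed) or 8-byte Klass reference, and 8-byte object alignment, and it ignores field padding rules. `ObjectLayoutSketch` and its methods are hypothetical names.

```java
final class ObjectLayoutSketch {
    static final int MARK_WORD_BYTES = 8;
    static final int ALIGNMENT_BYTES = 8;   // assumed default object alignment
    static final int ARRAY_LENGTH_BYTES = 4;

    // Rounds size up to the next multiple of the object alignment.
    static long alignUp(long size) {
        return (size + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // Instance size = header (mark word + Klass ref.) + field data, aligned.
    static long instanceSize(int klassRefBytes, int fieldBytes) {
        return alignUp(MARK_WORD_BYTES + klassRefBytes + fieldBytes);
    }

    // Array header = mark word + Klass ref. + array length.
    static long arrayHeaderSize(int klassRefBytes) {
        return MARK_WORD_BYTES + klassRefBytes + ARRAY_LENGTH_BYTES;
    }

    public static void main(String[] args) {
        // Integer holds one 4-byte int field.
        System.out.println(instanceSize(4, 4));   // 16 = 0x10 with a 4B Klass ref.
        System.out.println(arrayHeaderSize(4));   // 16
        System.out.println(arrayHeaderSize(8));   // 20
    }
}
```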

JVM – Example 2 – daysInMonth Assembly Code – Tier 3
inlined Integer constructor with supers, invocation counts in MDOs (0xDC)
Integer::<init>, Number::<init>, Object::<init> – currently in tier 3 (C1 counters in MDO)
invocation cnt of Integer::<init> in daysInMonth for inline
invocation cnt in Integer::<init> + trigger its C2
invocation cnt of Number::<init> in Integer::<init> for inline
invocation cnt in Number::<init> + trigger its C2
invocation cnt of Object::<init> in Number::<init> for inline
invocation cnt in Object::<init> + trigger its C2
RAX.value = EBX (retVal)

JVM – Example 2 – daysInMonth Assembly Code – Tier 3
final cleanup and return, RAX contains return value (pointer to Integer instance)
stack deallocation, reload register
safepoint poll check
» safepoint – Oops in perfectly described state by OopMap (GC maps)
• threads are suspended, Oops safely manipulated externally, and threads resumed after
• in interpreted mode – between any 2 bytecodes
• in C1/C2 compiled – end of all methods (not in-lined), non-counted loop back edge, during JVM run-time call
• parked, blocked on IO, monitor or lock
• while running JNI (do not need thread suspension)
• global safepoint (all threads) – stop the world
  – GC, print threads, thread dumps, heap dump, get all stack trace
  – enable BiasedLocking, RevokeBias
  – class redefinition (e.g. instrumentation), debug
• local safepoint (just executing thread)
  – de-optimization, enable/revoke bias locking, OSR

JVM – Time To SafePoint (TTSP)
-XX:+PrintSafepointStatistics
-XX:+PrintGCApplicationStoppedTime
-XX:PrintSafepointStatisticsCount=1
GetStackTrace overheads:

JVM – Example 2 – daysInMonth Assembly Code – Tier 4
tier 4 – C2 – no profile counters
because: count="5376" iicount="5376" hot_count="5376"
stack initialization, use lookup table jump for tableswitch
default (>=12)
month, year

JVM – Example 2 – daysInMonth Assembly Code – Tier 4
target for month=4
Integer.<init>, Number.<init>, Object.<init> – iicount="5376" -> Inline (hot)
optimized branching, inlined TLAB allocation, inlined constructors, no nulling, caching optimization
EBP=30 is retVal
TLAB Integer object allocation, ref in RAX
MarkWord fetch from class and then store compressed OOP to Integer class
RAX.value = EBX (retVal)
final cleanup
RAX contains return value (pointer to Integer instance)

JVM – Example 2 – daysInMonth Assembly Code – Tier 4
target for default
class IllegalArgumentException no profile -> uncommon -> reinterpret
remap inputs, return back to reinterpreter, then discard tier 3 version

JVM – Example 2 – compute Assembly Code – Tier 4 OSR
OSR @10 – On Stack Replacement at bytecode 10
tier 4 – C2 (before there was tier 3 OSR @10 because 60416 loops and tier 3)
because: backedge_count="101376" hot_count="101376"
copy 4 locals on stack from tier 3 OSR @10 to regs
RSI compiled stack of tier 3 OSR @10

JVM – Example 2 – compute Assembly Code – Tier 4 OSR
loop criteria, then inlined tier 4 daysOfMonth (lookup jump) because the call is hot, ending with addition into accumulator
or reinterpret on end of cycle jump (unstable if_ bytecode), save 3 locals to stack
EBX is local i; 0xF4240 = 1_000_000

JVM – Example 2 – compute Assembly Code – Tier 4
tier 4 – C2
because: count="2" backedge_count="150528"
use combination of full inline, dead code elimination, object escape, loop invariant hoisting, strength reduction
30_000_000
RAX contains return value (primitive int)

Java Virtual Machine – Performance 32 vs 64-bit
» requires warm-up to utilize benefits of C2 (or C1)
» compilers cannot do all magic -> write better algorithms
» 32-bit vs 64-bit JVMs
• 32-bit (max ~3GB heap)
  – smaller memory footprint
  – slower long & double operations
• 64-bit max 32GB virtual memory (with default ObjectAlignmentInBytes)
  – faster performance for long & double
  – slight increase of memory footprint
  – compressed OOPs are slightly slower for references upon usage
  – compressed OOPs less memory -> less frequent GC -> faster program
• 64-bit > 32GB virtual memory (large heap)
  – fast reference usage
  – wasting a lot of memory (48GB ~ 32GB with compressed OOPs)
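
The 32GB boundary follows from how compressed OOPs are encoded: a 32-bit value counts aligned slots rather than bytes, so the addressable range is 2^32 times the object alignment. A minimal sketch of that arithmetic, with a hypothetical class name and assuming the default 8-byte alignment:

```java
final class CompressedOopsLimit {
    // Bytes addressable by a 32-bit compressed reference
    // when every object starts on an alignment boundary.
    static long maxAddressableBytes(int objectAlignmentInBytes) {
        return (1L << 32) * objectAlignmentInBytes;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        System.out.println(maxAddressableBytes(8) / gib);  // prints 32
        System.out.println(maxAddressableBytes(16) / gib); // prints 64
    }
}
```

A larger alignment extends the range, at the cost of more padding per object.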

Java Virtual Machine – CPU and Memory Profiling
» jvisualvm
• JVM monitoring, troubleshooting and profiling tool
• included in all JDKs
• profiled thread limit 32
» profiling
• CPU – time spent in methods
• memory – usage, allocations
» modes
• sampling
  – periodic sampling of stacks of running threads to estimate slowest
  – no invocation counts, no 100% accuracy (various sampling errors)
  – no bytecode (& assembly code) modifications
  – 1-2% impact to standard performance
• tracing (instrumentation)
  – instrumented bytecode -> affected performance -> affected compiler optimizations

JVM – Example 2 – CPU Tracing of daysOfMonth
assembly code of tier 4 – C2 (before there was very complex tier 3)
inlined daysInMonth
rootMethodEntry tracking
749 Bytes of assembly code for each rootMethodEntry

JVM – Example 2 – CPU Tracing of daysOfMonth
additional rootMethodEntry and rootMethodExit trackings for Integer::<init> and Number::<init>
inlined rootMethodExit after Integer instance.value = retVal
313 Bytes of assembly code for each rootMethodEntry

JVM – Example 2 – CPU Tracing Outcome

JVM – Example 2 – Profiling Performance
» CPU tracing of compute results in much slower code
• no object escape from daysInMonth call
• no invariant hoisting
• no strength reduction (full loop remains there)
» object allocation is similar with traceObjAlloc injected calls
» recommended approach
• do sampling first
• identify performance bottlenecks (where most time is spent)
  – it could be outside of JVM (e.g. latency of external DB, file system)
• focus with tracing just on identified parts

JVM – Java Mission Control
jmc – JRockit JVM, included in commercial JDKs, sampling in Flight Recorder

Approach to Performance Testing
» test real application – ideally the way it is used
• microbenchmarks – measure very small units
  – warm-up – to measure real code, not the compilers themselves, biased locks
  – keep in mind caching
  – beware of compilers – use results, reordering of operations
  – synchronization – multi-threaded benchmarks
  – vary pre-calculated right parameters affecting complexity – different optimization in reality
• macrobenchmarks – measure application input/output
  – least performing component affects the whole application
• mesobenchmarks – isolating performance at modular level
» understand throughput, elapsed and response time
• outliers can occur – e.g. GC
• use existing generators rather than writing your own
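
The microbenchmark points above (warm-up, using results) can be sketched in a few lines. This is a deliberately naive harness for illustration only; the class and method names are hypothetical, and a dedicated benchmarking framework handles many more pitfalls than this does.

```java
final class MicroBenchSketch {
    // Unit under test; its result depends on the input, so the measured
    // work cannot be replaced by a constant at source compile time.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i % 7;
        }
        return acc;
    }

    // Returns average nanoseconds per call of work(n).
    // Warm-up runs let the JIT compile work() before timing starts.
    // The checksum is stored into sink[0] so the results are used and
    // the calls cannot be discarded as dead code.
    static double measure(int warmupRuns, int measuredRuns, int n, long[] sink) {
        long checksum = 0;
        for (int r = 0; r < warmupRuns; r++) {
            checksum += work(n);
        }
        long start = System.nanoTime();
        for (int r = 0; r < measuredRuns; r++) {
            checksum += work(n);
        }
        long elapsed = System.nanoTime() - start;
        sink[0] = checksum;
        return (double) elapsed / measuredRuns;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];
        double nsPerCall = measure(20_000, 20_000, 1_000, sink);
        System.out.println(nsPerCall + " ns/call, checksum " + sink[0]);
    }
}
```

Repeating the measurement several times and looking at the spread, not just one average, helps to spot outliers such as GC pauses.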

Approach to Performance Testing
» understand variability – changes over time
• internal state
• background effects – load, network
• probabilistic analysis – works with uncertainty
» test early, test often – ideally part of development cycle
• ideally some properly repeated mesobenchmarking
• automate tests – scripted
• proper test coverage of functionality and inputs
• test on target system – different code on different systems