Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
&
www.cea.fr
Dynamic CompilationEverywhere
Henri-Pierre Charles
CEA DACLE department / Grenoble
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Introduction : Course Organization
Dynamic Code GenerationState of the art (SOA)
ArchitectureCompiler
Applications impactMetricsIR for code generationFast code generationHow to debug
EvaluationQCM at the end(whithout documents)“Short term memory”Questions topics aregiven : search for
Assumption ; youunderstand my brokenfroglishknow C languageknow compiler structureknow a script language
Dynamic Compilation Everywhere | DACLE Division | | 2
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Intro Why Dynamic Compilation
ProDynamic applicationMemory hierarchy versusdata setUnknown runningarchitectureSpecialized instructionsFast development cycle
ConsStatic CompilationCan verify, assertAs many time as needed
Dynamic Code Generation ! = Dynamic Optimization (or programming)
Dynamic Compilation Everywhere | DACLE Division | | 3
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA Arch : ISA
Instruction Set ArchitectureLowest programming Level model : assembly languageAllow to use processor full capacity
Special instructions (multimedia, vectors)Strange memory access
Binary representationRegister access
What is the “instruction execution algorithm” ?Is the assembly language still used ?
https://en.wikipedia.org/wiki/Instruction_set
Dynamic Compilation Everywhere | DACLE Division | | 4
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Quizz Question 2USADA8 : not a “simple instruction”Extract from an ARM databook
file:///home/hpc/Desktop/UseFullDocs/ProcessorISA/ARM/ARM/A9docs/DDI0406B_arm_architecture_reference_manual.pdf
Dynamic Compilation Everywhere | DACLE Division | | 5
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Problem example : video compression
Illustration Video Compress
Dynamic Compilation Everywhere | DACLE Division | | 6
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Problem example : video compression
1 # define PIXEL_SAD_C ( name , lx , ly ) \2 int name( uint8_t *pix1 , int i_stride_pix1 , \3 uint8_t *pix2 , int i_stride_pix2 ) \4 { \5 int i_sum = 0; \6 int x, y; \7 for( y = 0; y < ly; y++ ) \8 { \9 for( x = 0; x < lx; x++ ) \10 { \11 i_sum += abs( pix1[x] - pix2[x] ); \12 } \13 pix1 += i_stride_pix1 ; \14 pix2 += i_stride_pix2 ; \15 } \16 return i_sum ; \17 }
(Extract from x264 video compressor from www.videolan.org project,(lx , ly) ∈ {(4, 4), (4, 8), (8, 4), (8, 8), (8, 16), (16, 8), (16, 16)}
Dynamic Compilation Everywhere | DACLE Division | | 7
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA Arch : Language Classification / Addresses
Definition0 address : stack machines ; all operators are implicit (bytecode java)(Java bytecode, postscript)Register machines :
1 address : A add (other operand implicits)2 address : A += B (x86)3 address : A = B + C (itanium)4 address : A = B * C + D (power)
CompromiseCode compacity, expressiveness“Simplicity” / efficiencyCode generation speed
Dynamic Compilation Everywhere | DACLE Division | | 8
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA ARCH : ARM big.LITTLE
big
LITTLE
ARM : “more than 30 billionprocessors sold with more than16M sold every day ARM”(Nov 2013) http://www.arm.com/products/processors/index.php
4 big processors + 4 littleSame ISA, . . .(even for vectoroperation)Low latencie switch
big.LITTLE notion
Dynamic Compilation Everywhere | DACLE Division | | 9
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA Architecture : RaspberryBuilding block for geek Web site
RaspberryCharacteristics :
ARM1176JZF-S (ARMv6)700MHzRAM : 256 Mo2 video out ; audio in/outSDCARD mem ; 1 PortUSB 2.0 ;300 mAFull OS : linux, BSD, ...20 $
Illustration
Dynamic Compilation Everywhere | DACLE Division | | 10
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA Architecture : Texas Msp430
Building block for embedded system Arch Overview
Texas MicrocontrollerCharacteristics :
8 to 24 MHz512 to 1024 bytes of ramBare metal1.75$ to 2.45$Cross compiler
Illustration
Dynamic Compilation Everywhere | DACLE Division | | 11
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA Architecture : GenepyGenepy Bloc diagram
Genepy characteristicsLTE modem processor
Very low power (G Insn / mw)Hard to program (Kb of RAM), specialized operators
Locally heterogeneous (MIPS + DSP Mephisto) / Globally homogeneous
Dynamic Compilation Everywhere | DACLE Division | | 12
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Architecture Nexus4
Face(red) Toshiba THGBM5G6A2JBA1R 8GB Flash(orange) SlimPort ANX7808 SlimPort Transmitter (HDMI outputconverter)(yellow) Invensense MPU-6050 Six-Axis (Gyro + Accelerometer)(green) Qualcomm WTR1605L Seven-Band 4G LTE chip(blue) Avago ACPM-7251 Quad-Band GSM/EDGE and Dual-BandUMTS Powe Amplifier(violet) SS2908001(black) Avago 3012 Ultra Low-Noise GNSS Front-End Module
BackSamsung K3PE0E00A 2GB RAM. We suspect the Snapdragon S4 Pro1.5 GHz CPU lies underneath.Qualcomm MDM9215M 4G GSM/CDMA modemQualcomm PM8921 Power ManagementBroadcom 20793S NFC ControllerAvago A5702, A5704, A5505Qualcomm WCD9310 audio codecQualcomm PM8821 Power Management
http://www.ifixit.com/Teardown/Nexus+4+Teardown/11781
Dynamic Compilation Everywhere | DACLE Division | | 13
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Rappels Nexus4Compil
Application SDK :Java programming languageDalvik virtual machineAndroid Market / Google controled
Native compilationModem stack
Native codeCross compilerClosed source
System stack mostly opensourceCyanogenmod build
Virtual machine CyanogenmodBuilding from source
Dynamic Compilation Everywhere | DACLE Division | | 14
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA C compil chain
DescriptionC language (K&R), C90up to C11Normalizedgcc, clang, icc, tcc, ...
Compilation chainPreprocessing (# stuff)Compilation
Lexical / SyntaxicIR formPhases / passesInstructions selection
AssemblyLoad
https://en.wikipedia.org/wiki/C_(programming_language)
Dynamic Compilation Everywhere | DACLE Division | | 15
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Rappels Static Compilation chain
Static compilation (on C language) :1 Preprocessor (all # stuff : rewriting) cc -E
2 Compilation (from C to textual assembly) cc -S
3 Assembly (from textual asm to binary asm) cc -c
4 Executable (binary + dynamic library)
OptionalProfiling : Compile (use -pg) produce File ; Run File ; Use gprof
cc -da dump all intermediate representation
(Use gcc -v to see all the steps)Don’t stop at static time (Operating system + processor) : Load inmemory, dynamic linking ; Branch resolution ; Cache warmup
Dynamic Compilation Everywhere | DACLE Division | | 16
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA Java
DescriptionJava Language (Sunmicrosystems / Oracle),95NormalizedStack machineEnormous API (+10000classes) “Compile once,run anywhere”Oracle, IBM, Google(+/-)
Compilation chainjavac (.java to .classbytecode)interpretation“Hotspot” compiler
trigger most usedmethodserver or client profile
class to executable
https://en.wikipedia.org/wiki/Java_(programming_language)
Dynamic Compilation Everywhere | DACLE Division | | 17
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SW-SOA Dalvik
DescriptionJava languageAndroid APINormalized ...Google toolboxSecurityStoreDalvi is a register basedmachine
Compilation chainjava to dalvikDalvik VM optimized forsmall memory machinesdalvik jitted
https://source.android.com/devices/tech/dalvik/
Dynamic Compilation Everywhere | DACLE Division | | 18
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA LLVM
DescriptionLow Level VirtualMachinehttp://llvm.org
Many compilation project(compiler toolbox)C ; C++ ; Objective-COther frontend viadragonegg
Compilation chainNative compilation(clang)LLVM bytecodeinterpretation (lli)Ahead of timecompilation ()JIT
Dynamic Compilation Everywhere | DACLE Division | | 19
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SOA JavaScript
DescriptionScript languageTrace based compilationTons of tools (server sideand client side)Used as back-endlanguageNormalized, but withvariants
Compilation chainSource interpretationJIT part of scripts
https://en.wikipedia.org/wiki/JavaScripthttps://en.wikipedia.org/wiki/Comparison_of_server-side_JavaScript_solutionshttps://en.wikipedia.org/wiki/Node.jshttp://bellard.org/jslinux/
Dynamic Compilation Everywhere | DACLE Division | | 20
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SW-SOA JavaScriptVM
VM CategoryWeb Browser
V8 Chrome browserSpider monkey Firefox
Server sideEmbedded
“Smallest javascript”Tiny-JS / STM32C++
Dynamic Compilation Everywhere | DACLE Division | | 21
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SW-SOA Other
Other LanguagesPython, PHP, LUA,FORTH, . . .Fast developmentHuge library
Compilation chainNoneScript compressionJITNative Call Interaction
Dynamic Compilation Everywhere | DACLE Division | | 22
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Arch Memory Cache Hierarchy
Memory type Perm. ? Access time Who care ?Processor Register no ns CompilerCache L1 no ns/ms CompilerCache L2 no ns/us CompilerCache L3 no us/ms Compiler /ApplicationRAM yes/no 10 ms ApplicationFlash / SSD yes .1 s SystemHard drive yes 1-10 s SystemTape Backup yes mn/hr User
Price / Access Time / Who care about
Dynamic Compilation Everywhere | DACLE Division | | 23
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Arch Instruction Execution
ArgumentationWhile true
Fetch instructionIncrement PCDecode instructionFetch / Get DataExecute operationStore result
Vocabulary
RegisterUALPCRISC/CISC
https://en.wikipedia.org/wiki/Instruction_cycle
Dynamic Compilation Everywhere | DACLE Division | | 24
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SW-SOA OOO
Out of Order principle“Intelligent Reordering”Add a dispatch queueCF Slide ARMbig.LITTLE
IllustrationEasy to compileSame ISA, evolution atmicro architectural level(TICK/TOCK Intel)Difficult to compile veryefficient code
https://en.wikipedia.org/wiki/Out-of-order_execution
Dynamic Compilation Everywhere | DACLE Division | | 25
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
SW-SOA MicroController
DescriptionSingle chip with
Compute nodeRAM / ROMI/ODAC ConverterChip radio (IoT)
ProgrammingBare metalCross compilationInteractive (Script)
https://en.wikipedia.org/wiki/Microcontroller
Dynamic Compilation Everywhere | DACLE Division | | 26
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
MPSoC : Multiprocessor System on Chip
One single chip which containsIP blocks (Intellectual property).
FFT / Turbo code / ../..ProcessorsMemory
Why ?IP : Save energySame chip : Save energy, ease integrationMemory on chip : Save energyMultiple processors : parallelism to ... save energy
http://en.wikipedia.org/wiki/System_on_a_chip
Dynamic Compilation Everywhere | DACLE Division | | 27
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Vocabulaire Lois
Moore http://en.wikipedia.org/wiki/Moore’s_law
Amdahl http://en.wikipedia.org/wiki/Amdahl’s_law
Memory bound http://en.wikipedia.org/wiki/IO_bound
CPU bound http://en.wikipedia.org/wiki/CPU_bound
Dynamic Compilation Everywhere | DACLE Division | | 28
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Vocabulaire Speedup
NotionSpeedup S = 100 ∗ Tseq
ToptCan be between
1 and N processor1 and vectorizednon optimized versus optimized version
Mesure Quality what are the execution conditions : data set, computerworkload, reproducibly, ...
Computer science has to use “human science” tools & methodology
Dynamic Compilation Everywhere | DACLE Division | | 29
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Metrics How to Measure Performance ?
What scale ?Application level (TPS,Frame/s, .../)Run to completionMethod levelInstruction level
ToolsWall clockgettimeofday
Performance counters
Full application or functioncall ?
Cold start / warmupStatistic / multiple callsHow many calls
Dynamic Compilation Everywhere | DACLE Division | | 30
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Vocabulaire Peak / Sustained
NotionsPeak performance : maximal theoretical performance, assuming no
bubbleSustain performance : real acheived performance, on a real
benchmark
CostWhat is the percentage you’re ready to lose ? 90% 95%?How many are you ready to pay (time, money) to minimise this loss ?
http://www.top500.org
Dynamic Compilation Everywhere | DACLE Division | | 31
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Vocabulaire : Cycle par seconds / Flops
UnitsMips Million operation per secondFlops Floating point operation per second
http://www.top500.org
Flops/Watt Floating point operation per watt per secondhttp://www.green500.org
IPC : Instructions per Cycle
How to mesureAnalytique (wall clock)Instrumentation
“Portable” (gettimeofday())Hardware (hardware performance counter)
Dynamic Compilation Everywhere | DACLE Division | | 32
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Vocabulaire Amdahl
ArgumentationThe speedup of a programusing multiple processors inparallel computing is limitedby the sequential fraction ofthe program. For example, if95% of the program can beparallelized, the theoreticalmaximum speedup usingparallel computing would be20x as shown in the diagram,no matter how manyprocessors are used.
Illustration
Dynamic Compilation Everywhere | DACLE Division | | 33
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Amdahl2ArgumentationAssume that a task has two independent parts, A and B. B takes roughly25% of the time of the whole computation. By working very hard, onemay be able to make this part 5 times faster, but this only reduces thetime for the whole computation by a little. In contrast, one may need toperform less work to make part A be twice as fast. This will make thecomputation much faster than by optimizing part B, even though B’sspeed-up is greater by ratio, (5x versus 2x)
Illustration
Dynamic Compilation Everywhere | DACLE Division | | 34
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
Intermediate representation (IR)
Represent a programEach compiler hasa/many IRsInstruction ListProgramming model(0/1/2/3 address)Easy to manipulate (SSA)List of instructions
What for ?(re) Generate codeBinary code at staticcompil timeBinary code at run-time
Dynamic Compilation Everywhere | DACLE Division | | 35
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
IR LLVM
LLVM Intermediaterepresentation
SSA form“Easy” to readUsed for OpenCLMany Possible Scenarii
ExamplesCompilte to native clangfile.c -o file
Compile to bytecodeclang -emit-llvm -Sfile.c -o file.bc
Interpret : lli file.bc
Compile ahead of time :llc test.bc
Optimize bytecode :opttest.bc
Dynamic Compilation Everywhere | DACLE Division | | 36
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
IR Java ByteCode
Java BytecodeStack based machineByte InstructionsFast interpretorBytecode Verifier/prover
Examplesjavac File.java“Bytecompile” java programjar tf a.jar File.classLinkjava File.class or a.jarLaunch VM
-server|-zero|-javm|avianchoose VM-XX:CompileThreshold=N1500=client, 10000=server
javah .h for JNI callhttps://en.wikipedia.org/wiki/Java_bytecodehttps://en.wikipedia.org/wiki/Java_bytecode_instruction_listingsDynamic Compilation Everywhere | DACLE Division | | 37
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
IR Java ByteCode
Java VM interpretorRead an instruction (abyte)Choose a case (switch)Use statistics (methodcount)Method call :if call count > thresoldthen Jit compile
if method nativethen call nativeelse call interpretor
HotSpot JITChoose thresold (clientversus server)JIT compile a methodIterative compilationContext switch betweennative and virtualenvironment !
Dynamic Compilation Everywhere | DACLE Division | | 38
Cliquez pour modifier lestyle du titre
© CEA. All rights reserved&
IR ART
Android RuntimeAndroid Virtual Machine(Not JVM)Register basedDalvik was JITed, ART isahead of time (installtime)
Compiler optionseverything - compilesalmost everythingspeed - maximizesruntime performance,(default option)balanced -performance/compilationinvestment.space - prioritizing storagespace.interpret-onlyverify-none - should beused only for trustedsystem code.
https://en.wikipedia.org/wiki/Android_Runtimehttps://source.android.com/devices/tech/dalvik/
Dynamic Compilation Everywhere | DACLE Division | | 39