Specialization
Run-time Code Generation in the context of Dynamic Compilation
presented by Mujtaba Ali
Earlier in the Semester
“VCODE: A Retargetable, Extensible, Very Fast Dynamic Code Generation System”
– Generates “assembly” at run-time
– Run-time Code Generation, but not in the realm of Dynamic Compilation

“Fast, Effective Dynamic Compilation”
– Our only foray into Run-time Specialization
– Same authors as the first paper we will discuss
– Precursor to DyC – we’ll call it the “DyC paper”
Example Compiler Optimizations
Constant Propagation (everyone’s favorite)
Copy Propagation
Constant Folding
Loop Unrolling
Branch Elimination
Dead Store Elimination (Dead-assignment Elimination)
Order Matters
foo (b) {
  a = 3;
  print(a);
  a = b;
  print(a);
}

    ⇓ Constant Propagation

foo (b) {
  a = 3;
  print(3);
  a = b;
  print(a);
}

    ⇓ Copy Propagation

foo (b) {
  a = 3;
  print(3);
  a = b;
  print(b);
}

    ⇓ Dead-assignment Elimination

foo (b) {
  print(3);
  print(b);
}
Partial Evaluation
Typically:
– Constant Propagation
– Constant Folding
– Loop Unrolling
– Inlining
[Diagram: Partial Evaluation shown against the complete set of compiler optimizations]
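As an illustration (a hypothetical Java example, not from the slides), partially evaluating a dot product whose weight vector is known combines exactly these steps: the known length enables full loop unrolling, the known weights are constant-propagated, and the zero-weight term is constant-folded away.

```java
// Hand-illustration of partial evaluation (hypothetical example).
class PartialEvalDemo {
    // Generic version: nothing is known statically.
    static int dot(int[] w, int[] x) {
        int sum = 0;
        for (int i = 0; i < w.length; i++) {  // loop bound unknown
            sum += w[i] * x[i];
        }
        return sum;
    }

    // Specialized for w = {2, 0, 5}: the loop is unrolled (length 3 is
    // known), each w[i] is constant-propagated, and the w[1] == 0 term
    // is constant-folded away entirely.
    static int dotSpecialized(int[] x) {
        return 2 * x[0] + 5 * x[2];
    }
}
```

Both versions agree on every input, but the specialized one is straight-line code with no loads from `w`.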
DyC Paper Refresher
Towards Automatic Construction of Staged Compilers
Matthai Philipose, Craig Chambers, and Susan J. Eggers
Contributions
Eases the burden on the compiler writer
– Compiler writer simply writes optimizations as usual

Supporting contributions:
– Enhancements to Partial Evaluation
– Enhancements to Dead-assignment Elimination
Staged Compilation Framework
SCF-ML
– Language in which compiler optimizations are written
– First-order, side-effect-free subset of ML

Stager
– Specializes optimizations written in SCF-ML to separate compile-time and run-time optimizations
– Requires “approximate” information about program input

[Diagram sidebar labels: Compiler Writer / Compiler User]
High-level Overview of SCF
Concrete Example
Function to be optimized:
– Say this function is the only contents of “mul_add.c”
Concrete Example (con’t)
At compile-time, the compiler user does the following:

[Diagram: the stager is run once per optimization. Inputs: the AST of mul_add.c (I1), the info “a is const at run-time” (I2), and one of the writer’s optimizations — O1: const prop, O2: copy prop, O3: dae, all in SCF-ML (I3). Outputs: the staged optimizations O1′: staged const prop, O2′: staged copy prop, O3′: staged dae (still SCF-ML), plus stub code.]
Concrete Example (con’t)
In ML:
Concrete Example (con’t)
What does the stub function do?
– All this is happening at run-time

[Diagram: the stub pipes the AST of mul_add.c and the run-time constant value of a through the staged optimizations — O1′: staged const prop, O2′: staged copy prop, O3′: staged dae (SCF-ML) — yielding intermediate programs P1, P2, P3 and finally the optimized program Popt.]
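The stub’s behavior can be approximated with a Java sketch (invented names; the real stub is generated ML code driving staged optimizations over an AST, and mul_add is assumed here to compute a*x + b): given the run-time constant `a`, produce — and cache — a specialized version of the function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Sketch of what a run-time specialization stub does (invented names).
class MulAddStub {
    // Cache of specialized versions, keyed by the run-time constant a.
    private static final Map<Integer, IntBinaryOperator> cache = new HashMap<>();

    // Stands in for running O1', O2', O3' over the AST to produce Popt:
    // here we simply capture a in a closure playing the role of the
    // generated specialized code.
    static IntBinaryOperator specialize(int a) {
        return cache.computeIfAbsent(a, k -> (x, b) -> k * x + b);
    }
}
```

On a repeated run-time constant, the cached specialized version is reused instead of being regenerated.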
Concrete Example (con’t)
In ML:
Stager
Stager specializes the optimizations
– Partial Evaluator optimizes the writer’s optimization and generates the run-time/staged part of the optimization
– Performance is crucial for the run-time part of the optimizations, hence Dead-assignment Elimination

It appears as if there are two levels of specialization
– One of the “alleged” levels is just the automatic separation between compile-time and run-time
Alternate View of Specialization
M’ is specialized to I. Note that O1’ will execute faster than O1.

[Diagram: the standard compiler applies O1 to program P, producing M, which is then run on input I. The specializing compiler instead specializes O1 with respect to I, producing O1’ and then M’ — M specialized to I — and runs M’.]
Compare and Contrast with DyC
Compiler writer’s job is much easier
– Stager automatically does all the work
– Not specific to C
– Not limited to optimizations that come with the staged compiler

Compiler user still has the same API:
– Input (run-time constraints)
– User program
Benchmarking
Definitely easier than DyC
But is automatic staging as effective w.r.t.:
– Unstaged (standard) compiler
– Hand-staged optimizations (as in DyC)

How fast are the combined static-time and run-time stages of the staged compiler?
Also, what about the size of the staged compiler?
Benchmarking (con’t)
vs. Unstaged (standard) compiler
– Noticeably faster (it better be)

vs. Hand-staged optimizations
– Hand-staged optimizations perform much better
– Don’t know why, though

Unstaged compiler vs. staged compiler (both optimized)
– Who cares, since the unstaged compiler does everything at static-time

Size of the automatically staged compiler
– In practice, size grows linearly in the size of the input program
Drawbacks
Compiler user must understand specialization
– Doesn’t seem to be a way around this

Not clear if code generation during stub function execution is language-specific

Implemented in ML
– Language interoperability (ML ↔ C)
– ML is garbage-collected and bounds-checked

Not robust enough for public consumption
Supporting Contributions
Partial Evaluators have been developed for many domains
– However, partially evaluating optimization programs presents its own challenges
– SCF’s Partial Evaluator:
  – Knows about map and set types
  – Uses identity tags for better equality analysis
Supporting Contributions(con’t)
Also, SCF employs a smarter representation of abstract values
– More amenable to specialization of code implementing optimizations
– Abstract values are used to loosely interpret code during partial evaluation
Towards Automatic Specialization of Java Programs
Ulrik Pagh Schultz, Julia L. Lawall, Charles Consel, and Gilles Muller
Contributions
Specialization is useful in an OO setting

Supporting contributions:
– OO languages lend themselves to specialization
– “Quick and dirty” proof of concept
– Performance benchmarks
General Idea
The more generic you get, the more your performance suffers
– OO is generic by nature
– Specialization is our savior
– Specialization will convert a generic program to a specific one

Genericity → Specificity
Sluggish → Efficient
Automatic Specialization?
What is meant by “automatic”?
– Specialized code is not generated by hand

What is different about the “specialization” here?
– Let’s try and mimic run-time specialization at compile-time
– Specialize for each member of the set of most likely inputs
– If the actual input at run-time is not in this set, fall back on the generic, non-specialized code
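This fallback scheme can be sketched in Java (a hypothetical example, not from the paper): a specialized version is pre-generated for an anticipated input, and a run-time guard dispatches to it, falling back on the generic code for everything else.

```java
// Compile-time specialization with a run-time guard (hypothetical sketch).
class GuardedMod {
    // Generic, unspecialized code: works for any modulus.
    static int genericMod(int x, int m) {
        return x % m;
    }

    // Pre-generated specialization for the anticipated input m == 8:
    // the modulo becomes a bit mask (valid for non-negative x).
    static int mod8(int x) {
        return x & 7;
    }

    // Guard: use the specialized version when the actual input is in
    // the anticipated set, otherwise fall back on the generic code.
    static int mod(int x, int m) {
        return (m == 8) ? mod8(x) : genericMod(x, m);
    }
}
```

The set of anticipated inputs here has one member; the paper's scheme generalizes this to one specialized body per likely input.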
Concrete Example
Consider this Java class for the power function:
Concrete Example (con’t)
Assume the exponent is always 3
– We can specialize the calculate() method
– JSCC would allow us to automatically specialize calculate()
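The slide’s listings are not reproduced in this transcript; the running example plausibly looks like the following Java (class shape and names are assumptions), with the hand-specialized result shown alongside.

```java
// Assumed shape of the power example (field and method names are guesses).
class Power {
    int exponent;  // the value we specialize on

    Power(int exponent) {
        this.exponent = exponent;
    }

    // Generic version: loops over the exponent.
    int calculate(int base) {
        int result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    // What specializing calculate() for exponent == 3 yields: the loop
    // is unrolled and the field load disappears.
    int calculate3(int base) {
        return base * base * base;
    }
}
```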
Concrete Example (con’t)
With JSCC, we would define a specialization class as follows:
– JSCC will generate the specialized code and…
Concrete Example (con’t)
… JSCC adds a method for switching between specialized implementations:
Specialization in an OO Context
How does OO naturally lend itself to specialization?
– Data Encapsulation
– Virtual Method Calls
– In the case of Java, specializing the “VM”
Specializing Data Encapsulation
Traditional program specialization, but…
– …save on sequences of pointer dereferences (due to OO)
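A sketch of those savings (invented classes, not the paper’s code): once the object layout and a parameter such as the image width are fixed, a chain of encapsulated accessor calls can collapse into direct indexing.

```java
// Invented classes illustrating accessor-chain overhead in OO code.
class Pixel {
    private final int value;
    Pixel(int value) { this.value = value; }
    int getValue() { return value; }
}

class Image {
    private final Pixel[] pixels;
    private final int width;

    Image(int[] raw, int width) {
        this.width = width;
        pixels = new Pixel[raw.length];
        for (int i = 0; i < raw.length; i++) pixels[i] = new Pixel(raw[i]);
    }

    Pixel getPixel(int x, int y) { return pixels[y * width + x]; }

    // Generic access: two method calls and a width load per pixel.
    int genericAt(int x, int y) {
        return getPixel(x, y).getValue();
    }

    // Specialized for width == 4: the accessor chain is flattened and
    // the index computation is folded into a shift.
    int specializedAt4(int x, int y) {
        return pixels[(y << 2) + x].getValue();
    }
}
```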
Specializing Object Types
Information fed to specializer can help eliminate dynamic dispatch overhead
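For example (a hypothetical Java hierarchy, not from the paper): if the specializer is told every element’s concrete type, the virtual call can be replaced by a direct, inlinable one.

```java
// Invented hierarchy: knowing the receiver's concrete type lets a
// specializer replace dynamic dispatch with a direct call.
abstract class Shape {
    abstract double area();
}

class Square extends Shape {
    final double side;
    Square(double side) { this.side = side; }
    @Override double area() { return side * side; }
}

class Shapes {
    // Generic: area() is a virtual call; the compiler cannot always
    // prove which implementation runs.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    // Specialized under the information "all elements are Squares":
    // the dispatch is gone and the body is effectively inlined.
    static double totalAreaSquares(Square[] squares) {
        double sum = 0;
        for (Square s : squares) sum += s.side * s.side;
        return sum;
    }
}
```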
Specializing the Virtual Machine
Such low-level optimizations are possible because Java is eventually translated to C in this scheme
Two Levels of Specialization (Again)
Specialization of original Java program
– JSCC (example earlier)

Specialization of translated C program
– Tempo
– Only the compile-time specialization capabilities of Tempo are exploited
“Quick and Dirty” Prototype
Benchmarks
Image Processing Application
Benchmarks (con’t)
What are “Prospective” Benchmarks?
– Benchmarks on manual back-translation to Java

[Benchmark tables: Sparc and Pentium]
Future Work
Run-time specialization
– Tempo already supports run-time specialization

Automatic back-translation to Java
– Why? Portability.

Apply to situations involving extreme genericity
– Software components
– For example: JavaBeans
Drawbacks
Lack of run-time specialization
– Forces tedious compile-time specialization

How will exploiting Tempo’s run-time specialization affect backporting to Java?
Looks and feels like an ugly hack
The Present
JSPEC is the new incarnation– Run-time specialization– Java-to-Java translation
JSPEC paper is unpublished but available
Compare and Contrast with SCF
Unlike SCF, adding additional optimizations to the specialization is non-trivial
– You’re stuck with the provided partial evaluator
– But partial evaluation covers a lot of ground

No run-time specialization
– Most significant drawback
That’s All Folks!
Use Cyclone because it’s good for you!
– Cyclone is the bestest language.