49
STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03-Nov-15

STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

STOKE Overview 1

STOKE

Alex Aiken

Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill,

JF Bastien (Google)03-Nov-15

Page 2: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Montgomery Multiply from SSH

2

STOKE (11 LOC)

.L0:

shlq 32, rcx

movl edx, edx

xorq rdx, rcx

movq rcx, rax

mulq rsi

addq r8, rdi

adcq 9, rdx

addq rdi, rax

adcq 0, rdx

movq rdx, r8

movq rax, rdi

gcc -O3 (29 LOC).L0:

movq rsi, r9

movl ecx, ecx

shrq 32, rsi

andl 0xffffffff, r9d

movq rcx, rax

movl edx, edx

imulq r9, rax

imulq rdx, r9

imulq rsi, rdx

imulq rsi, rcx

addq rdx, rax

jae .L2

movabsq 0x100000000, rdx

addq rdx, rcx

.L2:

movq rax, rsi

movq rax, rdx

shrq 32, rsi

salq 32, rdx

addq rsi, rcx

addq r9, rdx

adcq 0, rcx

addq r8, rdx

adcq 0, rcx

addq rdi, rdx

adcq 0, rcx

movq rcx, r8

movq rdx, rdi03-Nov-15 STOKE Overview

Page 3: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

The Issue

Compiler optimizers solve a search problem …

… but without doing any search.

303-Nov-15 STOKE Overview

Page 4: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Step Back

What if we started over?

What would a search-based optimizer look like?

403-Nov-15 STOKE Overview

Page 5: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Hmmmmm….

• Irregular, high-dimensional search space• Lots of complicated interactions between different

instructions, machine resources, bits of state.

• And really, really big

• Solution: • Use Markov Chain Monte Carlo sampling

503-Nov-15 STOKE Overview

Page 6: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

MCMC Optimizer

1. Start with some program

2. Repeat (millions of times)

• Make a random change and evaluate cost

• If ( cost decreased )

{ accept (i.e., keep the change)}

• If ( cost increased )

{ with some probability accept anyway }

603-Nov-15 STOKE Overview

Page 7: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Transformations

original

...

movl ecx, ecx

shrq 32, rsi

andl 0xffffffff, r9d

movq rcx, rax

movl edx, edx

imulq r9, rax

...

insert

...

movl ecx, ecx

shrq 32, rsi

andl 0xffffffff, r9d

movq rcx, rax

movl edx, edx

imulq r9, rax

imulq rsi, rdx

... ^

delete

...

movl ecx, ecx

shrq 32, rsi

andl 0xffffffff, r9d

movq rcx, rax

movl edx, edx

imulq r9, rax

...

instruction

...

movl ecx, ecx

shrq 32, rsi

salq 16, rcx

movq rcx, rax

movl edx, edx

imulq r9, rax

...

opcode

...

movl ecx, ecx

shrq 32, rsi

andl 0xffffffff, r9d

movq rcx, rax

subl edx, edx

imulq r9, rax

...

operand

...

movl ecx, ecx

shrq 32, rcx

andl 0xffffffff, r9d

movq rcx, rax

movl edx, edx

imulq r9, rax

...

swap

...

movl ecx, ecx

movl edx, edx

andl 0xffffffff, r9d

movq rcx, rax

shrq 32, rsi

imulq r9, rax

...

703-Nov-15 STOKE Overview

Page 8: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Synthesis

8

cost(r,t) = equal(r,t)

03-Nov-15 STOKE Overview

Page 9: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Optimization

9

cost(r,t) = eq(r,t) + perf(r,t)

03-Nov-15 STOKE Overview

Page 10: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Montgomery Multiply, Revisited

1003-Nov-15 STOKE Overview

Page 11: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

The (Correctness) Cost Function

• Idea 1:

Cost(P) = # of incorrect bits of output on a test suite

• Idea 2:

Reward the right answer in the wrong place.

Consider all ways of adding a suffix of move instructions to the program and choose the one with lowest cost

(+ a small penalty for the moves)

STOKE Overview 1103-Nov-15

Page 12: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Example

Synthesize a program to increment every element of a vector of integers.

%rax holds the base address of the vector

%si holds the # of vector elements

STOKE Overview 1203-Nov-15

Page 13: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Synthesizing Vector Increment

STOKE Overview 13

Iterations

Cost

(in Millions)

03-Nov-15

Page 14: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Synthesizing Vector Increment

STOKE Overview 14

Iterations

Cost

(in Millions)

inc:nop

.L_start:nopjne .L_start

nopretq

03-Nov-15

Page 15: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Synthesizing Vector Increment

STOKE Overview 15

Iterations

Cost

(in Millions)

inc:imull $0x7, %esi, %eaxcmpxchgb %al, %al.L_start:salb $0x5, %silleaq 0x7(%edi,%edi,2),

%rdxjne .L_startretq

03-Nov-15

Page 16: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Synthesizing Vector Increment

STOKE Overview 16

Iterations

Cost

(in Millions)

inc:movq %rdi, %raxxchgb %dil, %altestb $0x4, %al.L_start:setnp %r10bjne .L_startincb (%rax)retq

03-Nov-15

Page 17: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Synthesizing Vector Increment

STOKE Overview 17

Iterations

Cost

(in Millions)

inc:.L_start:movq %rdi, %raxcmpw (%rax), %diincw (%rax)decw %sicmovngw %si, %dxaddq $0x4, %rditestb $0xfffffffffffffffe, %siljne .L_startretq

03-Nov-15

Page 18: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Synthesizing Vector Increment

STOKE Overview 18

Iterations

Cost

(in Millions)

inc:.L_start:movq %rdi, %raxincw (%rax)addq $0x4, %rdidecw %sijne .L_startretq

03-Nov-15

Page 19: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

STOKE Overview 1903-Nov-15

Page 20: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Engineering Constraints

• The cost function needs to be inexpensive• Because we will be evaluating it billions of times

• Cost functions are split into two parts• A fast approximate test in the inner loop

• A slow exact test we do rarely

STOKE Overview 2003-Nov-15

Page 21: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Two Cost Functions

Correctness

• Fast: Use test cases

• Slow: Use SMT solver to check for program equivalence

Performance

• Fast: Sum instruction latencies

• Slow: Benchmark code on the target processor

STOKE Overview 2103-Nov-15

Page 22: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Application: Program Optimization• Start with a program T

• Run T on some test cases, save results

• Use STOKE to find a program R that• Matches T on the tests

• Is faster than T

• Check R = T

STOKE Overview 2203-Nov-15

Page 23: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

But …

• How do we check R = T?

• R was discovered by a random process• Unrelated to the structure of T

• There is no given translation from T to R

• A hard program equivalence problem

STOKE Overview 2303-Nov-15

Page 24: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

The Easy Case

• If T and R• Are loop-free• And have no floating point operations• (and some other things)

• Then• Encode T = R as an SMT formula

• Bonus• Can get a counterexample if T is not equal to R• Use as a new test case in another search

STOKE Overview 2403-Nov-15

Page 25: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Handling Loops

25STOKE Overview03-Nov-15

• SMT solvers can prove properties of loop-free code

• We want to prove equivalence of loops

• Decompose proof for loops into subproofs• Subproofs about loop free fragment, query SMT solvers

• Cutpoints: break the loops

• Invariants: relationships between 𝑇 and 𝑅 at cutpoints

Page 26: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Proof Decomposition Example

Target 𝑇movq 8(rsp), rdi#rdi != 0

movq 8(rsp), rdidecq rdimovq rdi, 8(rsp)

retq

movq 8(rsp), r9

#r9 != 0

decq r9 retq

a

a’

b b’

c c’

Rewrite 𝑅𝐼𝑎: states equal

𝐼𝑏: 8(rsp)=rdi=r9’

𝐼𝑏: 8(rsp)=rdi=r9’

𝐼𝑐: live out equal

Simulation Relation26

while(i!=0) i--;

STOKE Overview03-Nov-15

Page 27: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Inference

• Given a simulation relation, proofs for loops reduce to subproofs for loop free fragments• Query SMT solvers

• Main challenge: infer a simulation relation• Infer cutpoints

• Infer invariants

• Mine relations from data • program executions

27STOKE Overview03-Nov-15

Page 28: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Cutpoints From Data

• Attempt to detect cutpoints• Number of times program points are executed

Target 𝑇movq 8(rsp), rdi#rdi != 0

movq 8(rsp), rdidecq rdimovq rdi, 8(rsp)

retq

movq 8(rsp), r9

#r9 != 0

decq r9 retq

Rewrite 𝑅n

1 n

n+1 n+1

n

Data!

28STOKE Overview03-Nov-15

Page 29: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Invariants

• Invariants are restricted to equalities• Infer invariants from observed data values

Target 𝑇movq 8(rsp), rdi#rdi != 0

movq 8(rsp), rdidecq rdimovq rdi, 8(rsp)

retq

Data!

29

8(rsp) rdi

2 2

1 1

0 0

STOKE Overview03-Nov-15

Page 30: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Invariants

• Invariants are restricted to equalities• Infer invariants from observed data values

movq 8(rsp), r9

#r9 != 0

decq r9 retq

Rewrite 𝑅

30

8(rsp) rdi r9’

2 2 2

1 1 1

0 0 0

STOKE Overview03-Nov-15

Page 31: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Linear algebra

𝐼 ≡ (8 𝑟𝑠𝑝 = 𝑟𝑑𝑖) ∧ (𝑟𝑑𝑖 = 𝑟9′))

• Mine all equalities

• Find all 𝑤 s.t. 𝐴𝑤 = 0

• Nullspace

• 𝑤1 = −1,1,0

• 𝑤2 = [0,1, −1]4𝑒𝑎𝑥 = 𝑒𝑑𝑥′ + 310𝑒𝑎𝑥 + 𝑒𝑑𝑥 = 𝑒𝑐𝑥′

𝐴 ≡

31

8(rsp) rdi r9’

2 2 2

1 1 1

0 0 0

STOKE Overview03-Nov-15

Page 32: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Check Simulation Relation

• Query SMT solvers• Incorporate counter-examples in relations

• Sound but not complete• If checking succeeds then equivalent

• Can fail to infer a correct simulation relation

• Learn from counter-examples

32STOKE Overview03-Nov-15

Page 33: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Benchmarks

• Synthesis Kernels: 25 loop-free kernels taken from A Hacker’s Delight (Gulwani el al. ‘11)

• SSL: Montgomery multiplication kernel

• Heap Modifying: Linked list traversal

• Linear Algebra: SAXPY from BLAS

33STOKE Overview03-Nov-15

Page 34: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Results (ASPLOS’ 13, OOPSLA’ 13)

0123456789

10

* * * * * * *

Spee

dup

gcc -O3 icc -O3 STOKE No-Opt

• STOKE is comparable or outperforms gcc and icc while maintaining correctness w.r.t. unoptimized code

34STOKE Overview03-Nov-15

Page 35: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Observation

• Speedups are good: 70% over production compiler

• However, x86 experts produce much better code

• Why can’t we get 2X, 3X speedups over gcc –O3?

• Checker rejects many “good” programs• Works perfectly with the application

• Fails on corner cases that cannot arise

35STOKE Overview03-Nov-15

Page 36: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Example

• Target

foo(int32_t* a,

int32_t* b)

*a++=*b++;

*a++=*b++;

*a++=*b++;

*a++=*b++;

return;

• Expert Rewrite

MOVAPD xmm0, [b]

MOVAPD [a], xmm0

RET

• Incorrect, if a and boverlap

• Segfault if a and b are not 16 byte aligned

36STOKE Overview03-Nov-15

Page 37: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Conditions

• Cannot use the fast rewrite in an arbitrary context

• Developers know conditions on the context• Used in adding unchecked annotations, writing assembly

• Unconditionally correct optimizations are effective• But fall short of the fastest code desired

• For performance, we need conditional equivalence

37STOKE Overview03-Nov-15

Page 38: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Conditional Correctness

• Target

foo(int32_t* a,

int32_t* b)

*a++=*b++;

*a++=*b++;

*a++=*b++;

*a++=*b++;

return;

• Rewrite

MOVAPD xmm0, [b]

MOVAPD [a], xmm0

RET

• Conditionally correct• restrict(a), restrict(b)• a, b are 16 byte aligned

38STOKE Overview03-Nov-15

Page 39: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Conditionally correct optimizer

Conditionally Correct Programs

Condition 𝐶

STOKE EquivalenceChecker

Random Programs

PerfTests

FastConditionallyCorrect Program

39STOKE Overview03-Nov-15

Page 40: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Conditionally correct optimizer

Condition 𝐶

Data!

40

Conditionally Correct Programs

STOKECHECKER

Random Programs

Tests

FastConditionallyCorrect Program

STOKE Overview

EquivalenceChecker

PerfTests

03-Nov-15

Page 41: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Condition Inference

• Tests encode the conditions on contexts implicitly

• Learn facts about tests• Keep adding facts until the proof succeeds

• Aliasing: two memory dereferences do not alias

• Alignment: an address is x-byte aligned

• Equalities, inequalities, floating point specific

41STOKE Overview03-Nov-15

Page 42: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Conditions

42

Application(A)

Target (T)

Rewrite(R)

Condition(C)

1. Analyze A to verify C

2. Manually verify C

3. Runtime checkIf (C)

TR

Yes No

STOKE Overview03-Nov-15

Page 43: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Conditional GCC

Annotations can help a compiler significantly

kme

an

s

ray

sp

he

re

ex

3a

ex

3b

ex

3c

tes

t1

tes

t6

tes

t10

tes

t13

tes

t17

ad

d

sm

ul

do

t

de

lta

dia

g4

dia

g6

dia

g8

mo

ss

0

1

2

3

4

5annot. gcc -O3 gcc -O3

Re

lativ

e P

erf

orm

an

ce

43STOKE Overview03-Nov-15

Page 44: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Conditional STOKE

STOKE with condition inference is comparable or outperforms gcc with or without annotations

kme

an

s

ray

sp

he

re

ex

3a

ex

3b

ex

3c

tes

t1

tes

t6

tes

t10

tes

t13

tes

t17

ad

d

sm

ul

do

t

de

lta

dia

g4

dia

g6

dia

g8

mo

ss

0

1

2

3

4

5 annot. gcc -O3 cSTOKE

gcc -O3

Re

lativ

e P

erf

orm

an

ce

44STOKE Overview03-Nov-15

Page 45: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Runtime Checks

Overhead of condition checking can be bearable

kme

an

s

ray

sp

he

re

ex

3a

ex

3b

ex

3c

tes

t1

tes

t6

tes

t10

tes

t13

tes

t17

ad

d

sm

ul

do

t

de

lta

dia

g4

dia

g6

dia

g8

mo

ss

0

1

2

3

4

5 annot. gcc -O3 cSTOKE

CHECK gcc -O3

Re

lativ

e P

erf

orm

an

ce

45STOKE Overview03-Nov-15

Page 46: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Why Does STOKE Work?

• Larger repertoire• gcc only knows a few instructions in the X86 instruction

set. STOKE knows all 3,000+ instructions.

• More effort• Throw lots of computation at the problem.

• Can prove correctness• Allows experimentation with incorrect code

STOKE Overview 4603-Nov-15

Page 47: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Why Does STOKE Work?

• Optimization is a natural application• Standard compiler transformations are small, local

changes

• Division into search and verification• Search is very general and highly parallel

• Only ask SMT solver to do checking, not inference

• All examples verify in under 1 second

STOKE Overview 4703-Nov-15

Page 48: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Related Work

• Massalin [ASPLOS 87]

• Joshi, Nelson and Randall [PLDI 02]

• Bansal and Aiken [ASPLOS 06]

• Tate, Stepp, Tatlock and Lerner [POPL 09]

• Gulwani. Jha, Tiwari, Venkatesan [PLDI 11]

48STOKE Overview03-Nov-15

Page 49: STOKE Alex Aiken - University of Pennsylvania · STOKE Overview 1 STOKE Alex Aiken Joint work with Eric Schkufza, Rahul Sharma, Berkeley Churchill, JF Bastien (Google) 03 -Nov 15

Questions?

stoke.stanford.edu

03-Nov-15 STOKE Overview 49