Architecture Support for Disciplined Approximate Programming

Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger

University of Washington
Microsoft Research

ASPLOS 2012


mobile devices: battery usage

data centers: power & cooling costs

dark silicon: utilization wall


Disciplined approximate programming

Precise (✗):
    references
    jump targets
    JPEG header

Approximate (✓):
    pixel data
    neuron weights
    audio samples
    video frames

The EnerJ programming language

safely interleave approximate and precise operation


[Figure: energy vs. errors]


Perfect correctness is not required

information retrieval

machine learning

sensory data

scientific computing

physical simulation

games

augmented reality

computer vision


Disciplined approximate programming: the EnerJ programming language

    @Approx float[] nums;
    ⋮
    @Approx float total = 0.0f;
    for (@Precise int i = 0; i < nums.length; ++i)
        total += nums[i];
    return total / nums.length;

Highlighted in the code above:

    approximate data storage
    approximate operations
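For readers who want to run the averaging example, the following is a minimal plain-Java sketch. The `@Approx` and `@Precise` annotations are declared locally as inert markers; that is an assumption made for illustration, and they are not the real EnerJ qualifiers. The code therefore compiles with a stock Java compiler but performs no approximation and no precision checking. The names `ApproxMean` and `mean` are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;

// Hypothetical stand-ins for EnerJ's qualifiers: inert type annotations only.
@Target(ElementType.TYPE_USE) @interface Approx {}
@Target(ElementType.TYPE_USE) @interface Precise {}

final class ApproxMean {
    // The element values and the running total are marked approximate.
    // The loop index is marked precise so that every element is visited
    // exactly once and the loop terminates as written.
    static @Approx float mean(@Approx float[] nums) {
        @Approx float total = 0.0f;
        for (@Precise int i = 0; i < nums.length; ++i) {
            total += nums[i];
        }
        return total / nums.length;
    }

    public static void main(String[] args) {
        System.out.println(mean(new float[] {1.0f, 2.0f, 3.0f}));
    }
}
```

The split mirrors the two callouts: the annotated declarations stand for approximate data storage, while `+=` and `/` on approximate operands stand for approximate operations.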


Hardware support for disciplined approximate programming

[Diagram: EnerJ code → compiler → Truffle core]


Compiler-directed approximation

Simplify hardware implementation

Safety checks at compile time

No expensive checks at run time


Hardware support for disciplined approximate programming:

    Approximation-aware ISA
    Dual-voltage microarchitecture
    Energy savings results



Approximation-aware languages need:

    Approximate operations (ALU: + - ÷ ×, & |): per instruction
    Approximate data (registers, caches, main memory): per cache line
    Fine-grained interleaving

    ADD R1 R2 R3
    MOV R3 R4
    JMP 0x01234
    STL R1 0xABCD
    LDF R2 0xBCDE
    ⋮


Traditional, precise semantics

ADD r1 r2 r3:
    writes the sum of r1 and r2 to r3


Approximate semantics

ADD r1 r2 r3:
    writes some value to r3

Informally: r3 gets something that approximates the sum r1 + r2.
Actual error pattern depends on microarchitecture, voltage, process, variation, …


Undefined behavior

ADD r1 r2 r3:

???


Approximate semantics

ADD r1 r2 r3:
    writes some value to r3

Informally: r3 gets something that approximates the sum r1 + r2.

    No other register is modified.
    Does not jump to an arbitrary address.
    No floating point division exception is raised.
    No missiles are launched.
    ⋮
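The difference between approximate semantics and undefined behavior can be sketched with a toy register file. In this hedged illustration, the approximate add may store a wrong value in the destination but touches nothing else. The single-bit-flip error model, the error probability parameter, and the class name `ApproxAddDemo` are all assumptions; the slides state only that the real error pattern depends on microarchitecture, voltage, process, and variation.

```java
import java.util.Random;

// Toy register file contrasting precise and approximate ADD.
// The single-bit-flip error model is an assumption for illustration only.
final class ApproxAddDemo {
    final int[] regs = new int[9]; // r1..r8; index 0 is unused
    private final Random rng;
    private final double errorProb;

    ApproxAddDemo(long seed, double errorProb) {
        this.rng = new Random(seed);
        this.errorProb = errorProb;
    }

    // Precise ADD: regs[dst] receives exactly regs[a] + regs[b].
    void add(int a, int b, int dst) {
        regs[dst] = regs[a] + regs[b];
    }

    // Approximate ADD: regs[dst] receives some value that may differ from
    // the sum. Only regs[dst] is written; no other register changes, no
    // control transfer happens, and no exception is thrown.
    void addApprox(int a, int b, int dst) {
        int sum = regs[a] + regs[b];
        if (rng.nextDouble() < errorProb) {
            sum ^= 1 << rng.nextInt(32); // flip one randomly chosen bit
        }
        regs[dst] = sum;
    }
}
```

A caller can rely on every register other than the destination being unchanged after `addApprox`, which is the kind of guarantee that undefined behavior would not give.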


An ISA extension with approximate semantics

operations (ALU):
    ADD.a  MUL.a  CMPLE.a
    AND.a  XNOR.a  SRL.a
    ADDF.a  DIVF.a  …

storage (registers, caches, main memory):
    LDL.a  STL.a
    LDF.a  STF.a  …


Dual-voltage pipeline

[Diagram: pipeline stages and their structures]
    Fetch:       Branch Predictor, Instruction Cache, ITLB
    Decode:      Decoder
    Reg Read:    Register File
    Execute:     Integer FU, FP FU
    Memory:      Data Cache, DTLB
    Write Back:  Register File

Legend: data movement & processing plane; control plane


Dual-voltage pipeline

    Register File:      switch (dynamic)
    Integer FU, FP FU:  replicate (static)
    Data Cache:         switch (dynamic)


Dual-voltage functional units: shadow structures

[Diagram: Execute Stage with operands in and result out]

One structure is active at a time.


Issue width not changed (scheduler is unaware of shadowing)

Inactive unit is power-gated

No voltage change latency
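The shadowing idea can be sketched in software as two interchangeable copies of one functional unit, with each instruction's precision flag choosing which copy does the work. This is only an analogy under stated assumptions: the class `ShadowedUnit`, its `Active` enum, and the way the approximate copy is supplied as a function are hypothetical, and power gating is not modeled.

```java
import java.util.function.IntBinaryOperator;

// Toy model of a shadowed functional unit: a precise copy and an approximate
// copy sit side by side, and each operation activates exactly one of them.
final class ShadowedUnit {
    enum Active { PRECISE, APPROXIMATE }

    private final IntBinaryOperator preciseUnit;
    private final IntBinaryOperator approxUnit;
    private Active lastActive = Active.PRECISE;

    ShadowedUnit(IntBinaryOperator preciseUnit, IntBinaryOperator approxUnit) {
        this.preciseUnit = preciseUnit;
        this.approxUnit = approxUnit;
    }

    // The caller issues one operation per call either way, so it does not
    // need to know that two copies exist; the flag alone picks the copy.
    int execute(int x, int y, boolean approximate) {
        lastActive = approximate ? Active.APPROXIMATE : Active.PRECISE;
        IntBinaryOperator unit = approximate ? approxUnit : preciseUnit;
        return unit.applyAsInt(x, y);
    }

    // Which copy handled the most recent operation (the other one is idle).
    Active lastActive() {
        return lastActive;
    }
}
```

For example, `new ShadowedUnit((x, y) -> x + y, (x, y) -> (x + y) & ~0xF)` pairs an exact adder with a stand-in approximate adder that drops the low four bits of the sum; that particular error behavior is invented for the example.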


Approximate storage: register modes

[Diagram: register file r1–r8; each register is in precise mode or approximate mode]

Reads from registers in approximate mode may return any value.


    ADD r1 r2 r3      (r3 is written in precise mode)
    ADD.a r1 r2 r3    (r3 is written in approximate mode)

The destination register's mode is set to match the writing instruction.

    ADD r2 r3.a r4

Register operands must be marked with the register's mode. (Otherwise, read garbage.)
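The two register-mode rules can be captured in a small sketch: a write sets the destination's mode, and a read must carry a marker that agrees with that mode. The class `ModalRegisterFile` and its methods are hypothetical, and "garbage" is simulated here with random bits, which is an assumption rather than a description of what the hardware returns.

```java
import java.util.Random;

// Toy model of per-register precision modes. The "garbage" returned on a
// mismatched read is simulated with random bits (an assumption).
final class ModalRegisterFile {
    private final int[] values = new int[9];             // r1..r8; index 0 unused
    private final boolean[] approxMode = new boolean[9];  // false = precise mode
    private final Random rng = new Random(1);

    // A write sets the destination's mode to match the writing instruction,
    // e.g. ADD.a leaves the destination in approximate mode.
    void write(int reg, int value, boolean approxInstr) {
        values[reg] = value;
        approxMode[reg] = approxInstr;
    }

    // The operand marker (the ".a" suffix on a register operand) must agree
    // with the stored mode; otherwise the read yields garbage.
    int read(int reg, boolean markedApprox) {
        if (markedApprox != approxMode[reg]) {
            return rng.nextInt();
        }
        // Real approximate-mode reads may return any value; this sketch
        // simply returns what was stored.
        return values[reg];
    }

    boolean isApprox(int reg) {
        return approxMode[reg];
    }
}
```

Because the marker lives in the instruction, the compiler has to track each register's current mode statically and emit matching operand markers.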


Registers and caches: dual-voltage SRAMs

[Diagram: DV-SRAM subarray with a precision column next to the data columns. Labels: row selection; data (read); data (write) + precision; VDDH; VDDL; (for sense amplifiers and precharge)]


Mixture of precise and approximate data

Instruction stream gives access levels (compiler-specified)


Approximate storage: caches

[Diagram: register file r1–r8 and cache]

    LDL.a 0x…

Data enters cache with precision of the access.
Compiler: consistently treat data as approximate or precise. (Otherwise, read garbage.)
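A minimal sketch of the cache rule: each line remembers the precision of the access that brought it in, and later accesses must use the same precision. The class `PrecisionTaggedCache` is hypothetical, eviction is not modeled, and the 16-byte line uses the line size quoted in the evaluation setup. Returning `false` stands in for the "read garbage" case.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a cache whose lines carry a precision tag.
final class PrecisionTaggedCache {
    static final int LINE_BYTES = 16; // line size quoted in the evaluation setup

    // Maps a line index to true (filled as approximate) or false (precise).
    private final Map<Long, Boolean> lineIsApprox = new HashMap<>();

    // On a miss the line is filled with the precision of this access.
    // Returns true when the access agrees with the line's precision and
    // false when it does not, which models reading garbage.
    boolean access(long address, boolean approxAccess) {
        long line = address / LINE_BYTES;
        Boolean resident = lineIsApprox.putIfAbsent(line, approxAccess);
        return resident == null || resident == approxAccess;
    }
}
```

Since precision is tracked per line, this model suggests that approximate and precise data should not share a cache line; how the real compiler lays out data is not shown on the slides.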


Also in the paper:

    Approximate main memory
    Detailed DV-SRAM design
    Voltage level-shifter and mux circuits
    Replicated pipeline registers
    Broadcast network details

[Circuit diagrams: voltage level shifters and mux, with signals labeled input, output, precision, select, VddH, and VddL]



Energy savings results

Simulated EnerJ programs
    Precision-annotated Java [PLDI’11]
    Scientific kernels, mobile app, game engine, imaging, raytracer

Modified McPAT models for OoO (Alpha 21264) and in-order cores
    [Li, Ahn, Strong, Brockman, Tullsen, Jouppi; MICRO’09]
    65 nm process, 1666 MHz, 1.5 V nominal (VDDH)
    4-wide (OoO) and 2-wide (in-order)
    Includes overhead of additional muxing, shadow FUs, etc.

Extended CACTI for DV-SRAM structures
    [Muralimanohar, Balasubramonian, and Jouppi; MICRO’07]
    64 KB (OoO) and 32 KB (in-order) L1 cache
    Line size: 16 bytes
    Includes precision column overhead


Energy savings on in-order core

7–24% energy saved on average
Raytracer saves 14–43% energy

[Bar chart: energy reduction over non-Truffle (y-axis, -10% to 50%) for fft, imagefill, jmeint, lu, mc, raytracer, smm, sor, zxing, and average; one bar per VDDL = 0.75 V, 0.94 V, 1.13 V, 1.31 V]
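As rough intuition for why lower VDDL helps, the textbook first-order model says dynamic switching energy scales with the square of the supply voltage. The sketch below applies that model to the four VDDL levels against the 1.5 V nominal VDDH. It is not the McPAT/CACTI methodology used for the reported results: it ignores leakage, the precise control plane, and the muxing and shadow-unit overheads, so it describes only the per-operation dynamic energy of approximate structures. The class name `VoltageScaling` is hypothetical.

```java
// First-order estimate only: dynamic energy taken as proportional to V^2.
final class VoltageScaling {
    static final double VDDH = 1.5; // nominal voltage from the evaluation setup

    // Dynamic switching energy at vddl relative to the same work at VDDH.
    static double relativeDynamicEnergy(double vddl) {
        double ratio = vddl / VDDH;
        return ratio * ratio;
    }

    public static void main(String[] args) {
        double[] levels = {0.75, 0.94, 1.13, 1.31};
        for (double vddl : levels) {
            System.out.printf("VDDL = %.2f V -> %.0f%% of VDDH dynamic energy%n",
                    vddl, 100.0 * relativeDynamicEnergy(vddl));
        }
    }
}
```

At VDDL = 0.75 V the ratio is (0.75 / 1.5)^2 = 0.25. Whole-core savings are much smaller than that per-operation figure because only part of the core's energy is spent in approximate structures.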


Energy savings on OoO core

Energy savings up to 17%
Efficiency loss up to 5% in the worst case

[Bar chart: energy reduction over non-Truffle (y-axis, -10% to 50%) for fft, imagefill, jmeint, lu, mc, raytracer, smm, sor, zxing, and average; one bar per VDDL = 0.75 V, 0.94 V, 1.13 V, 1.31 V]


Application accuracy trade-off

[Bar chart: output quality-of-service loss (y-axis, 0% to 100%) for fft, imagefill, jmeint, lu, mc, raytracer, smm, sor, zxing; series labeled 10^-8, 10^-7, 10^-6, 10^-5, 10^-4, 10^-3, 10^-2]

Application-specific output quality metrics
Error resilience varies across applications


Hardware support for disciplined approximate programming

[Diagram: EnerJ code → compiler → Truffle core, with VDDH and VDDL supplies]

    int p = 5;
    @Approx int a = 7;
    for (int x = 0..) {
        a += func(2);
        @Approx int z;
        z = p * 2;
        p += 4;
    }
    a /= 9;
    func2(p);
    a += func(2);
    @Approx int y;
    z = p * 22 + z;
    p += 10;


Hardware support for disciplined approximate programming

Approximation-aware ISA
    Tightly coupled with language-level precision information

Dual-voltage microarchitecture
    Data plane can run at lower voltage
    Low-complexity design relying on compiler support

Significant energy savings
    Up to 43% vs. a baseline in-order core


Future work on disciplined approximate programming

Approximate accelerators

Precision-aware programmer tools

Non-voltage approximation techniques
