23
Cherry Picking: Exploiting Process Variations in the Dark Silicon Era Siddharth Garg University of Waterloo Co-authors: Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

Cherry Picking: Exploiting Process Variations in the Dark Silicon Era

Siddharth GargUniversity of Waterloo

Co-authors:Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

Page 2: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

Dark Silicon Challenge

1

0

0.2

0.4

0.6

0.8

1

1.2

0

5

10

15

20

25

30

35

45 32 22 16 11 8

# Tr

ansi

sto

rs

Po

we

r/D

ark

Silic

on

Technology Node

Power

Dark Silicon

# Transistors80% dark silicon at the 8 nm

technology node[Esmaeilzadeh et al., ISCA’11]

Page 3: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

Heterogeneous Cores

• How to best utilize dark silicon for performance enhancement?

– Heterogeneity

Dark Silicon Architectures

2

Accelerators

Homogeneous Cores?

Page 4: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Inability to precisely manufacture transistors

– Chip-to-chip variations

– Within-chip variations

Process Variations

3

[Source: Friedberg et al., ISQED’05]

Increasing proportion of within-chip variations

Page 5: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

Process Variation Impact

4

1.7X deviation in leakage power

Intel 80-core Teraflop

[Dighe et al., JSSC’11]

30% deviation in frequencyKey idea: exploit heterogeneity that arises from

the impact of process variations

Page 6: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Best 1 of N statistics

– Provision chip with N identical cores and cherry-pick core with highest frequency

Motivation: Best 1 of N

5

Core Frequency

Co

un

t

Best 1 of 2Mean = 1.1Best 1 of 4Mean = 1.2

1-s increase in frequency with core doubling

Page 7: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• 30% reduction in average leakage power

• 2X reduction in worst-case leakage power

Best 1 of N for Leakage

6

Leakage Power Dissipation

Count

Potential yield loss due to thermal runaway

Best 1-of-1

Best 1-of-4

Best 1-of-2

Page 8: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• BubbleWrap [Karpuzcu et al.,MICRO’09]

– Use redundant cores to increase lifetime

– Cores run in Turbo mode till they “pop”

• Dark silicon architectures

– Heterogeneous cores [Esmaeilzadeh et al.,ISCA’11]

– Accelerators [Venkatesh et al.,MICRO’11]

• Statistical Element Selection

– Increasing immunity of analog circuits to process variations [Keskin et al.,CICC’10]

• Process variation aware scheduling

– ILP based solution for multi-programmed apps [Teodorescu et al.,ISCA’08]

Related Work

7

Page 9: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Generate die map of process variations

Variability Modeling

8

Distance

Co

rre

lati

on

Co

eff

icie

nt

(r)

Single Gaussian random variable to model impact of process variations at each location

Spatial correlations modeled using an exponentially decaying

function of distance

Slow

Fast

[Zhiong et al., TCAD’07]

Page 10: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Each core has Ncp identical critical paths– Core frequency limited by slowest

critical path– Critical path delay inversely

proportional to process parameter

Frequency and Leakage

9

Critical Paths (CP)

• Leakage is summed over all Ncore grid points

– Exponential dependence on process parameters

Page 11: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Wide range of power and frequency values• One “technology beating” core

– Likelihood increases with more % dark silicon

Cherry Picking for Single Threads

10

Core Power Dissipation

Co

re F

req

ue

ncy

Pareto Optimal Cores

Technology BeatingCore

Only 4 Pareto optimal cores in the original design without spare cores

Page 12: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Maximize performance within a P Watt budget

– Performance measured as the sum of frequencies of cores that are selected

Cherry Picking: Multi-program Workloads

11

P Watt Bin

Instance of the knapsack problem Pseudo-polynomial time solution

Page 13: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

Cherry Picking: Multi-threaded Wkloads

12

• Common execution template for a number of parallel benchmarks

– Sequential phase followed by barrier based synchronization of parallel threads

• Optimal mapping of threads to cores such that:

– Performance is maximized within a power budget

Page 14: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Goal: analytical + accurate performance model that is amenable to optimization

• Execution time limited by sequential thread and slowest parallel thread

– Surprisingly accurate, although grossly simplified

Performance Model

13

Execution time

Amount of sequential work

Frequency of sequential core

Amount of parallel work

Number of parallel threads Slowest parallel

core frequency

Page 15: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• When core 1 frequency is lower than frequency of other cores, lower execution time with increasing frequency

• When core 1 frequency is higher than frequency of other cores, fixed execution time with increasing frequency

Validation

14

Page 16: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Assume that:

– Seq. thread executes on core i

– Slowest parallel thread executes on core j

– Q is a set of M-1 other cores:

• Execution time:

Optimal Mapping

15

Seq.

Par. 1 Par. 2 Par. M

Core i

Core j

Page 17: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• For some <i,j> combinations, there might not exist M-1 faster cores that meet the power budget

– Frequency scaling can be used to meet power constraints at expense of performance

– Frequency of all parallel cores scaled to the same frequency fpar such that:

– Sufficient to only look at M-1 lowest leakage cores

Frequency Scaling

16

Page 18: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• All experimental results based on the Sniper x86 multi-core simulator

– Interval core model, cycle-accurate cache, network and memory models

• Parsec and SPLASH benchmarks with M=16

– Blackscholes

– FFT

– Radix

– Fluidanimate

– Swaptions

Experimental Set-up

17

Page 19: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• 4.7% average error and 7.2% RMS error

Performance Model Validation

18

Simulated Execution Time

Pre

dic

ted

Exe

cuti

on

Tim

e

Under-prediction because increasednetwork latencies are not accounted for

50% Dark Silicon(red)

33% Dark Silicon(green)

Page 20: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Averaged over 10 Monte Carlo experiments for each benchmark and each architecture

Performance Improvements

19

30% 22%

Page 21: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

Insight

20

Page 22: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• Cherry picking proposes to pick the best subset of cores in a homogeneous dark silicon chip

– Power budget is met

– Performance is maximized

– Exploits process variations to create heterogeneity

• Next generation dark silicon architectures might consist of a mix of architectural and process variation driven heterogeneity

– Replica accelerators

Discussion

21

Page 23: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu

• HaDeS: Architectural Synthesis for Heterogeneous Dark Silicon Chip Multi-Processors, DAC’13

– More sophisticated analytical performance models

– Varying degrees of parallelism

– Architectural heterogeneity

Upcoming

22