Number of Transistorssakai/sohi/talk1.pdf · 1 Microprocessor Evolution: Past, Present, and Future Guri Sohi University of Wisconsin 2 Outline lThe enabler: semiconductor technology

1

Microprocessor Evolution: Past, Microprocessor Evolution: Past, Present, and FuturePresent, and Future

Guri SohiUniversity of Wisconsin

2

OutlineOutline

l The enabler: semiconductor technologyl Role of the processor architectl Micro-architectures of the past 20 years§ From pipelining to speculation

l Micro-architectures of the next 10 years

3

Semiconductor TechnologySemiconductor Technology

l Many more available transistorsl Imbalances due to disparate rates of

performance improvement§ E.g., logic and memory speeds

How does this impact the architecture of microprocessors?

4

Number of TransistorsNumber of Transistors

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000

1,000,000,000

10,000,000,000

100,000,000,000

1971 1974 1982 1989 1997 2000 2004 2008 2012 2016

Tra

nsi

sto

rs

4004 8008

8080

80286

80386 80486

Pentium Pentium II

Pentium III Pentium 4

8086

5

Relative Memory SpeedRelative Memory Speed

1.4

2.53.8

6.310.7

2948

75120

1

10

100

1000

1974 1978 1982 1985 1989 1993 1997 1999 2000

Pro

cess

or,

Mem

ory

Div

ide

(Cyc

le T

ime)

6

Intel MicroprocessorsIntel Microprocessors

386 (275 K)486 (1180 K)

Pentium (3100 K)Pentium II (7500 K)

Pentium III (24000 K)Pentium 4 (42000 K)

What is being done with all

the transistors?

2

7

Role of Computer ArchitectRole of Computer Architect

l Get desired level of performance§ Single-thread “desktop” applications

l Determine functionality neededl Determine how functionality should be

implemented

8

Role of Computer ArchitectRole of Computer Architect……

l Defining functionality§ Functionality to deal with increasing latencies

(e.g., caches, wires)§ Functionality to increase parallelism and its

exploitation

l Implementing functionality§ Balancing various technology parameters§ Ease of design / verification / testing

9

The Performance EquationThe Performance Equation

l Not much can be done about first term in hardware§ But, …

l Logic speed increase - decreases 3 rd term§ Watch out for possible increase in 2nd term

l Use micro-architectural innovations to decrease 2nd

and 3rd terms§ Reduce latencies

§ Exploit parallelism

Time = Number of Instructions x Cycles per Instruction x Clock Cycle Time

10

MicroarchitecturalMicroarchitectural FunctionalityFunctionality

l Functionality to cope with increasing memory latencies

l Functionality to exploit parallelism

11

Memory HierarchiesMemory Hierarchies

l Reducing access latency and improving access bandwidth

l Single-level cachesl Multi-level cachesl Non-blocking cachesl Multi-ported and multi-banked cachesl Trace caches

12

The March of ParallelismThe March of Parallelism

Generation 1 (1970s) Generation 2 (1980s)

Generation 3 (1990s)

Generation 4 (2000s)

3

13

Exploiting ParallelismExploiting Parallelism

• Little change in programming model§ still write programs in sequential languages

l Automatic parallelization not widely successful

l Great investment in existing software

Resort to low-level, Instruction Level Parallelism (ILP)

14

Instruction Level Parallelism (ILP)Instruction Level Parallelism (ILP)

l Determine small number (e.g., < 100) instructions to be executed

l Determine dependence relationships and create dependence graph§ Use to determine parallel execution

l Can be done statically (VLIW / EPIC) or dynamically (out-of-order superscalar)

15

Limitations to ILPLimitations to ILP

l Branch instructions inhibit determination of instructions to execute: control dependences

l Imperfect analysis of memory addresses inhibits reordering of memory operations: ambiguous memory dependences

l Program/algorithm data flow inhibits parallelism: true dependences

l Increasing latencies exacerbate impact of dependences

Use speculation to overcome impact of dependences

16

SpeculationSpeculation

Speculation: “.. to assume a business risk in hope of gain’’

Webster

17

Speculation and Computer ArchitectureSpeculation and Computer Architecture

l Speculate outcome of event rather than waiting for outcome to be known§ Program behavior provides rationale for high

success ratel Functionality to support speculationl Functionality to speculate betterl Functionality to minimize mis-speculation

penalty

18

Control SpeculationControl Speculation

l Predict outcome of branch instructionsl Speculatively fetch and execute instructions

from predicted path§ Increase available parallelism

l Recover if prediction is incorrect

4

19

Control Speculation ExampleControl Speculation Example

W r o n g

Speculative

Speculative

SQUASHED

C o n t i n u eF e t c h

CFG Front- end(Fetch)

Speculative

Speculative

Speculative

Speculative

P r e d i c t

P r e d i c t C o r r e c t

Back- end(OoO Execute, Commit)

20

Model for Speculative ExecutionModel for Speculative Execution

Instru

ction

fetch

& bran

ch

predic

tion Dep

ende

nce

checki

ng an

d

dispa

tching

Instruction Issue & Execution

Static program

Dynamic instruction stream

Execution window

Completed instructions

Instn. re

order

&

commit

21

Supporting Control SpeculationSupporting Control Speculation

l Techniques to predict branch outcome: branch predictors§ Initiating speculation§ Improving accuracy of speculation

l Techniques to support speculative execution: reservation stations, register renaming etc.§ Supporting speculative execution

l Techniques to give appearance of sequential execution: reorder buffers, etc.§ Doing it transparently

22

Key observationKey observation

Basic mechanisms to support control speculation can support other forms of

speculation as well

23

PerformancePerformance--Inhibiting ConstraintsInhibiting Constraintsl Control dependences: inhibit creation of instruction

window§ Use control speculation

l Ambiguous data dependences: inhibit parallelism recognition§ Use data dependence speculation

l True data dependences: inhibit parallelism§ Use value speculation

l Common mechanisms may support different forms of speculation

l Different techniques to improve accuracy of speculation

24

Speculation in Use TodaySpeculation in Use Today

l Address calculation and translation (especially if 2-step process)

l Cache hitl Memory ordering violation in multiprocessorsl Load/store dependencesl Many others on the design board

5

25

MultithreadingMultithreading

l Microprocessor can execute multiple operations at a time§ 4 or 6 operations per cycle

l Hard to achieve this level of parallelism from single program

l Can we run multiple programs (threads) on (single) processor without much effort?§ Simultaneous multithreading (SMT) or

Hyperthreading

26

SMT OverviewSMT Overview

Unused Slot Thread 1 Thread 2

Thread 3 Thread 4

Superscalar Fine-grained MT SMT

Exe

cutio

n C

ycle

s

SharedProcessorExecutionResources

AS1 AS2 AS3 AS4

Many “logical” processors (AS – architected state)

One “physical” processor

27

MultithreadingMultithreading

l Today many high-end microprocessors are multithreaded (e.g., Intel Pentium 4)

l Support for 2-4 threads but expect to get only 1.3X improvement in throughput

28

Microprocessors Microprocessors –– the next 10 yearsthe next 10 years

l Factor of 30 increase in semiconductor resources§ How to use it?

l New constraints§ Power consumption§ Wire delays§ Design / verification complexity

l New applications?§ Throughput-oriented workloads

§ Coarse-grain multithreaded applications

29

Technology TrendsTechnology Trends

l Design and verification of large number of transistors becoming unwieldy

l Wires getting relatively slower§ Short wires for fast clock§ Implies increase latencies; exploit locality of

communicationl Power issues becoming very important

30

ArchitectArchitect’’s Role Revisiteds Role Revisited

l Defining functionality§ New models needed to further increase

parallelism exploitationl Implementing functionality§ Becoming a dominating factor?

l Speculation is likely to be the key to overcoming constraints

6

31

Implications of TrendsImplications of Trends

l Implementation considerations will imply computing chips with multiple (replicated?) processing cores§ “multiprocessor” or “multiprocessor-like” or

“multithreaded”§ Will start out as “logical” replication (e.g., SMT)

§ Will move towards “physical” replication (e.g., CMP)

l How to assign work to multiple processing cores?§ Independent programs (or threads)§ Parts of a single program

32

CMP OverviewCMP Overview

l Several processor cores in one die

l Shared L2 caches

l Chip Communication to build multi-chip module with many CMPs + memory

Core 1

L2 Cache

Chip Communication

Core 0

33

ThroughputThroughput--Oriented ProcessingOriented Processing

l Executing multiple, independent programs on underlying parallel micro-architecture§ Similar to traditional throughput-oriented

multiprocessor§ Significant engineering challenges, but little in

ways of architectural / micro-architectural innovation

Can we use underlying “multiprocessor” to speed up execution of single program?

34

Parallel Processing of Single ProgramParallel Processing of Single Program

l Will the promise of explicit / automatic parallelism come true?

l Will new (parallel) programming languages take over the world?

35

Parallel Processing of Single ProgramParallel Processing of Single Program

l Multiple processors in chip will encourage writing of parallel programs

l Reduced (on-chip) latencies may make parallel processing fruitful

36

Speculative ParallelizationSpeculative Parallelization

l Sequential languages aren’t going awayl Use speculation to overcome inhibitors to

“automatic” parallelization§ Ambiguous dependences

l Divide program into “speculatively parallel”portions or “speculative threads”

7

37

Speculative ThreadsSpeculative Threads

l Subject of extensive research today§ Different speculative parallelization models

being investigated

38

Generic circa 2010 MicroprocessorGeneric circa 2010 Microprocessor

l 4 – 8 general-purpose processing engines on chip§ Used to execute independent programs§ Explicitly parallel programs (when possible)§ Speculatively parallel threads§ Helper threads

l Special-purpose processing units (e.g., DSP functionality)

l Elaborate memory hierarchyl Elaborate inter-chip communication facilitiesl Extensive use of different forms of speculation

39

SummarySummary

l Semiconductor technology has, and will continue to, give computer architects new opportunities

l Architects have used speculation techniques to overcome performance barriers; will likely continue to do so

l Future microprocessors are going to have capability to execute multiple threads of code

l New models of speculation (e.g., thread-level speculation) will be needed to extract more parallelism

Documents

Number of Transistorssakai/sohi/talk1.pdf · 1 Microprocessor Evolution: Past, Present, and Future Guri Sohi University of Wisconsin 2 Outline lThe enabler: semiconductor technology