Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
1
Microprocessor Evolution: Past, Microprocessor Evolution: Past, Present, and FuturePresent, and Future
Guri SohiUniversity of Wisconsin
2
OutlineOutline
l The enabler: semiconductor technologyl Role of the processor architectl Micro-architectures of the past 20 years§ From pipelining to speculation
l Micro-architectures of the next 10 years
3
Semiconductor TechnologySemiconductor Technology
l Many more available transistorsl Imbalances due to disparate rates of
performance improvement§ E.g., logic and memory speeds
How does this impact the architecture of microprocessors?
4
Number of TransistorsNumber of Transistors
1,000
10,000
100,000
1,000,000
10,000,000
100,000,000
1,000,000,000
10,000,000,000
100,000,000,000
1971 1974 1982 1989 1997 2000 2004 2008 2012 2016
Tra
nsi
sto
rs
4004 8008
8080
80286
80386 80486
Pentium Pentium II
Pentium III Pentium 4
8086
5
Relative Memory SpeedRelative Memory Speed
1.4
2.53.8
6.310.7
2948
75120
1
10
100
1000
1974 1978 1982 1985 1989 1993 1997 1999 2000
Pro
cess
or,
Mem
ory
Div
ide
(Cyc
le T
ime)
6
Intel MicroprocessorsIntel Microprocessors
386 (275 K)486 (1180 K)
Pentium (3100 K)Pentium II (7500 K)
Pentium III (24000 K)Pentium 4 (42000 K)
What is being done with all
the transistors?
2
7
Role of Computer ArchitectRole of Computer Architect
l Get desired level of performance§ Single-thread “desktop” applications
l Determine functionality neededl Determine how functionality should be
implemented
8
Role of Computer ArchitectRole of Computer Architect……
l Defining functionality§ Functionality to deal with increasing latencies
(e.g., caches, wires)§ Functionality to increase parallelism and its
exploitation
l Implementing functionality§ Balancing various technology parameters§ Ease of design / verification / testing
9
The Performance EquationThe Performance Equation
l Not much can be done about first term in hardware§ But, …
l Logic speed increase - decreases 3 rd term§ Watch out for possible increase in 2nd term
l Use micro-architectural innovations to decrease 2nd
and 3rd terms§ Reduce latencies
§ Exploit parallelism
Time = Number of Instructions x Cycles per Instruction x Clock Cycle Time
10
MicroarchitecturalMicroarchitectural FunctionalityFunctionality
l Functionality to cope with increasing memory latencies
l Functionality to exploit parallelism
11
Memory HierarchiesMemory Hierarchies
l Reducing access latency and improving access bandwidth
l Single-level cachesl Multi-level cachesl Non-blocking cachesl Multi-ported and multi-banked cachesl Trace caches
12
The March of ParallelismThe March of Parallelism
Generation 1 (1970s) Generation 2 (1980s)
Generation 3 (1990s)
Generation 4 (2000s)
3
13
Exploiting ParallelismExploiting Parallelism
• Little change in programming model§ still write programs in sequential languages
l Automatic parallelization not widely successful
l Great investment in existing software
Resort to low-level, Instruction Level Parallelism (ILP)
14
Instruction Level Parallelism (ILP)Instruction Level Parallelism (ILP)
l Determine small number (e.g., < 100) instructions to be executed
l Determine dependence relationships and create dependence graph§ Use to determine parallel execution
l Can be done statically (VLIW / EPIC) or dynamically (out-of-order superscalar)
15
Limitations to ILPLimitations to ILP
l Branch instructions inhibit determination of instructions to execute: control dependences
l Imperfect analysis of memory addresses inhibits reordering of memory operations: ambiguous memory dependences
l Program/algorithm data flow inhibits parallelism: true dependences
l Increasing latencies exacerbate impact of dependences
Use speculation to overcome impact of dependences
16
SpeculationSpeculation
Speculation: “.. to assume a business risk in hope of gain’’
Webster
17
Speculation and Computer ArchitectureSpeculation and Computer Architecture
l Speculate outcome of event rather than waiting for outcome to be known§ Program behavior provides rationale for high
success ratel Functionality to support speculationl Functionality to speculate betterl Functionality to minimize mis-speculation
penalty
18
Control SpeculationControl Speculation
l Predict outcome of branch instructionsl Speculatively fetch and execute instructions
from predicted path§ Increase available parallelism
l Recover if prediction is incorrect
4
19
Control Speculation ExampleControl Speculation Example
W r o n g
Speculative
Speculative
SQUASHED
C o n t i n u eF e t c h
CFG Front- end(Fetch)
Speculative
Speculative
Speculative
Speculative
P r e d i c t
P r e d i c t C o r r e c t
Back- end(OoO Execute, Commit)
20
Model for Speculative ExecutionModel for Speculative Execution
Instru
ction
fetch
& bran
ch
predic
tion Dep
ende
nce
checki
ng an
d
dispa
tching
Instruction Issue & Execution
Static program
Dynamic instruction stream
Execution window
Completed instructions
Instn. re
order
&
commit
21
Supporting Control SpeculationSupporting Control Speculation
l Techniques to predict branch outcome: branch predictors§ Initiating speculation§ Improving accuracy of speculation
l Techniques to support speculative execution: reservation stations, register renaming etc.§ Supporting speculative execution
l Techniques to give appearance of sequential execution: reorder buffers, etc.§ Doing it transparently
22
Key observationKey observation
Basic mechanisms to support control speculation can support other forms of
speculation as well
23
PerformancePerformance--Inhibiting ConstraintsInhibiting Constraintsl Control dependences: inhibit creation of instruction
window§ Use control speculation
l Ambiguous data dependences: inhibit parallelism recognition§ Use data dependence speculation
l True data dependences: inhibit parallelism§ Use value speculation
l Common mechanisms may support different forms of speculation
l Different techniques to improve accuracy of speculation
24
Speculation in Use TodaySpeculation in Use Today
l Address calculation and translation (especially if 2-step process)
l Cache hitl Memory ordering violation in multiprocessorsl Load/store dependencesl Many others on the design board
5
25
MultithreadingMultithreading
l Microprocessor can execute multiple operations at a time§ 4 or 6 operations per cycle
l Hard to achieve this level of parallelism from single program
l Can we run multiple programs (threads) on (single) processor without much effort?§ Simultaneous multithreading (SMT) or
Hyperthreading
26
SMT OverviewSMT Overview
Unused Slot Thread 1 Thread 2
Thread 3 Thread 4
Superscalar Fine-grained MT SMT
Exe
cutio
n C
ycle
s
SharedProcessorExecutionResources
AS1 AS2 AS3 AS4
Many “logical” processors (AS – architected state)
One “physical” processor
27
MultithreadingMultithreading
l Today many high-end microprocessors are multithreaded (e.g., Intel Pentium 4)
l Support for 2-4 threads but expect to get only 1.3X improvement in throughput
28
Microprocessors Microprocessors –– the next 10 yearsthe next 10 years
l Factor of 30 increase in semiconductor resources§ How to use it?
l New constraints§ Power consumption§ Wire delays§ Design / verification complexity
l New applications?§ Throughput-oriented workloads
§ Coarse-grain multithreaded applications
29
Technology TrendsTechnology Trends
l Design and verification of large number of transistors becoming unwieldy
l Wires getting relatively slower§ Short wires for fast clock§ Implies increase latencies; exploit locality of
communicationl Power issues becoming very important
30
ArchitectArchitect’’s Role Revisiteds Role Revisited
l Defining functionality§ New models needed to further increase
parallelism exploitationl Implementing functionality§ Becoming a dominating factor?
l Speculation is likely to be the key to overcoming constraints
6
31
Implications of TrendsImplications of Trends
l Implementation considerations will imply computing chips with multiple (replicated?) processing cores§ “multiprocessor” or “multiprocessor-like” or
“multithreaded”§ Will start out as “logical” replication (e.g., SMT)
§ Will move towards “physical” replication (e.g., CMP)
l How to assign work to multiple processing cores?§ Independent programs (or threads)§ Parts of a single program
32
CMP OverviewCMP Overview
l Several processor cores in one die
l Shared L2 caches
l Chip Communication to build multi-chip module with many CMPs + memory
Core 1
L2 Cache
Chip Communication
Core 0
33
ThroughputThroughput--Oriented ProcessingOriented Processing
l Executing multiple, independent programs on underlying parallel micro-architecture§ Similar to traditional throughput-oriented
multiprocessor§ Significant engineering challenges, but little in
ways of architectural / micro-architectural innovation
Can we use underlying “multiprocessor” to speed up execution of single program?
34
Parallel Processing of Single ProgramParallel Processing of Single Program
l Will the promise of explicit / automatic parallelism come true?
l Will new (parallel) programming languages take over the world?
35
Parallel Processing of Single ProgramParallel Processing of Single Program
l Multiple processors in chip will encourage writing of parallel programs
l Reduced (on-chip) latencies may make parallel processing fruitful
36
Speculative ParallelizationSpeculative Parallelization
l Sequential languages aren’t going awayl Use speculation to overcome inhibitors to
“automatic” parallelization§ Ambiguous dependences
l Divide program into “speculatively parallel”portions or “speculative threads”
7
37
Speculative ThreadsSpeculative Threads
l Subject of extensive research today§ Different speculative parallelization models
being investigated
38
Generic circa 2010 MicroprocessorGeneric circa 2010 Microprocessor
l 4 – 8 general-purpose processing engines on chip§ Used to execute independent programs§ Explicitly parallel programs (when possible)§ Speculatively parallel threads§ Helper threads
l Special-purpose processing units (e.g., DSP functionality)
l Elaborate memory hierarchyl Elaborate inter-chip communication facilitiesl Extensive use of different forms of speculation
39
SummarySummary
l Semiconductor technology has, and will continue to, give computer architects new opportunities
l Architects have used speculation techniques to overcome performance barriers; will likely continue to do so
l Future microprocessors are going to have capability to execute multiple threads of code
l New models of speculation (e.g., thread-level speculation) will be needed to extract more parallelism