The University of Adelaide, School of Computer Science 22 November 2018
Chapter 2 — Instructions: Language of the Computer 1
Copyright © 2019, Elsevier Inc. All rights reserved.
Chapter 1
Fundamentals of Quantitative Design and Analysis
Computer Architecture: A Quantitative Approach, Sixth Edition
Computer Technology
- Performance improvements:
  - Improvements in semiconductor technology
    - Feature size, clock speed
  - Improvements in computer architectures
    - Enabled by HLL compilers, UNIX
    - Lead to RISC architectures
  - Together have enabled:
    - Lightweight computers
    - Productivity-based managed/interpreted programming languages
Single Processor Performance
Current Trends in Architecture
- Cannot continue to leverage instruction-level parallelism (ILP)
  - Single processor performance improvement ended in 2003
- New models for performance:
  - Data-level parallelism (DLP)
  - Thread-level parallelism (TLP)
  - Request-level parallelism (RLP)
- These require explicit restructuring of the application
Classes of Computers
- Personal Mobile Device (PMD)
  - e.g. smart phones, tablet computers
  - Emphasis on energy efficiency and real-time performance
- Desktop Computing
  - Emphasis on price-performance
- Servers
  - Emphasis on availability, scalability, throughput
- Clusters / Warehouse Scale Computers
  - Used for “Software as a Service (SaaS)”
  - Emphasis on availability and price-performance
  - Sub-class: supercomputers, with emphasis on floating-point performance and fast internal networks
- Internet of Things / Embedded Computers
  - Emphasis: price
Parallelism
- Classes of parallelism in applications:
  - Data-Level Parallelism (DLP)
  - Task-Level Parallelism (TLP)
- Classes of architectural parallelism:
  - Instruction-Level Parallelism (ILP)
  - Vector architectures / Graphics Processor Units (GPUs)
  - Thread-Level Parallelism
  - Request-Level Parallelism
Flynn’s Taxonomy
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
  - Vector architectures
  - Multimedia extensions
  - Graphics processor units
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
  - Tightly-coupled MIMD
  - Loosely-coupled MIMD
Defining Computer Architecture
- “Old” view of computer architecture:
  - Instruction Set Architecture (ISA) design
  - i.e. decisions regarding:
    - registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
- “Real” computer architecture:
  - Specific requirements of the target machine
  - Design to maximize performance within constraints: cost, power, and availability
  - Includes ISA, microarchitecture, hardware
Instruction Set Architecture
- Class of ISA
  - General-purpose registers
  - Register-memory vs load-store
- RISC-V registers
  - 32 g.p., 32 f.p.

Register   Name      Use                Saver
x0         zero      constant 0         n/a
x1         ra        return addr        caller
x2         sp        stack ptr          callee
x3         gp        gbl ptr
x4         tp        thread ptr
x5-x7      t0-t2     temporaries        caller
x8         s0/fp     saved/frame ptr    callee
x9         s1        saved              callee
x10-x17    a0-a7     arguments          caller
x18-x27    s2-s11    saved              callee
x28-x31    t3-t6     temporaries        caller
f0-f7      ft0-ft7   FP temps           caller
f8-f9      fs0-fs1   FP saved           callee
f10-f17    fa0-fa7   FP arguments       caller
f18-f27    fs2-fs11  FP saved           callee
f28-f31    ft8-ft11  FP temps           caller
Instruction Set Architecture
- Memory addressing
  - RISC-V: byte addressed, aligned accesses faster
- Addressing modes
  - RISC-V: register, immediate, displacement (base+offset)
  - Other examples: autoincrement, indexed, PC-relative
- Types and size of operands
  - RISC-V: 8-bit, 32-bit, 64-bit
Instruction Set Architecture
- Operations
  - RISC-V: data transfer, arithmetic, logical, control, floating point
  - See Fig. 1.5 in text
- Control flow instructions
  - Use content of registers (RISC-V) vs. status bits (x86, ARMv7, ARMv8)
  - Return address in register (RISC-V, ARMv7, ARMv8) vs. on stack (x86)
- Encoding
  - Fixed (RISC-V, ARMv7/v8 except compact instruction set) vs. variable length (x86)
Trends in Technology
- Integrated circuit technology (Moore’s Law)
  - Transistor density: 35%/year
  - Die size: 10-20%/year
  - Integration overall: 40-55%/year
- DRAM capacity: 25-40%/year (slowing)
  - 8 Gb (2014), 16 Gb (2019), possibly no 32 Gb
- Flash capacity: 50-60%/year
  - 8-10X cheaper/bit than DRAM
- Magnetic disk capacity: recently slowed to 5%/year
  - Density increases may no longer be possible; capacity may instead grow by going from 7 to 9 platters
  - 8-10X cheaper/bit than Flash
  - 200-300X cheaper/bit than DRAM
Bandwidth and Latency
- Bandwidth or throughput
  - Total work done in a given time
  - 32,000-40,000X improvement for processors
  - 300-1200X improvement for memory and disks
- Latency or response time
  - Time between start and completion of an event
  - 50-90X improvement for processors
  - 6-8X improvement for memory and disks
Bandwidth and Latency
Log-log plot of bandwidth and latency milestones
Transistors and Wires
- Feature size
  - Minimum size of transistor or wire in x or y dimension
  - 10 microns in 1971 to 0.011 microns in 2017
  - Transistor performance scales linearly
    - Wire delay does not improve with feature size!
  - Integration density scales quadratically
Power and Energy
- Problem: get power in, get power out
- Thermal Design Power (TDP)
  - Characterizes sustained power consumption
  - Used as target for power supply and cooling system
  - Lower than peak power (which is about 1.5X higher), higher than average power consumption
- Clock rate can be reduced dynamically to limit power consumption
- Energy per task is often a better measurement
Dynamic Energy and Power
- Dynamic energy
  - Transistor switch from 0 -> 1 or 1 -> 0
  - ½ x Capacitive load x Voltage^2
- Dynamic power
  - ½ x Capacitive load x Voltage^2 x Frequency switched
- Reducing clock rate reduces power, not energy
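A minimal sketch of the two formulas above, in Python. The function names and the numeric values for capacitive load, voltage, and switching frequency are assumptions made up for illustration, not figures from the slides. It shows why halving the clock halves power while the energy of each transition stays the same.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy of one 0->1 or 1->0 transition: 1/2 * C * V^2 (joules)."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency_switched):
    """Power drawn by switching: 1/2 * C * V^2 * f (watts)."""
    return dynamic_energy(capacitive_load, voltage) * frequency_switched


# Hypothetical values, chosen only to make the arithmetic easy to follow.
C = 1e-9   # capacitive load in farads
V = 1.0    # supply voltage in volts
F = 2e9    # transitions per second

base_power = dynamic_power(C, V, F)
half_clock_power = dynamic_power(C, V, F / 2)

# Halving the clock rate halves dynamic power...
print(half_clock_power / base_power)
# ...but the energy per transition has no frequency term at all.
print(dynamic_energy(C, V))
# Lowering the voltage by 15% scales energy by 0.85^2.
print(dynamic_energy(C, 0.85 * V) / dynamic_energy(C, V))
```

Because voltage enters squared, voltage reduction is the lever that lowers energy per task; a slower clock only spreads the same energy over more time.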
Power
- Intel 80386 consumed ~2 W
- 3.3 GHz Intel Core i7 consumes 130 W
- Heat must be dissipated from a 1.5 x 1.5 cm chip
- This is the limit of what can be cooled by air
Reducing Power
- Techniques for reducing power:
  - Do nothing well
  - Dynamic Voltage-Frequency Scaling
  - Low power state for DRAM, disks
  - Overclocking, turning off cores
Static Power
- Static power consumption
  - 25-50% of total power
  - Current_static x Voltage
  - Scales with number of transistors
  - To reduce: power gating
Trends in Cost
- Cost driven down by learning curve
  - Yield
- DRAM: price closely tracks cost
- Microprocessors: price depends on volume
  - 10% less for each doubling of volume
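The volume rule of thumb above can be sketched numerically. This assumes the 10% drop compounds once per doubling of volume, which is one reading of the bullet. The function name, base price, and volumes are hypothetical.

```python
import math


def price_at_volume(base_price, base_volume, volume, drop_per_doubling=0.10):
    """Price after scaling volume, assuming a fixed fractional drop per doubling."""
    doublings = math.log2(volume / base_volume)
    return base_price * (1.0 - drop_per_doubling) ** doublings


# Hypothetical part: 100 currency units at 1,000 units of volume.
for volume in (1_000, 2_000, 4_000, 8_000):
    print(volume, round(price_at_volume(100.0, 1_000, volume), 2))
```

Under this assumption, two doublings give 0.9 x 0.9 = 0.81 of the starting price, not 0.80.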
Integrated Circuit Cost
- Integrated circuit
- Bose-Einstein formula:
  - Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N
  - Defects per unit area = 0.016-0.057 defects per square cm (2010)
  - N = process-complexity factor = 11.5-15.5 (40 nm, 2010)
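A small sketch of how die area drives yield under the Bose-Einstein model, assuming the usual form in which wafer yield is divided by (1 + defect density x die area) raised to the power N. The function name, the two die areas, and the specific defect density and N (picked from inside the ranges quoted above) are illustrative assumptions.

```python
def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Bose-Einstein yield model: wafer_yield / (1 + D * A)^N."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


# Hypothetical process: D and N chosen from within the 2010 ranges on the slide.
DEFECTS_PER_CM2 = 0.03
N = 12.0

small_die = die_yield(DEFECTS_PER_CM2, 1.0, N)    # 1.0 cm^2 die
large_die = die_yield(DEFECTS_PER_CM2, 2.25, N)   # 1.5 cm x 1.5 cm die

print(f"1.00 cm^2 die yield: {small_die:.3f}")
print(f"2.25 cm^2 die yield: {large_die:.3f}")
```

Yield falls faster than linearly with area, which is one reason large dies cost disproportionately more per good die.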
Dependability
- Module reliability
  - Mean time to failure (MTTF)
  - Mean time to repair (MTTR)
  - Mean time between failures (MTBF) = MTTF + MTTR
  - Availability = MTTF / MTBF
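The two relations above translate directly into code. This is a minimal sketch; the function names and the example MTTF and MTTR values are assumptions for illustration.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of time the module is delivering service: MTTF / MTBF."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


# Hypothetical module: fails once per 1,000,000 hours, takes 24 hours to repair.
print(f"{availability(1_000_000, 24):.6f}")
# A module that is up 99 hours and down 1 hour in every 100.
print(availability(99, 1))
```

Availability can be raised either by lengthening MTTF or by shortening MTTR, since both appear in the same ratio.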
Measuring Performance
- Typical performance metrics:
  - Response time
  - Throughput
- Speedup of X relative to Y
  - Execution time_Y / Execution time_X
- Execution time
  - Wall clock time: includes all system overheads
  - CPU time: only computation time
- Benchmarks
  - Kernels (e.g. matrix multiply)
  - Toy programs (e.g. sorting)
  - Synthetic benchmarks (e.g. Dhrystone)
  - Benchmark suites (e.g. SPEC06fp, TPC-C)
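A rough sketch of the speedup ratio and of the difference between wall clock time and CPU time, using Python's standard `time` module. The helper names and the toy workload are assumptions for illustration. `time.sleep` stands in for any overhead during which the process is not computing.

```python
import time


def speedup(execution_time_x, execution_time_y):
    """Speedup of X relative to Y = Execution time_Y / Execution time_X."""
    return execution_time_y / execution_time_x


def measure(work):
    """Return (wall clock seconds, CPU seconds) spent running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def compute_then_wait():
    sum(i * i for i in range(200_000))  # computation: counts toward both clocks
    time.sleep(0.05)                    # waiting: counts toward wall clock only


wall_seconds, cpu_seconds = measure(compute_then_wait)
print(f"wall clock: {wall_seconds:.3f} s, CPU: {cpu_seconds:.3f} s")

# If X runs in 10 s and Y runs in 15 s, X is 1.5 times faster than Y.
print(speedup(10.0, 15.0))
```

The measured times vary from run to run and machine to machine, so only the speedup arithmetic has a fixed result.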
Principles of Computer Design
- Take Advantage of Parallelism
  - e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
- Principle of Locality
  - Reuse of data and instructions
- Focus on the Common Case
  - Amdahl’s Law
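The slide names Amdahl's Law without stating it. As a sketch, the standard form is overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time that benefits from an enhancement and s is the speedup of that fraction. The function name and example numbers are assumptions.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the time is made 10x faster.
print(amdahl_speedup(0.4, 10.0))

# Even with an enormous speedup of that 40%, the untouched 60% caps the gain
# at 1 / 0.6, which is why the common case is the one worth optimizing.
print(amdahl_speedup(0.4, 1e12))
```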
Principles of Computer Design
- The Processor Performance Equation
  - CPU time = Instruction count x Cycles per instruction x Clock cycle time
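A minimal sketch of the processor performance equation in its standard form, in which CPU time is the product of instruction count, cycles per instruction (CPI), and clock cycle time. The function name and the example program are assumptions for illustration.

```python
def cpu_time(instruction_count, cycles_per_instruction, clock_cycle_time):
    """CPU time in seconds = instructions * cycles/instruction * seconds/cycle."""
    return instruction_count * cycles_per_instruction * clock_cycle_time


# Hypothetical program: 1 billion instructions, CPI of 1.5, 1 GHz clock (1 ns cycle).
baseline = cpu_time(1e9, 1.5, 1e-9)
print(baseline)

# Doubling the clock rate halves the cycle time and, with the other two
# factors unchanged, halves CPU time.
faster_clock = cpu_time(1e9, 1.5, 0.5e-9)
print(baseline / faster_clock)
```

In practice the three factors are not independent: a change that lowers one (for example, a richer instruction set that reduces instruction count) can raise another.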
Principles of Computer Design
- Different instruction types having different CPIs
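When instruction types have different CPIs, total cycles are the sum of (instruction count x CPI) over the types, and the overall CPI is that sum divided by the total instruction count. A sketch with a made-up instruction mix follows; the class names, counts, and per-class CPIs are assumptions.

```python
def overall_cpi(instruction_mix):
    """Weighted-average CPI for a mix given as {name: (count, cpi)}."""
    total_instructions = sum(count for count, _ in instruction_mix.values())
    total_cycles = sum(count * cpi for count, cpi in instruction_mix.values())
    return total_cycles / total_instructions


# Hypothetical mix, counts per 100 instructions executed.
mix = {
    "alu": (50, 1.0),
    "load_store": (30, 2.0),
    "branch": (20, 3.0),
}

print(overall_cpi(mix))
```

Here the cycles are 50 + 60 + 60 = 170 over 100 instructions, so rarely executed but expensive classes can still dominate the total.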
Fallacies and Pitfalls
- All exponential laws must come to an end
  - Dennard scaling (constant power density)
    - Stopped by threshold voltage
  - Disk capacity
    - 30-100% per year to 5% per year
  - Moore’s Law
    - Most visible with DRAM capacity
    - ITRS disbanded
    - Only four foundries left producing state-of-the-art logic chips
    - 11 nm, 3 nm might be the limit
Fallacies and Pitfalls
- Multiprocessors are a silver bullet
  - Performance is now a programmer’s burden
- Falling prey to Amdahl’s Law
- A single point of failure
- Hardware enhancements that increase performance also improve energy efficiency, or are at worst energy neutral
- Benchmarks remain valid indefinitely
  - Compiler optimizations target benchmarks
Fallacies and Pitfalls
- The rated mean time to failure of disks is 1,200,000 hours or almost 140 years, so disks practically never fail
  - MTTF values from manufacturers assume regular replacement
- Peak performance tracks observed performance
- Fault detection can lower availability
  - Not all operations are needed for correct execution
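The disk MTTF fallacy is easier to see with numbers. The sketch below converts the rated 1,200,000 hours into years and into expected failures per year for a population of disks. It assumes a constant failure rate and that failed disks are replaced by disks with the same rating; the fleet size of 1,000 and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mttf_in_years(mttf_hours):
    """Rated MTTF expressed in years of continuous operation."""
    return mttf_hours / HOURS_PER_YEAR


def expected_failures_per_year(disk_count, mttf_hours):
    """Expected failures in one year for a fleet running continuously."""
    return disk_count * HOURS_PER_YEAR / mttf_hours


RATED_MTTF_HOURS = 1_200_000

print(f"{mttf_in_years(RATED_MTTF_HOURS):.1f} years")
print(f"{expected_failures_per_year(1_000, RATED_MTTF_HOURS):.1f} failures per year")
```

A fleet of 1,000 such disks would expect about 7 failures a year, or roughly 0.7% of the fleet, which is a more useful way to read the rating than "140 years per disk".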