EE472 – Spring 2007 P. Chiang, with Slide Help from C. Kozyrakis (Stanford)
Lecture 1 - 1
Department of Electrical Engineering
Oregon State University
http://eecs.oregonstate.edu/~pchiang
ECE472
Computer Architecture
Patrick Chiang
TA: Kang-Min Hu
EE472 – Fall 2007 P. Chiang with slides from C. Kozyrakis (Stanford)
Lecture 1 - 2
Is this class for you?
• This class will not be easy
– My first quarter of teaching computer architecture at Oregon State
– Assumes good mastery of basic assembly language programming
– What is the class makeup?
• ECE 1/2
• CS 1/2
– This is “ECE472”, and emphasizes the hardware side of Comp. Arch.
• There is CS472 in Spring 2008 quarter
• Class Breakdown
– 5 Homeworks: 10%
– 1 Midterm: 20%
– 1 Project: 30%
– 1 Final: 40%
• Average grade: around B/B+, with some flexibility
Today: What’s the big picture?
• Syllabus: Given this Thursday
• Start with the C-code
• Do the assembly language
• FIRST: How to evaluate whether a computer is “fast”, or “good”?
– Execution Time (time to run one or more processes)
– Power
– Cost
– Flexibility (complexity, programmability)
What do Computer Architects Do?
[Figure: the Computer Architect. Labels: Software Requirements, Applications, Technology, Interfaces (ISA, API, Link, I/O Chan), Machine Organization (Regs, IR), Measurement & Analysis]
The science/art of constructing efficient systems for computing tasks
ECE471: Digital VLSI
What is Computer Architecture?
• Understanding every level of the complete system:
– Software
– Compiler
– Computer Architecture
– VLSI digital circuit design
• For SOC, even analog/mixed-signal design
– Devices
• As an engineer, you must understand “depth” and “breadth”
– Everything is related
– Must understand every level of the problem to make the right “choices”
• Cannot just black-box and say: “Not my problem. Someone else will solve it.”
• Choice of where you want to go next depends on understanding changes along the entire vertical structure
– How is the technology changing? Are there fundamental shifts?
– e.g. multi-core, parallel processing
• Execution Time = ?
Write Some C Code for Me
• C code
• What does the compiler do?
– Assembly language
Now that we have assembly code, how do we evaluate performance?
• Execution time =
• Is execution time the only metric for performance?
• What about power?
• What about cost?
• What about usability/programmability?
Notice one thing about your C Code: Application Specific
• Where are you running this code?
– Laptop
– Desktop
– Cellphone
– Google Server Farm
– Digital Signal Processor
• Each application has completely different fundamentals and constraints
Do a DSP Calculation now--
• Write C-code for DSP
– e.g. Polygon Rendering for X-box Halo 3
– MP3 Decode
• Write assembly code for this:
Do a Transaction Processing Code Now--
• Google query--?
Processor-based Digital Systems
• Systems with a programmable, general-purpose processor
– Advantages ??
• Computers are the canonical example
– PCs, laptops, workstations, …
• However, most processors are embedded or in servers
– Game consoles, PDAs, cell phones, …
– Printers, car electronics systems, …
– Web servers, database servers, …
FUTURE: Why are we going here--?
Overall System Architecture
• Multiple interacting layers
– Term “architecture” used with all of them
• This class focuses on
– Hardware architecture
• Memory, interconnect, IO
• Clusters
• Reliability & low power systems
– Hardware-software interaction
• Programming for performance
• OS support
• Cluster programming
• Virtual machines & security
[Figure: overall system layers. Labels: Application, Libraries, Operating System (Drivers, VM SW, Scheduler), Processor, VM HW, System Bus, Main Memory, Controller, IO Bus(es), IO, Graphics HW, Net]
Application: Constraints & Opportunities
• Applications drive machine ‘balance’
– Scientific computations
• Floating-point performance
• Main memory bandwidth
– Transaction/web processing
• ??
– Multimedia processing
• ??
– Embedded control
• ??
Architecture concepts typically exploit application behavior
Applications Change over Time
• Data-sets & memory requirements → larger
– Cache & memory architecture become more critical
• Standalone → networked
– IO integration & system software become more critical
• Single task → multiple tasks
– Parallel architectures become critical
• Limited IO requirements → rich IO requirements
– 60s: tapes & punch cards
– 70s: character oriented displays
– 80s: video displays, audio, hard disks
– 90s: 3D graphics; networking, high-quality audio
– 00s: real-time video, immersion, …
Application Properties to Exploit in Computer Design
• Locality in memory/IO references
– Programs work on subset of instructions/data at any point in time
– Both spatial and temporal locality
• Parallelism
– Data-level (DLP): same operation on every element of a data sequence
– Instruction-level (ILP): independent instructions within sequential program
– Thread-level (TLP): parallel tasks within one program
– Multi-programming: independent programs
– Pipelining
• Predictability
– Control-flow direction, memory references, data values
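To make the locality property concrete, here is a minimal sketch of a toy direct-mapped cache in Python. The cache geometry (64 lines of 8 words) and the name `hit_rate` are illustrative assumptions, not values from the lecture; the point is only that access patterns with spatial or temporal locality hit far more often than a pattern with neither.

```python
def hit_rate(addresses, num_lines=64, words_per_line=8):
    """Fraction of accesses that hit in a toy direct-mapped cache.

    num_lines and words_per_line are hypothetical parameters chosen
    only for illustration.
    """
    tags = [None] * num_lines           # block number stored in each line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // words_per_line  # memory block holding this word
        index = block % num_lines       # direct-mapped: one possible line
        if tags[index] == block:
            hits += 1                   # block is already cached
        else:
            tags[index] = block         # miss: fetch block, evict old one
        total += 1
    return hits / total if total else 0.0


# Spatial locality: consecutive words share a cache line.
sequential = hit_rate(range(4096))
# No locality: every access lands in a block not seen before.
strided = hit_rate(range(0, 4096 * 8, 8))
# Temporal locality: a small working set is reused many times.
reused = hit_rate(list(range(256)) * 10)

print(sequential, strided, reused)
```

In this sketch the sequential walk misses only on the first word of each line, the strided walk never hits, and the reused working set misses only on its first pass.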
Technology Trends & Constraints: Yearly Improvement
• Integrated circuits: logic
– 60% more devices per chip
– 15% faster devices
– Long wires don’t improve
• Integrated circuits: DRAM
– 60% more devices per chip
– 7% reduction in latency
– 14% increase in bandwidth
• Magnetic Disks
– 60% to 100% increase in density
• IO/networking
– Little improvement in latency
– Large improvements in bandwidth through fast/wide signaling
[Figure: chips from 1992, 1995, 1998, and 2001. 64x more devices since 1992, 4x faster devices]
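As a rough check on how the yearly rates compound, the sketch below applies them over the 1992 to 2001 span shown in the figure. The helper name `compound` is an assumption for illustration; the rates themselves are the ones listed on the slide.

```python
def compound(rate_per_year, years):
    """Total improvement factor after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


years = 2001 - 1992                      # span shown in the figure
device_count = compound(0.60, years)     # 60% more devices per chip per year
device_speed = compound(0.15, years)     # 15% faster devices per year
dram_bandwidth = compound(0.14, years)   # 14% more DRAM bandwidth per year

print(f"devices per chip: {device_count:.1f}x")
print(f"device speed:     {device_speed:.1f}x")
print(f"DRAM bandwidth:   {dram_bandwidth:.1f}x")
```

Compounded over nine years, the device-count and device-speed rates land in the same range as the 64x and 4x figures quoted for the period.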
Changes in Technology & Applications lead to Changes in Architecture
• 1970s
– Multi-chip CPUs
– Semiconductor memory very expensive
– Complex instruction sets (good code density)
– Microcoded control
• 1980s
– 5K – 500K transistors
– Single-chip, pipelined CPUs
– On-chip memory possible
– Simple, hard-wired control
– Simple instruction sets
– Small on-chip caches
• 1990s
– 1M – 64M transistors, 64b CPUs
– Complex control to exploit instruction-level parallelism
– Deep pipelines
– Multi-level caches
• 2000s
– 100M – 5B transistors
– Slow wires, power consumption, design complexity, memory latency, IO bottlenecks, …
– Multiprocessors & parallel systems
– Support & programming for parallelism?
– <<your Ph.D. thesis goes here>>
Keeps computer architecture interesting and challenging
Rules of Thumb in Data Engineering
by J. Gray and Prashant Shenoy
Storage
1. Moore’s Law: Things get 4x denser every three years.
2. You need an extra bit of addressing every 18 months.
3. Storage capacities increase 100x per decade.
4. Storage device throughput increases 10x per decade.
5. Disk data cools 10x per decade.
6. Disk page sizes increase 5x per decade.
7. NearlineTape:OnlineDisk:RAM storage cost ratios are approximately 1:3:300.
8. In ten years RAM will cost what disk costs today.
9. A person can administer a million dollars of disk storage
– Disks are replacing tapes as backup devices.
– On random workloads, disk mirroring is preferable to RAID5 parity because it spends disk space (which is plentiful) to save disk accesses (which are precious).
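Rules 1 to 3 are linked by simple arithmetic, which the following sketch works through. It assumes the 4x-per-three-years rate of rule 1 holds steadily; the variable names are illustrative.

```python
import math

growth_factor = 4.0      # rule 1: density grows 4x ...
period_years = 3.0       # ... every three years

# Doubling time: solve growth_factor ** (t / period_years) == 2 for t.
doubling_years = period_years * math.log(2) / math.log(growth_factor)
# Each doubling of capacity needs one more address bit (rule 2).
months_per_address_bit = doubling_years * 12

# Growth over ten years at the same rate (compare with rule 3).
per_decade = growth_factor ** (10.0 / period_years)

print(months_per_address_bit, per_decade)
```

Under that assumption, capacity doubles every 18 months, which is where the extra address bit comes from, and the same rate compounds to roughly 100x per decade.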
Metrics of Efficiency
• Desktop computing ($500 - $3K)
– Metrics: ??
– Prominent processors: Intel Pentium, AMD Athlon, PowerPC G5
• Server computing ($3K - $1M)
– Metrics: ??
– Prominent processors: IBM Power5, Sun UltraSparc, AMD Opteron
• Embedded computing ($10 - $500)
– Metrics: ??
– Prominent processors: ARM, MIPS, Motorola 68K, many others
Diversity in requirements leads to diversity in architectures
Performance Metrics
• Latency or execution time or response time
– Wall-clock time to complete a task
– Important if all we have to run is a single task or a time-critical task
• Bandwidth or throughput or execution rate
– Number of tasks completed per unit of time
• Bandwidth = total amount of work / total execution time
– Metric is independent of exact number of tasks executed
– Important when we have many tasks to run
• What about Power? What about Cost? What about Reliability?
Plane              Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747         610 mph    6.5 hours     470          286,700
BAC/Sud Concorde   1350 mph   3 hours       132          178,200
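The throughput column is passengers times speed (passenger-miles per hour), which a short sketch can reproduce. The dictionary layout and function name are illustrative assumptions; the speeds and passenger counts are the slide's.

```python
planes = {
    # name: (speed in mph, passengers), values taken from the slide
    "Boeing 747": (610, 470),
    "Concorde": (1350, 132),
}


def throughput_pmph(speed_mph, passengers):
    """Passenger-miles per hour: a bandwidth-style metric."""
    return speed_mph * passengers


results = {name: throughput_pmph(*spec) for name, spec in planes.items()}
fastest = max(planes, key=lambda name: planes[name][0])   # best latency
highest_throughput = max(results, key=results.get)        # best bandwidth

print(results, fastest, highest_throughput)
```

The two metrics pick different winners: the Concorde has the lower latency, while the 747 moves more passenger-miles per hour.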
Examples
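• Latency metric: program execution time in seconds

$$\text{CPUtime} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Cycles}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Cycle}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} = \text{IC} \times \text{CPI} \times \text{CCT}$$

– Your system architecture can affect all of them
• CPI: memory latency, IO latency, …
• CCT: cache organization, …
• IC: OS overhead, …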
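A minimal worked instance of CPUtime = IC × CPI × CCT follows. The instruction count, CPI values, and clock rate are hypothetical numbers picked only to show how one factor can change while the others stay fixed.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPUtime = IC x CPI x CCT, in seconds."""
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program and machine, chosen only for illustration.
ic = 2_000_000_000     # instructions executed (IC)
cpi = 1.5              # average cycles per instruction (CPI)
cct = 0.5e-9           # clock cycle time in seconds (a 2 GHz clock)

baseline = cpu_time(ic, cpi, cct)
# Lower memory latency might reduce CPI without touching IC or CCT.
improved = cpu_time(ic, 1.2, cct)

print(f"baseline: {baseline:.2f} s, improved: {improved:.2f} s")
```

Because the three factors multiply, a 20% drop in CPI gives a 20% drop in execution time as long as IC and CCT are unchanged.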
A is Faster than B?
• Given the CPUtime for machines A and B, “A is X times faster than B” means:

$$X = \frac{\text{CPUTime}_B}{\text{CPUTime}_A}$$

• Example: CPUtime_A = 3.4 sec and CPUtime_B = 5.3 sec, then
– A is 5.3/3.4 ≈ 1.56 times faster than B, or 56% faster
• If you start with bandwidth metrics of performance, use the inverse ratio

$$X = \frac{\text{BandWidth}_A}{\text{BandWidth}_B}$$
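The two ratios can be checked against each other with a short sketch. The function names are assumptions for illustration, and the bandwidth case simply treats throughput for a single task as the reciprocal of its execution time.

```python
def times_faster_from_latency(time_a, time_b):
    """How many times faster A is than B, given execution times."""
    return time_b / time_a


def times_faster_from_bandwidth(bw_a, bw_b):
    """Same comparison starting from throughput: the ratio is inverted."""
    return bw_a / bw_b


# Execution times from the slide's example.
x = times_faster_from_latency(3.4, 5.3)
# Throughput for one task is the reciprocal of its execution time.
x_bw = times_faster_from_bandwidth(1 / 3.4, 1 / 5.3)

print(f"{x:.3f} {x_bw:.3f}")
```

Both routes give the same X, which is why the ratio flips when the metric changes from time to rate.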
Speedup and Amdahl’s Law
• Speedup = CPUtime_old / CPUtime_new
• Given an optimization x that accelerates fraction f_x of program by a factor of S_x, how much is the overall speedup?

$$\text{Speedup} = \frac{\text{CPUTime}_{old}}{\text{CPUTime}_{new}} = \frac{\text{CPUTime}_{old}}{\text{CPUTime}_{old}\left[(1 - f_x) + \frac{f_x}{S_x}\right]} = \frac{1}{(1 - f_x) + \frac{f_x}{S_x}}$$

• Lessons from Amdahl’s law
– Make common cases fast: as f_x → 1, speedup → S_x
– But don’t overoptimize common case: as S_x → ∞, speedup → 1 / (1 - f_x)
• Speedup is limited by the fraction of the code that can be accelerated
• Uncommon case will eventually become the common one
Amdahl’s Law Example
• If Sx=100, what is the overall speedup as a function of fx?
[Figure: Speedup vs Optimized Fraction. x-axis: Fraction of Code Optimized (0 to 1); y-axis: Speedup (0 to 100)]
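The curve can be tabulated with a few lines of Python. This is a sketch of Amdahl's law in its usual form, 1 / ((1 - f_x) + f_x / S_x); the function name and the sample fractions are illustrative choices.

```python
def amdahl_speedup(f_x, s_x):
    """Overall speedup when fraction f_x of the time is sped up by s_x."""
    return 1.0 / ((1.0 - f_x) + f_x / s_x)


S_X = 100.0  # acceleration factor from the slide's question
for f_x in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"f_x = {f_x:4.2f}  speedup = {amdahl_speedup(f_x, S_X):6.2f}")
```

The overall speedup stays below 10 even when 90% of the program is accelerated by 100x, and only approaches 100 as f_x gets very close to 1.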
Historical Trend for Computer Performance
[Figure: integer performance (log scale, 1 to 1000) by year, 1985 to 2001, for Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, and Itanium; Alpha 21064, 21164, and 21264; Sparc, SuperSparc, and Sparc64; MIPS; HP PA; PowerPC; AMD K6 and K7]
55% faster per year
To Put it Into Perspective
• 1982-2000: computers getting 55% faster per year
– Total of 4,000x
– Significant cost improvements as well
• What if other areas showed similar improvement rates?
– Cars: 176,000 mph or 64,000 miles/gal
– Airplanes: LA to NY in 5.5sec (MACH 3200)
– Wheat: 320,000 bushels per acre
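A quick sketch of the compounding behind these numbers, assuming a steady 55% yearly improvement; the variable names are illustrative.

```python
import math

rate = 0.55                            # 55% faster per year, from the slide
factor_18_years = (1 + rate) ** 18     # 1982 to 2000
# Years of compounding needed to reach a 4,000x total.
years_for_4000x = math.log(4000) / math.log(1 + rate)

print(f"{factor_18_years:.0f}x after 18 years")
print(f"{years_for_4000x:.1f} years to reach 4,000x")
```

At that rate the improvement is in the thousands after 18 years and reaches 4,000x after about 19, so the quoted total is of the right order for the period.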
Digital System Cost
• Cost is a very important design constraint
– Most digital systems are consumer electronic products
• Cost distribution for $1K PC
– Processor board: 37%
• Processor, memory, …
– IO devices: 37%
• Hard disk, DVD, monitor, keyboard, …
– Software: 20%
– Cabinet: 6%
• Integrated circuits represent significant part of the system cost
– Processor, memory, hard disk controller, graphics chips, networking chip
Cost of Integrated Circuits
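$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$

$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Die yield} = \text{Wafer\_yield} \times \left(1 + \frac{\text{Defect\_Density} \times \text{Die\_area}}{\alpha}\right)^{-\alpha}$$

where α is a parameter that models the complexity of the manufacturing process.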
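The cost chain can be sketched in Python as below. The yield model with the exponent parameter `alpha` is the commonly used textbook form and is an assumption here, as are the function names and every numeric value; none of the numbers come from the lecture.

```python
def die_yield(wafer_yield, defect_density, die_area, alpha):
    """Fraction of good dies; alpha models process complexity (assumed form)."""
    return wafer_yield * (1.0 + defect_density * die_area / alpha) ** (-alpha)


def die_cost(wafer_cost, dies_per_wafer, yield_fraction):
    """Wafer cost spread over the dies that actually work."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


def ic_cost(die, testing, packaging, final_test_yield):
    """Die, test, and package costs, inflated by parts lost at final test."""
    return (die + testing + packaging) / final_test_yield


# All numbers below are hypothetical, for illustration only.
y = die_yield(wafer_yield=1.0, defect_density=0.5, die_area=1.0, alpha=4.0)
d = die_cost(wafer_cost=5000.0, dies_per_wafer=400, yield_fraction=y)
total = ic_cost(d, testing=2.0, packaging=3.0, final_test_yield=0.95)

print(f"die yield {y:.3f}, die cost ${d:.2f}, IC cost ${total:.2f}")
```

Doubling `die_area` in this sketch lowers the yield, and a real wafer would also hold fewer of the larger dies, so cost per die rises faster than area.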
Chip Cost is a Function of Size
[Figure: unpackaged chip cost ($0 to $250) vs. chip size (0 to 20 mm)]
Chip cost increases roughly with (die area)^4
Cost – Performance Tradeoff
• The trade-off
– Chip cost is primarily a function of (die area)^4
– But bigger dies provide more resources for higher performance
• The goal of a good architect
– Find the knee of the performance-cost curve OR
– Get maximum performance for a fixed cost target
[Figure: performance vs. cost curve]
Other Cost Contributors
• Testing cost
– Cost/die = (cost/hour x test time) / yield
– Could be $10-$20 or more for complex chips
• IC Packaging
– Depends on die size, number of pins, and power dissipation
• Cost of cooling system
– <2W no heat-sink, <10W no fan, >100+W liquid/spray cooling
• And most of all, do not forget VOLUME
– Cost of a modern IC fabrication facility: >$2B
– Cost of a set of masks for a wafer: $0.5M - $1M
– Design NRE cost: often ~$10M
– Need volume to amortize all this cost…
Cost Vs Price
• Price is really what your customer cares about
• Price components for a system vendor
– Component cost: buying the parts
• 47% of list price for $1K PC
– Direct costs: labor, warranties, dealing with scrap, …
• 10% of list price for $1K PC
– Gross margin: company overhead
• R&D, marketing, sales, buildings, maintenance, taxes, …
• 19% of list price for $1K PC
– Average discount: plan for volume discounts…
• 25% of list price for $1K PC
• As computers become commodity components, price matters a lot!
Historical Trend for Processor Price
Summary
• Computer architecture:
– Design of efficient systems given the requirements of applications and the capabilities/constraints of technology
– Need to look a few years ahead with both applications & technology
• Applications
– Look for locality, parallelism, and predictability
• Technology
– Dealing with latency, power, and reliability are the upcoming challenges
• Performance & cost
– Two important efficiency metrics for most systems
– Latency Vs. bandwidth performance metrics
– Cost Vs. price
Multiple Processors on Single Chip
• Two processors on single-chip
• Two chips (w/ two processors) in single package
• 16 – 64 – 256 processors on single die
– Stream Processors
– Sun Niagara
• http://www.ece.ucdavis.edu/~ocin06/talks/ho.pdf
What does Moore’s Law buy you?