ECE 154A Introduction to Computer Architecture
Dmitri Strukov
Lecture 1
Outline
• Admin
• What is this class about?
• Prerequisites
• Simple computer
• Performance
• Historical trends
• Economics
UCSB | ECE 154A | Fall 2013
Admin
Office Hours: W 3 pm – 5 pm, HFH 5153
Course load and grading:
• ~5 projects (30%)
• ~5 HWs (10%)
• 2 midterms (15% each)
• Final (30%)
• Up to 5% extra for participation and attendance
TAs:
‐ Michael Klachko (M 8 am – 1 pm / W 8 am – 12 pm / Th 8 am – 11 am)
‐ Joseph McMahan (M 10 am – 3 pm / T 9 am – 12 pm / W 10 am – 3 pm / Th 9 am – 12 pm / F 1 pm – 5 pm)
‐ Fan Lin (M‐Th 10 am – 12 pm / M & W 1 pm – 3 pm); recitation F 9 am
Website: http://www.ece.ucsb.edu/~strukov/ (to be set up this weekend)
No recitation this week; extra recitations will be held at the end of the course.
Admin: Textbooks
• Required: Computer Organization and Design: The Hardware/Software Interface, Fourth Edition, Patterson and Hennessy (COD). The third edition is also fine.
• Recommended: Digital Design and Computer Architecture, David and Sarah Harris, 2012 (2nd Ed.).
• Recommended: Computer Architecture: From Microprocessors to Supercomputers, Behrooz Parhami, 2005.
• C language manual webpage from Stanford University
Major computing platforms
[Figure: programmable microprocessor, field-programmable gate array (FPGA), and application-specific integrated circuit (ASIC), compared by density/speed versus flexibility]
In this class, the focus is on microprocessors only.
Chip cost = Non‐recurring engineering cost / volume + cost per chip
What is Computer Architecture?
[Figure: Application at the top, Physics at the bottom; the gap between them is too large to bridge in one step (but there are exceptions, e.g., magnetic compass)]
In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.
How do we handle complexity?
[Figure: levels of abstraction from top to bottom, with the span covered by ECE 154A marked]
Software: Application (ex: browser), Operating System (Mac OSX), Compiler, Assembler
Instruction Set Architecture
Hardware: Processor, Memory, I/O system; Datapath & Control; Digital Design; Circuit Design; Transistors
• Coordination of many levels of abstraction
Dan Garcia
Levels of Representation
High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  ↓ Compiler
Assembly Language Program (e.g., ARM)
    ldr r0, [r2]
    ldr r1, [r2, #4]
    str r1, [r2]
    str r0, [r2, #4]
  ↓ Assembler
Machine Language Program (ARM)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
  ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)
Dan Garcia
Prerequisites: Knowledge of
• Digital logic (ECE 152A, 15A)
  – Combinational logic (logic gates, critical path)
  – Sequential logic (clock cycle time, finite state machine)
  – Basic logic circuits (muxes, registers, ALU, memories)
• Basic programming skills
  – C language
  – Procedures, pointers and arrays
Assignment for Next Week
• HW to be posted this weekend on the prerequisite + Chapter 1 material (due October 7th, 11 pm)
• Read Chapter 1 from P&H
  – Appendix C and D.1‐D.4 (for prereq)
Simple Computer
• Keep data in memory
• Program algorithm as a sequence of steps
  – Typical step is some operation on data
• Store this sequence in memory
• Execute steps one by one
  – "Control" circuitry orchestrates steps
  – "Datapath" implements steps
Simple Computer: Stored‐Program (Von Neumann) Computer
Algorithm for F = A x B + C / D
Step 1: Temp1 = A x B
Step 2: Temp2 = C / D
Step 3: F = Temp1 + Temp2
[Figure: Control and Datapath connected to Memory (addresses, data, operation); execution proceeds over time as follows]
1. Load first instruction to control; read A and B from memory, compute temp1, write temp1 to memory
2. Load second instruction to control; read C and D from memory, compute temp2, write temp2 to memory
3. Load third instruction to control; read temp1 and temp2 from memory, compute F, write F to memory
Components of Computer
• Processor
  – Control ("brain")
  – Datapath ("brawn")
• Memory (where programs, data live when running)
• Devices
  – Input: Keyboard, Mouse
  – Output: Display, Printer
  – Disk (where programs, data live when not running)
Performance
• Metrics
  – Execution time per application
  – Energy per application
  – Throughput (# apps executed per unit time)
• Benchmarking
  – Intended set of applications (SPEC)
  – Geometric average: $\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$
Performance
Performance = 1 / Execution Time
Execution Time = Clock Cycle Time x # Cycles = # Cycles / Clock Rate
Ways to Improve Simple Computer?
• Reduce the number of clock cycles?
• Reduce clock cycle time?
[Figure: the stored‐program computer executing F = A x B + C / D as three instructions, each loaded to control and then read, compute, write]
Performance
Performance = 1 / Execution Time
Execution Time = Clock Cycle Time x # Cycles = # Cycles / Clock Rate
# Cycles = Instruction Count x (Average) Clocks Per Instruction
Execution Time = CCT x IC x CPI = IC x CPI / Clock Rate
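CPI Example
• Computer A: CCT = 250 ps, CPI = 2.0
• Computer B: CCT = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{CCT}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{CCT}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
A is faster, by this much:
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$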
CPI in More Detail
• If different instruction classes take different numbers of cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)$$
• Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$
where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.
Amdahl's Law
Execution Time = CCT x IC x CPI = IC x CPI / Clock Rate
Say there are instructions of type A and B:
CPI = (IC_A x CPI_A + IC_B x CPI_B) / IC
Improving CPI_A alone has limited benefit:
$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$
Corollary: make the common case fast
Pitfall: MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
  – Doesn't account for
    • Differences in ISAs between computers
    • Differences in complexity between instructions
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
• CPI varies between programs on a given CPU
Ways to Improve Simple Computer?
• Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, Tc (clock cycle time)
• What is in the datapath?
• Width of the datapath?
• The number and type of instructions?
• Memory organization?
[Figure: the simple computer (Control, Datapath, Memory)]
The BIG Picture
Computing Devices Now
[Figure: sensor nets, cameras, games, set‐top boxes, media players, laptops, servers, robots, smart phones, routers, automobiles, supercomputers]
Moore's Law
Predicts: 2X transistors / chip every 2 years
[Figure: # of transistors on an integrated circuit (IC) vs. year. Source: en.wikipedia.org/wiki/Moore's_law]
Gordon Moore, Intel Cofounder, B.S. Cal 1950!
In 1965, Gordon Moore predicted that the number of transistors per chip would double at a regular interval. The period is often quoted as 18 months (1.5 years); the 2-year figure above reflects Moore's 1975 revision.
Technology Scaling Road Map (ITRS)
Year                    2004   2006   2008   2010   2012
Feature size (nm)         90     65     45     32     22
Intg. Capacity (BT)        2      4      6     16     32
(BT = billions of transistors)
• Fun facts about 45nm transistors
  – 30 million can fit on the head of a pin
  – You could fit more than 2,000 across the width of a human hair
  – If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent
Intel Core i7‐2600K Sandy Bridge
• Launched in 2011
• 1.16 billion transistors
• 64 bit
• 3.4 GHz
• 216 mm^2
• 32 nm
• 4 cores
• 8 MB cache
Power Trends
• In CMOS IC technology
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
• The power wall
  – We can't reduce voltage further
  – We can't remove more heat
Solution #1: Move to Multi‐processor
[Figure: single‐processor performance over time, annotated "RISC" and "Move to multi‐processor"]
Frequency ~ V, so Power ~ V³ (from Power = Capacitive load x Voltage² x Frequency). With ideal parallelism, the power can be decreased for the same execution time.
Manufacturing ICs
• Yield: proportion of working dies per wafer
AMD Opteron X2 Wafer
• X2: 300mm wafer, 117 chips, 90nm technology
• X4: 45nm technology
Integrated Circuit Cost
$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$
$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$
$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area} / 2\right)^2}$$
• Nonlinear relation to area and defect rate
  – Wafer cost and area are fixed
  – Defect rate determined by manufacturing process
  – Die area determined by architecture and circuit design
Recommended