27
CS 61C: Great Ideas in Computer Architecture Performance Iron Law, Amdahl’s Law Instructors: Nicholas Weaver& Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/

CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

CS61C:GreatIdeasinComputerArchitecture

PerformanceIronLaw,Amdahl’sLaw

Instructors:NicholasWeaver&VladimirStojanovichttp://inst.eecs.berkeley.edu/~cs61c/

Page 2: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

New-SchoolMachineStructures(It’sabitmorecomplicated!)

• ParallelRequestsAssigned tocomputere.g.,Search“Katz”

• ParallelThreadsAssigned tocoree.g.,Lookup,Ads

• ParallelInstructions>[email protected].,5pipelined instructions

• ParallelData>1dataitem@one timee.g.,Addof4pairsofwords

• HardwaredescriptionsAllgates@onetime

• ProgrammingLanguages2

SmartPhone

WarehouseScale

Computer

SoftwareHardware

HarnessParallelism&AchieveHighPerformance

LogicGates

Core Core…

Memory(Cache)

Input/Output

Computer

CacheMemory

Core

InstructionUnit(s) FunctionalUnit(s)

A3+B3A2+B2A1+B1A0+B0

Howdoweknow?

Page 3: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

WhatisPerformance?

• Latency(orresponsetimeorexecutiontime)– Timetocompleteonetask

• Bandwidth(orthroughput)– Taskscompletedperunittime• Ifyouhavesufficientindependenttasks,youcanalwaysthrowmoremoneyattheproblem:Throughput/$oftenamoreimportantmetricthanjustthroughput

3

Page 4: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

CloudPerformance:WhyApplicationLatencyMatters

• Keyfigureofmerit:applicationresponsiveness– Longerthedelay,thefewertheuserclicks,thelesstheuserhappiness,andthelowertherevenueperuser

4

Page 5: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

DefiningCPUPerformance• WhatdoesitmeantosayXisfasterthanY?

• Ferrarivs.SchoolBus?• 2013Ferrari599GTB– 2passengers,quartermilein10secs

• 2013TypeDschoolbus– 50passengers,quartermilein20secs

• ResponseTime (Latency):e.g.,timetotravel¼mile• Throughput (Bandwidth): e.g.,passenger-miin1hour

5

Page 6: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

DefiningRelativeCPUPerformance• PerformanceX =1/ProgramExecutionTimeX• PerformanceX >PerformanceY =>1/ExecutionTimeX>1/ExecutionTimey=>ExecutionTimeY >ExecutionTimeX

• ComputerXisNtimesfasterthanComputerYPerformanceX /PerformanceY =NorExecutionTimeY /ExecutionTimeX=N

• BustoFerrariperformance:– Program:Transfer1000passengersfor1mile– Bus:3,200sec,Ferrari:40,000sec

6

Page 7: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

MeasuringCPUPerformance

• Computersuseaclocktodeterminewheneventstakesplacewithinhardware

• Clockcycles: discretetimeintervals– akaclocks,cycles,clockperiods,clockticks

• Clockrateorclockfrequency: clockcyclespersecond(inverseofclockcycletime)

• 3GigaHertz clockrate=>clockcycletime=1/(3x109)seconds

clockcycletime=333picoseconds(ps)

7

Page 8: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

CPUPerformanceFactors

• TodistinguishbetweenprocessortimeandI/O,CPUtimeistimespentinprocessor

• CPU Time/Program= Clock Cycles/Program

x Clock Cycle Time

• OrCPU Time/Program= Clock Cycles/Program ÷ Clock Rate

8

Page 9: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

IronLawofPerformancebyEmer andClark

• A programexecutesinstructions• CPU Time/Program

= Clock Cycles/Program x Clock Cycle Time= Instructions/Program

x Average Clock Cycles/Instruction x Clock Cycle Time

• 1st termcalledInstructionCount• 2nd termabbreviatedCPIforaverageClockCyclesPerInstruction

• 3rdtermis1/Clockrate

9

Page 10: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

RestatingPerformanceEquation• Time= Seconds

ProgramInstructions Clockcycles SecondsProgram Instruction ClockCycle

10

××=

Page 11: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

WhatAffectsEachComponent?A)InstructionCount,B)CPI,C)ClockRate

AffectsWhat?(click inletterofcomponentnotaffected)

Algorithm

ProgrammingLanguageCompiler

InstructionSet Architecture11

Page 12: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

WhatAffectsEachComponent?InstructionCount,CPI,ClockRate

AffectsWhat?Algorithm InstructionCount,

CPIProgrammingLanguage

InstructionCount,CPI

Compiler InstructionCount,CPI

InstructionSetArchitecture

InstructionCount,ClockRate,CPI

12

Page 13: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

Clickers

• Whichcomputerhasthehighestperformanceforagivenprogram?

13

Computer Clockfrequency

Clock cyclesperinstruction

#instructionsperprogram

A 1GHz 2 1000

B 2GHz 5 800

C 500MHz 1.25 400

D 5GHz 10 2000

Page 14: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

WorkloadandBenchmark

• Workload: Setofprogramsrunonacomputer– Actualcollectionofapplicationsrunormadefromrealprogramstoapproximatesuchamix

– Specifiesprograms,inputs,andrelativefrequencies• Benchmark:Programselectedforuseincomparingcomputerperformance– Benchmarksformaworkload– Usuallystandardizedsothatmanyusethem

14

Page 15: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

SPEC(SystemPerformanceEvaluationCooperative)• ComputerVendorcooperativeforbenchmarks,startedin1989

• SPECCPU2006– 12IntegerPrograms– 17Floating-PointPrograms

• Oftenturnintonumberwherebiggerisfaster• SPECratio:referenceexecutiontimeonoldreferencecomputerdividebyexecutiontimeonnewcomputertogetaneffectivespeed-up

15

Page 16: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

SPECINT2006onAMDBarcelonaDescription

Instruc-tion

Count (B)CPI

Clock cycle

time (ps)

Execu-tion

Time (s)

Refer-ence

Time (s)

SPEC-ratio

Interpreted string processing 2,118 0.75 400 637 9,770 15.3Block-sorting compression 2,389 0.85 400 817 9,650 11.8GNU C compiler 1,050 1.72 400 724 8,050 11.1Combinatorial optimization 336 10.0 400 1,345 9,120 6.8Go game 1,658 1.09 400 721 10,490 14.6Search gene sequence 2,783 0.80 400 890 9,330 10.5Chess game 2,176 0.96 400 837 12,100 14.5Quantum computer simulation 1,623 1.61 400 1,047 20,720 19.8Video compression 3,102 0.80 400 993 22,130 22.3Discrete event simulation library 587 2.94 400 690 6,250 9.1Games/path finding 1,082 1.79 400 773 7,020 9.1XML parsing 1,058 2.70 400 1,143 6,900 6.016

Page 17: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

17

SummarizingPerformance…

Clickers:Whichsystemisfaster?

System Rate(Task1) Rate(Task2)

A 10 20

B 20 10

A:SystemAB:SystemBC:SameperformanceD:Unanswerablequestion!

Page 18: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

18

… DependsWho’sSellingSystem Rate(Task1) Rate(Task2)

A 10 20

B 20 10

Average

15

15Averagethroughput

System Rate(Task1) Rate(Task2)

A 0.50 2.00

B 1.00 1.00

Average

1.25

1.00ThroughputrelativetoB

System Rate(Task1) Rate(Task2)

A 1.00 1.00

B 2.00 0.50

Average

1.00

1.25ThroughputrelativetoA

Page 19: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

SummarizingSPECPerformance

• Variesfrom6xto22xfasterthanreferencecomputer

• Geometricmeanofratios:N-th rootofproductofNratios– GeometricMeangivessamerelativeanswernomatterwhatcomputerisusedasreference

• GeometricMeanforBarcelonais11.7

19

Page 20: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

Administrivia

• Midterm2intheeveningnextMonday• Project2.1gradesin

20

Page 21: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

BigIdea:Amdahl’s(Heartbreaking)Law• SpeedupduetoenhancementEis

Speedupw/E= ----------------------Exectimew/oEExectimew/E

• SupposethatenhancementEacceleratesafractionF(F<1)ofthetaskbyafactorS(S>1)andtheremainderofthetaskisunaffected

ExecutionTimew/E=

Speedupw/E=21

ExecutionTimew/oE´[(1-F)+F/S]

1/[(1-F)+F/S]

Page 22: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

BigIdea:Amdahl’sLaw

22

Speedup=1(1- F)+F

SNon-speed-uppart Speed-uppart

10.5+0.5

2

10.5+0.25

= = 1.33

Example:theexecutiontimeofhalfoftheprogramcanbeacceleratedbyafactorof2.Whatistheprogramspeed-upoverall?

Page 23: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

Example#1:Amdahl’sLaw

• Consideranenhancementwhichruns20timesfasterbutwhichisonlyusable25%ofthetime

Speedupw/E=1/(.75+.25/20)=1.31

• Whatifitsusableonly15%ofthetime?Speedupw/E=1/(.85+.15/20)=1.17

• Amdahl’sLawtellsusthattoachievelinearspeedupwith100processors,noneoftheoriginalcomputationcanbescalar!

• Togetaspeedupof90from100processors,thepercentageoftheoriginalprogramthatcouldbescalarwouldhavetobe0.1%orless

Speedupw/E=1/(.001+.999/100)=90.9923

Speedupw/E=1/ [(1-F)+F/S]

Page 24: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

24

Iftheportionoftheprogramthatcanbeparallelizedissmall,thenthespeedupislimited

Thenon-parallelportionlimitstheperformance

Page 25: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

StrongandWeakScaling• Togetgoodspeeduponaparallelprocessorwhilekeepingtheproblemsizefixedisharderthangettinggoodspeedupbyincreasingthesizeoftheproblem.– Strongscaling:whenspeedupcanbeachievedonaparallelprocessorwithoutincreasingthesizeoftheproblem

– Weakscaling:whenspeedupisachievedonaparallelprocessorbyincreasingthesizeoftheproblemproportionallytotheincreaseinthenumberofprocessors

• Loadbalancingisanotherimportantfactor:everyprocessordoingsameamountofwork– Justoneunitwithtwicetheloadofotherscutsspeedupalmostinhalf

25

Page 26: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

Clickers/PeerInstruction

26

Supposeaprogramspends80%ofitstimeinasquarerootroutine.Howmuchmustyouspeedupsquareroottomaketheprogramrun5timesfaster?

A:5B:16C:20D:100E:Noneoftheabove

Speedupw/E=1/ [(1-F)+F/S]

Page 27: CS 61C: Great Ideas in Computer Architecturecs61c/sp16/lec/25/2016Sp-CS61C-L2… · (clickin letter of component not affected) Algorithm Programming Language Compiler Instruction

AndInConclusion,…

27

• Time(seconds/program)ismeasureofperformanceInstructions Clockcycles SecondsProgram Instruction ClockCycle

• Floating-pointrepresentationsholdapproximationsofrealnumbersinafinitenumberofbits

××=