
Structure of Computer Systems
Course 2: Computer performance and optimality


Performance requirements

• small execution time
• short reaction time to external events
• high memory capacity and speed
• many input/output facilities (interfaces)
• rich development facilities
• small dimensions and specific shapes
• predictability, safety and fault tolerance
• small costs: absolute and relative


Optimal computer architecture
• a compromise between performance parameters
• depends on the purpose and type of the computer

Computer types (based on purpose):
• general purpose computers
  - high performance computers (HPC)
  - personal computers
  - mobile computers
• computers for dedicated purposes
  - scientific computing
  - military computers (safety critical and highly reliable)
  - industrial control and automation (embedded systems)
  - measurement and analysis (e.g. medical devices, intelligent sensors)

Classification based on performance:
• small, embedded systems: control systems, smart sensors
• personal computers: desktop, laptop, tablet PC
• high performance computers: parallel, GRID, cloud

Old classification:
• mainframes: e.g. IBM 360/370, Felix 256
• minicomputers: PDP-11, Sun workstations, Independent, Coral
• microcomputers: microprocessor-based computers (e.g. PC, home computers)


Classification based on architecture:
• single processor computers
• multiprocessor computers:
  - parallel systems: multi-core processors, symmetric and asymmetric parallel systems
  - distributed systems:
      personal computers and network communication for a specific (common) purpose
      GRIDs
      clouds: computer as a service, storage as a service, platform as a service, software as a service


Optimal performance parameters for different types of computers:

HPC – high performance computers:
• highly parallel computers: 1,024-1,500,000 cores or processors
• usage: scientific computing (physics, astronomy, bioinformatics, chemistry), simulation (fluid flow, weather), cryptography
• speed: 1-20,000 TFLOPS
• memory capacity: 1-700 TBytes
• communication: InfiniBand (2-300 Gbit/s), Cray Gemini
• power consumption: 10 kW-10 MW (for comparison, the Mariselu power station delivers ~200 MW)
• price: hard to estimate
• see the TOP500 list of supercomputers (http://www.top500.org/list/2012/06/100/):
  - no. 1: Titan (USA), 560,000 cores
  - no. 2: Sequoia (USA), 1,572,864 cores
  - no. 3: K computer (Japan), 705,024 cores


HPC at CERN:
• architecture: GRID
• organization: 3 tiers
• at least 100,000 processors in 32 countries
• serves 5000 scientists
• in UTCN: 128 quad-core processors, 512 cores

Blue Gene – IBM:
• architecture: parallel
• 65,536 dual-core processors
• 360 teraflops peak speed



CG-UTCN – Centrul GRID al UTCN (the UTCN GRID Center):
• 64 processor boards
• 128 quad-core processors, 512 cores
• 1024 virtual processors (hyper-threading)
• storage: 12 TBytes
• price: 2,000,000 RON


PC – personal computers:
• single or multi-core systems: 1-8 cores (1-2 processors)
• usage: engineering, accounting, administration, entertainment, document processing, communication
• speed: 1-200 GFLOPS
• memory capacity: 1-16 GBytes (internal), 0.5-1 TBytes (external)
• communication: Ethernet (0.1-1 Gbit/s)
• power consumption: 400-800 W
• price: 500-1000 USD
• dimensional types: desktop, laptop, tablet, hand-held


Mobile devices:
• single or multi-core systems: 1-4 cores (1 processor)
• usage: communication, entertainment, substitute for a PC
• speed: 20-600 MFLOPS
• memory capacity: 0.5-2 GBytes (internal)
• communication: WiFi, Bluetooth (10-100 Mbit/s)
• power consumption: limited by the battery capacity
• price: 1-500 USD
• dimensional limitations


Dedicated and embedded systems:
• single processor systems: microcontroller, DSP (digital signal processor), MSP (mixed signal processor)
• usage: automation, measurement, sensors, medical devices
• speed: 1-20 MIPS
• memory capacity: 128-512 bytes (data), 0-32 KBytes (program), 1-2 KBytes EEPROM
• communication: serial RS232, CAN, I2C (300-9600 bit/s)
• power consumption: very low (battery powered), with low-power modes (1 μA-10 mA)
• price: 1-20 USD
• dimensions: very small packages (8, 16, 28, 40 pins)


Measuring the performance of a computer – benchmark programs

Definition 1 (Wikipedia): a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.

Definition 2: a method of comparing the performance of various computer systems.

Measuring and assessing the performance of a system is not a trivial task:
• some computers/CPUs perform better on some tests and worse on others (e.g. good results for image processing but weaker results for database applications)
• performance should therefore be a weighted average over a number of specific tests


Benchmark programs

Real programs:
• word processing software
• users' application software

Micro-benchmarks:
• designed to measure the performance of a very small and specific piece of code

Kernel benchmarks:
• contain code that performs a specific basic operation, normally abstracted from an actual program
• popular kernels:
  - Livermore loops (every loop is a mathematical operation)
  - the Linpack benchmark (contains basic linear algebra subroutines); results are reported in MFLOPS

Component benchmarks / micro-benchmarks:
• programs designed to measure the performance of a computer's basic components
• automatic detection of the computer's hardware parameters, such as the number of registers, cache size and memory latency

Synthetic benchmarks:
• procedure for writing a synthetic benchmark:
  - collect statistics on all types of operations from many application programs
  - compute the proportion of each operation
  - write a program based on these proportions
• types of synthetic benchmarks:
  - Dhrystone – integer arithmetic
  - Whetstone – integer and floating-point arithmetic
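The three-step procedure for synthetic benchmarks can be sketched in Python. This is a toy illustration under stated assumptions, not how Dhrystone or Whetstone were actually built: the operation names, the recorded traces and the lambdas standing in for each operation class are all hypothetical.

```python
import random
import time
from collections import Counter

# Hypothetical stand-ins for operation classes; a real synthetic benchmark
# (e.g. Dhrystone) is hand-written code, not a table of lambdas.
KERNELS = {
    "int_add": lambda a, b: a + b,
    "int_mul": lambda a, b: a * b,
    "fp_mul": lambda a, b: a * 1.000001 * b,
    "branch": lambda a, b: a if a > b else b,
}


def operation_mix(traces):
    """Steps 1-2: count each operation type over all traces, return proportions."""
    counts = Counter(op for trace in traces for op in trace)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


def build_workload(mix, length, seed=0):
    """Step 3: draw `length` operations so that their frequencies follow `mix`."""
    rng = random.Random(seed)
    ops = sorted(mix)  # fixed order keeps the draw reproducible for a given seed
    return rng.choices(ops, weights=[mix[op] for op in ops], k=length)


def run_synthetic(workload, a=3, b=7):
    """Execute the workload; return (elapsed seconds, accumulated result)."""
    total = 0
    start = time.perf_counter()
    for op in workload:
        total += KERNELS[op](a, b)  # accumulate so every operation feeds the result
    return time.perf_counter() - start, total


# Hypothetical operation traces "recorded" from two application programs.
traces = [
    ["int_add", "int_add", "branch", "int_mul"],
    ["fp_mul", "int_add", "branch", "branch"],
]
mix = operation_mix(traces)  # int_add 0.375, branch 0.375, int_mul 0.125, fp_mul 0.125
workload = build_workload(mix, 10_000)
elapsed, _ = run_synthetic(workload)
print(f"mix={mix}, elapsed={elapsed:.6f} s")
```

In interpreted Python the loop overhead dominates the measured time, so the sketch only shows the structure of the procedure; a real synthetic benchmark is written so that the compiled code reproduces the measured instruction mix.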


Other benchmarks:
• I/O benchmarks
• database benchmarks: measure the throughput and response times of database management systems (DBMSs)
• parallel benchmarks: used on machines with multiple cores or processors, or on systems consisting of multiple machines

Issues regarding good benchmarking:
• some processor architectures were designed to obtain the best benchmark results, at the expense of overall performance
• many benchmarks concentrate on computation and less on other aspects, such as memory access time and input/output delays
• benchmarks are not relevant for widely distributed systems
• there is no unique measure of "performance" in computing


Computing the benchmark results

Arithmetic mean benchmark:

$$AMB = \frac{1}{n} \sum_{i=1}^{n} t_i$$

where t_i is the execution time of program i from the set of n test programs.
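A minimal Python sketch of the formula, using the execution times of programs 1 and 2 on computer A from the normalization example in this section. Since n · AMB is the total time of the test set, the arithmetic mean is proportional to the total execution time.

```python
def arithmetic_mean_benchmark(times):
    """AMB = (1/n) * sum(t_i), where t_i is the execution time of program i (s)."""
    if not times:
        raise ValueError("need at least one execution time")
    return sum(times) / len(times)


# Execution times of programs 1 and 2 on computer A (normalization example).
times_on_a = [1.0, 1000.0]
print(arithmetic_mean_benchmark(times_on_a))  # 500.5
```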

Weighted arithmetic mean:

$$AMB = \frac{1}{n} \sum_{i=1}^{n} w_i \cdot t_i$$

where w_i is the weight of program i, indicating its frequency of execution.

w_i is chosen so that on a reference computer the weighted execution time w_i · t_i is the same for every benchmark (program) => NORMALIZATION
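One way to satisfy the normalization condition is w_i = 1 / t_i(ref), which makes every weighted time on the reference computer equal to 1. The Python sketch below assumes this choice and uses computer A from the normalization example as the reference; the weighted means for A, B and C then come out as 1, 5.05 and 55, i.e. the arithmetic means of the times normalized to A.

```python
def normalization_weights(reference_times):
    """Assumed choice: w_i = 1 / t_ref_i, so w_i * t_ref_i == 1 for every program."""
    return [1.0 / t for t in reference_times]


def weighted_arithmetic_mean(times, weights):
    """AMB = (1/n) * sum(w_i * t_i)."""
    if not times or len(times) != len(weights):
        raise ValueError("times and weights must be non-empty and equal in length")
    return sum(w * t for w, t in zip(weights, times)) / len(times)


# Execution times (s) of programs 1 and 2 from the normalization example.
t_a, t_b, t_c = [1.0, 1000.0], [10.0, 100.0], [100.0, 10000.0]
w = normalization_weights(t_a)  # computer A is the reference
for name, t in (("A", t_a), ("B", t_b), ("C", t_c)):
    print(name, weighted_arithmetic_mean(t, w))
```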


Geometric mean:

$$GMB = \sqrt[n]{\prod_{i=1}^{n} t_i}$$

Normalized geometric mean:

$$GMB = \sqrt[n]{\prod_{i=1}^{n} w_i \cdot t_i}$$
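Both geometric means can be computed with one Python function. The sketch takes the n-th root of the product as exp of the mean of the logarithms, a standard way to keep long products of very large or very small times from overflowing; the data again come from the normalization example, and the weights assume normalization to computer A (w_i = 1 / t_i on A). For unweighted data, Python 3.8+ also provides `statistics.geometric_mean`.

```python
import math


def geometric_mean_benchmark(times, weights=None):
    """GMB = (prod_i w_i * t_i) ** (1/n); with no weights this is the plain GMB.

    Computed as exp(mean(log(w_i * t_i))) so that long products of large or
    small times do not overflow or underflow."""
    if not times:
        raise ValueError("need at least one execution time")
    if weights is None:
        weights = [1.0] * len(times)
    if len(weights) != len(times):
        raise ValueError("times and weights must have equal length")
    logs = [math.log(w * t) for w, t in zip(weights, times)]
    return math.exp(sum(logs) / len(logs))


t_c = [100.0, 10000.0]           # computer C, normalization example
w_a = [1.0 / 1.0, 1.0 / 1000.0]  # weights normalizing to computer A
print(geometric_mean_benchmark(t_c))       # about 1000
print(geometric_mean_benchmark(t_c, w_a))  # about 31.6
```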


Effects of normalization: the result depends on the machine used as the reference (A, B or C).

Execution times:

| | t on A (s) | t on B (s) | t on C (s) |
|---|---|---|---|
| Program 1 | 1 | 10 | 100 |
| Program 2 | 1000 | 100 | 10000 |
| Arithm. mean | 500.5 | 55 | 5050 |
| Geom. mean | 31.6 | 31.6 | 1000 |

Normalized times (column X/Y = time on X divided by time on the reference Y):

| | A/A | B/A | C/A | A/B | B/B | C/B | A/C | B/C | C/C |
|---|---|---|---|---|---|---|---|---|---|
| Program 1 | 1 | 10 | 100 | 0.1 | 1 | 10 | 0.01 | 0.1 | 1 |
| Program 2 | 1 | 0.1 | 10 | 10 | 1 | 100 | 0.1 | 0.01 | 1 |
| Arithm. mean | 1 | 5.05 | 55 | 5.05 | 1 | 55 | 0.055 | 0.055 | 1 |
| Geom. mean | 1 | 1 | 31.6 | 1 | 1 | 31.6 | 0.0316 | 0.0316 | 1 |
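The entries of the table can be recomputed from the raw execution times, which is a quick way to check each one. This Python sketch uses only the times of programs 1 and 2 on computers A, B and C given in the table.

```python
import math

# Execution times in seconds of programs 1 and 2 on computers A, B and C.
TIMES = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
    "C": [100.0, 10000.0],
}


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def normalized_means(times, reference):
    """For every machine, divide each program's time by the reference machine's
    time for the same program, then return (arithmetic mean, geometric mean)."""
    ref = times[reference]
    means = {}
    for machine, ts in times.items():
        ratios = [t / r for t, r in zip(ts, ref)]
        means[machine] = (arithmetic_mean(ratios), geometric_mean(ratios))
    return means


for reference in TIMES:
    for machine, (am, gm) in normalized_means(TIMES, reference).items():
        print(f"normalized to {reference}: {machine}  AM={am:.3f}  GM={gm:.4f}")
```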


Conclusions from the previous table:

For the arithmetic mean:
• if the reference is computer A:
  - A is as fast as A
  - B is ~5 times slower than A
  - C is 55 times slower than A
• if the reference is computer B:
  - A is ~5 times slower than B
  - B is as fast as B
  - C is 55 times slower than B
• if the reference is computer C:
  - A is ~18 times faster than C
  - B is ~18 times faster than C
  - C is as fast as C

For the geometric mean:
• if the reference is computer A:
  - A is as fast as A
  - B is as fast as A
  - C is ~32 times slower than A
• if the reference is computer B:
  - A is as fast as B
  - B is as fast as B
  - C is ~32 times slower than B
• if the reference is computer C:
  - A is ~32 times faster than C
  - B is ~32 times faster than C
  - C is as fast as C


Advantages of the geometric mean:
• it is independent of the running times of the individual programs
• it does not matter which machine is used for normalization

Disadvantage of the geometric mean:
• it does not predict execution time
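Both properties can be checked numerically with the times from the normalization table. Because GM(x_i / r_i) / GM(y_i / r_i) = GM(x_i) / GM(y_i), the reference times cancel, so the GM-based ratio between two machines does not depend on the reference; the AM-based ratio does. The last lines illustrate the disadvantage: A and B have the same geometric mean although their total execution times differ (1001 s vs. 110 s). The helper `slowdown` is a hypothetical name introduced for this sketch.

```python
import math

TIMES = {"A": [1.0, 1000.0], "B": [10.0, 100.0], "C": [100.0, 10000.0]}


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def slowdown(machine, other, reference, mean):
    """How many times slower `machine` appears than `other` when both are
    summarized by `mean` over times normalized to `reference` (>1: slower)."""
    ref = TIMES[reference]

    def summary(m):
        return mean([t / r for t, r in zip(TIMES[m], ref)])

    return summary(machine) / summary(other)


for reference in TIMES:
    am = slowdown("C", "A", reference, arithmetic_mean)
    gm = slowdown("C", "A", reference, geometric_mean)
    print(f"reference {reference}: C vs A  AM-based {am:.2f}x  GM-based {gm:.2f}x")

# Same geometric mean, very different total execution time.
print("GM:", geometric_mean(TIMES["A"]), geometric_mean(TIMES["B"]))
print("total time (s):", sum(TIMES["A"]), sum(TIMES["B"]))
```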


Goal: to write a package of programs that best measures the performance of a computer system.

Solutions:
• real programs, which solve different classical problems
• synthetic programs, which produce no practical result but preserve the instruction frequencies measured in real cases


Examples of benchmark programs

Whetstone synthetic program:
• published in 1976 by the National Physical Laboratory (NPL), Great Britain
• preserves the instruction frequencies of scientific and engineering applications written in Algol, and later in Fortran and Pascal
• floating-point instructions play an important role

Dhrystone synthetic program:
• published in 1984
• preserves the instruction frequencies of system programming (e.g. operating system components), using the Ada and C programming languages
• the frequency measurements are published
• no emphasis on FP operations

Issues with synthetic benchmarks:
• they do not reflect the needs of a real application well
• some computer architectures were optimized for the best results on synthetic benchmarks, but perform worse on real applications


Kernel benchmark programs:
• based on time-critical components of real applications
• focused on measuring the performance of supercomputers running scientific applications
• examples:
  - Livermore Loops: a benchmark for parallel computers; 24 "do" loops carrying out different mathematical operations (e.g. solving linear systems, hydrodynamics, matrix operations)
  - Linpack: performs numerical linear algebra
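To show how a kernel benchmark turns a timed loop into an MFLOPS figure, here is a Python port of the first Livermore loop (the hydro fragment). Each iteration is counted as 5 floating-point operations (three multiplications and two additions). The constants q, r, t, the input data, the loop length and the repetition count are arbitrary assumptions, not the initialization used by the original Fortran benchmark.

```python
import time


def livermore_kernel1(n=1000, reps=100):
    """Python port of Livermore loop 1 (hydro fragment):
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])
    Returns (x, mflops), counting 5 floating-point operations per iteration.
    Constants and input data are arbitrary (assumption)."""
    q, r, t = 0.5, 0.25, 0.125
    y = [1.0 + 0.001 * k for k in range(n)]
    z = [0.5 + 0.001 * k for k in range(n + 11)]
    x = [0.0] * n
    start = time.perf_counter()
    for _ in range(reps):
        for k in range(n):
            x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])
    elapsed = time.perf_counter() - start
    flops = 5 * n * reps
    mflops = flops / elapsed / 1e6 if elapsed > 0 else float("inf")
    return x, mflops


x, mflops = livermore_kernel1()
print(f"x[0] = {x[0]:.6f}, about {mflops:.1f} MFLOPS (interpreted Python)")
```

Interpreted Python spends most of its time on interpreter overhead, so the resulting MFLOPS value is far below what the compiled kernel achieves; only the method (operation count divided by measured time) carries over.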


SPEC – Standard Performance Evaluation Corporation
• a non-profit international organization focused on developing standard tools for measuring the performance of computer systems
• www.spec.org
• develops standard sets of benchmarks based on real applications
• benchmark sets contain source code
• there are also tools for generating performance reports


Evolution of SPEC benchmark standards:
• SPEC89
  - the first benchmark set, released in 1989
  - benchmark value: geometric mean of execution times normalized to the VAX-11/780 computer
• SPEC92
  - contains different benchmarks for integer (SPECINT) and floating-point (SPECFP) instructions
• CPU95, CPU2000
• current version: CPU2006
• next version: CPUv6

SPEC consists of three interest groups:
• Open Systems Group (OSG): component and system level benchmarks
• High Performance Group (HPG): benchmarks for high-performance computing
• Graphics Performance Characterization Group (GPCG): benchmarks for graphics subsystems


Examples of benchmark programsExamples of benchmark programs

Details for CPU2006:Details for CPU2006: contains two collectionscontains two collections::

• CINT200CINT20066: : integer computationsinteger computations• CFP200CFP20066: : floating-point computationsfloating-point computations

it can measure:it can measure:• speed: speed: SPECSPEC ratio ratio - - the time to execute one copy of the the time to execute one copy of the

benchmarkbenchmark • rate: rate: SPEC rate - the number of jobs that can be executed in a SPEC rate - the number of jobs that can be executed in a

given time (e.g. 24h)given time (e.g. 24h) results are combined with geometric meanresults are combined with geometric mean normalization is made on a normalization is made on a Sun Microsystems Ultra 5/10Sun Microsystems Ultra 5/10

workstation, workstation, with a with a SPARCSPARC processor; for this system the result processor; for this system the result of the measurement is 1of the measurement is 1
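The speed metric can be sketched in Python, assuming the usual definition of a SPEC ratio as reference time divided by measured time, so that the reference machine scores exactly 1 on every benchmark and the overall result is the geometric mean of the ratios. The benchmark names and all times below are hypothetical, not real CPU2006 reference times.

```python
import math


def spec_ratios(reference_times, measured_times):
    """Per-benchmark ratio = reference time / measured time (higher is better).
    On the reference machine every ratio is exactly 1."""
    return {name: reference_times[name] / measured_times[name] for name in reference_times}


def spec_score(reference_times, measured_times):
    """Overall result: geometric mean of the per-benchmark ratios."""
    ratios = list(spec_ratios(reference_times, measured_times).values())
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical times in seconds; these are NOT real CPU2006 reference times.
reference = {"int_1": 1000.0, "int_2": 2000.0, "int_3": 4000.0}
measured = {"int_1": 100.0, "int_2": 400.0, "int_3": 250.0}
print(spec_ratios(reference, measured))  # {'int_1': 10.0, 'int_2': 5.0, 'int_3': 16.0}
print(spec_score(reference, measured))   # cube root of 800, about 9.28
```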


Examples of CPU2006 integer benchmarks:
• 401.bzip2: compression program based on bzip2
• 403.gcc: C compiler based on gcc 3.2
• 445.gobmk: plays the game of Go
• 458.sjeng: chess program
• 462.libquantum: library for the simulation of a quantum computer
• 473.astar: path-finding library for 2D maps (A* algorithm)


Examples of CPU2006 floating-point benchmarks:
• 435.gromacs: simulates the Newtonian equations of motion for particles
• 444.namd: simulates bio-molecular systems
• 459.GemsFDTD: solves the Maxwell equations in 3D in the time domain
• 465.tonto: quantum chemistry package
• 481.wrf: weather forecasting
• 482.sphinx3: speech recognition

Look on the Internet for the published results of your processor.