Dr.Khaled Kh. Sharaf Faculty Of Computers And Information Technology Second Term 2019- 2020 Computer Architecture

02 Computer Evolution and Performancesites.alaqsa.edu.ps/ekhsharaf/wp-content/uploads/sites/...• 8080 • first general purpose microprocessor • 8 bit data path • Used in first


Dr.Khaled Kh. Sharaf

Faculty Of Computers

And Information

Technology

Second Term

2019- 2020

Computer Architecture


Chapter 2:

Computer Evolution and Performance


LEARNING OBJECTIVES

1. A Brief History of Computers.

2. The Evolution of the Intel x86 Architecture

3. Embedded Systems and the ARM

4. Performance Assessment


1. A BRIEF HISTORY OF COMPUTERS

The First Generation: Vacuum Tubes

Electronic Numerical Integrator And Computer (ENIAC)

- Designed and constructed at the University of Pennsylvania; the world’s first general-purpose electronic digital computer
- Started 1943 and finished 1946
- Decimal (not binary)
- 20 accumulators of 10 digits
- Programmed manually by switches
- 18,000 vacuum tubes
- 30 tons
- 1,500 square feet
- 140 kW power consumption
- 5,000 additions per second


The First Generation: Vacuum Tubes

VON NEUMANN MACHINE

• Stored Program concept

• Main memory storing programs and data

• ALU operating on binary data

• Control unit interpreting instructions from memory and executing them

• Input and output equipment operated by control unit

• Princeton Institute for Advanced Studies

• IAS

• Completed 1952


Structure of von Neumann machine


Structure of the IAS Computer


IAS - details

• 1000 x 40 bit words

• Binary number

• 2 x 20 bit instructions

Set of registers (storage in CPU), part 1

• Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit.

• Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR.

• Instruction register (IR): Contains the 8-bit opcode of the instruction being executed.


IAS - details

Set of registers (storage in CPU) 2

• Instruction buffer register (IBR): Employed to hold temporarily the

right hand instruction from a word in memory.

• Program counter (PC): Contains the address of the next instruction

pair to be fetched from memory.

• Accumulator (AC) and multiplier quotient (MQ): Employed to hold

temporarily operands and results of ALU operations.
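To make the word layout concrete, here is a minimal Python sketch of how a 40-bit IAS word could be split into its two 20-bit instructions. The 8-bit opcode comes from the slides; the 12-bit address field (the remaining bits of each 20-bit instruction) and the names `split_instruction` and `split_word` are assumptions added for illustration, and the sample word is hypothetical.

```python
def split_instruction(instr20):
    """Split a 20-bit instruction into (opcode, address)."""
    opcode = (instr20 >> 12) & 0xFF   # upper 8 bits: opcode (held in the IR)
    address = instr20 & 0xFFF         # lower 12 bits: address (assumed field width)
    return opcode, address


def split_word(word40):
    """Split a 40-bit memory word into its left and right instructions."""
    left = (word40 >> 20) & 0xFFFFF   # left-hand instruction
    right = word40 & 0xFFFFF          # right-hand instruction (buffered in the IBR)
    return split_instruction(left), split_instruction(right)


# Hypothetical word: left = opcode 0x01, address 0x0FA; right = opcode 0x05, address 0x0FB.
word = (0x01 << 32) | (0x0FA << 20) | (0x05 << 12) | 0x0FB
print(split_word(word))
```

Fetching one word therefore delivers two instructions, which is why the PC addresses an instruction pair and the IBR holds the right-hand instruction while the left one executes.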


Commercial Computers

The 1950s saw the birth of the computer industry with two companies,

Sperry and IBM, dominating the marketplace

The UNIVAC I (Universal Automatic Computer)

was the first successful commercial computer. It was intended for both

scientific and commercial applications

• Late 1950s - UNIVAC II

• Faster

• More memory


IBM

• Punched-card processing equipment

• 1953 - the 701

• IBM’s first stored program computer

• Scientific calculations

• 1955 - the 702

• Business applications

• Led to 700/7000 series


The Second Generation: Transistors

• The second generation saw the introduction of more complex arithmetic and logic units and control units, the use of high-level programming languages, and the provision of system software with the computer.

• System software provided the ability to
  • load programs,
  • move data to peripherals, and
  • use libraries to perform common computations,
  similar to what modern OSes like Windows and Linux do.

• Microelectronics: literally, “small electronics”

• A computer is made up of gates, memory cells and interconnections

• These can be manufactured on a semiconductor

• e.g. silicon wafer


Transistors

• Replaced vacuum tubes

• Smaller

• Cheaper

• Less heat dissipation

• Solid State device

• Made from Silicon (Sand)

• Invented 1947 at Bell Labs

• William Shockley et al.


Transistor Based Computers

• Second generation machines

• NCR & RCA produced small transistor machines

• IBM 7000

• DEC – 1957 “Digital Equipment Corporation”

• Produced PDP-1


Third Generation: Integrated Circuits

Microelectronics

A single, self-contained transistor is called a discrete component. Throughout the 1950s and early 1960s, electronic equipment was composed largely of discrete components: transistors, resistors, capacitors, and so on.

Microelectronics means, literally, “small electronics.” Since the beginnings of digital electronics and the computer industry, there has been a persistent and consistent trend toward the reduction in size of digital electronic circuits.


Relationship among

Wafer, Chip, and Gate


Moore’s Law

• Increased density of components on chip

• Gordon Moore – co-founder of Intel

• Number of transistors on a chip will double every year (Moore’s original 1965 prediction)

• Since the 1970s development has slowed a little

• Number of transistors doubles every 18 months

• Cost of a chip has remained almost unchanged

• Higher packing density means shorter electrical paths, giving higher

performance

• Smaller size gives increased flexibility

• Reduced power and cooling requirements

• Fewer interconnections increase reliability
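The doubling rule can be turned into a quick projection. The following Python sketch is only an illustration of the arithmetic: the function name `projected_transistors` and the starting count of 1,000 are hypothetical, and the 18-month default simply mirrors the figure quoted in the list.

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Project a transistor count assuming one doubling per period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical chip with 1,000 transistors, projected 3 years ahead.
print(projected_transistors(1000, 3))                           # doubling every 18 months
print(projected_transistors(1000, 3, doubling_period_years=1))  # doubling every year
```

Over the same three years the yearly rule gives three doublings while the 18-month rule gives two, which shows how sensitive long-range projections are to the assumed period.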


Consequences of Moore’s Law

1. The cost of a chip has remained virtually unchanged during this

period of rapid growth in density. This means that the cost of

computer logic and memory circuitry has fallen at a dramatic rate.

2. Because logic and memory elements are placed closer together on

more densely packed chips, the electrical path length is shortened,

increasing operating speed.

3. The computer becomes smaller, making it more convenient to

place in a variety of environments.

4. There is a reduction in power and cooling requirements.

5. The interconnections on the integrated circuit are much more

reliable than solder connections. With more circuitry on each chip,

there are fewer interchip connections


Later Generations

• Beyond the third generation there is less general agreement on

defining generations of computers. Table 2.2 suggests that there have

been a number of later generations, based on advances in integrated

circuit technology.

• With the introduction of large-scale integration (LSI), more than 1000

components can be placed on a single integrated circuit chip.

• Very-large-scale integration (VLSI) achieved more than 10,000

components per chip, while current ultra-large-scale integration (ULSI)

chips can contain more than one billion components.

• SEMICONDUCTOR MEMORY The first application of integrated circuit

technology to computers was construction of the processor (the control

unit and the arithmetic and logic unit) out of integrated circuit chips. But

it was also found that this same technology could be used to construct

memories.


• Since 1970, semiconductor memory has been through 13 generations: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G, 4G, and, as of this writing, 16 Gbits on a single chip ($1\text{K} = 2^{10}$, $1\text{M} = 2^{20}$, $1\text{G} = 2^{30}$). Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time.
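As a quick consistency check on the "four times per generation" claim, this Python sketch regenerates the 13 capacities from the 1 Kbit starting point. The helper name `chip_capacity_bits` is hypothetical; the numbers follow from the list in the slide.

```python
K, M, G = 2**10, 2**20, 2**30


def chip_capacity_bits(generation):
    """Capacity of generation n (1-based), starting at 1 Kbit and quadrupling each time."""
    return K * 4 ** (generation - 1)


capacities = [chip_capacity_bits(n) for n in range(1, 14)]
print(capacities[0] == 1 * K, capacities[5] == 1 * M, capacities[-1] == 16 * G)
```

Twelve quadruplings take 1 Kbit to 16 Gbit, which matches the first and last entries of the list.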


Later Generations

MICROPROCESSORS Just as the density of elements on memory chips

has continued to rise, so has the density of elements on processor chips.

As time went on, more and more elements were placed on each chip, so

that fewer and fewer chips were needed to construct a single computer

processor.


Generations of Computer

• Vacuum tube - 1946-1957

• Transistor - 1958-1964

• Small scale integration - 1965

• Up to 100 devices on a chip

• Medium scale integration - to 1971

• 100-3,000 devices on a chip

• Large scale integration - 1971-1977

• 3,000 - 100,000 devices on a chip

• Very large scale integration - 1978 -1991

• 100,000 - 100,000,000 devices on a chip

• Ultra large scale integration – 1991 -

• Over 100,000,000 devices on a chip


x86 Evolution (1)

• 8080
  • first general purpose microprocessor
  • 8 bit data path
  • Used in first personal computer – Altair
• 8086 – 5 MHz – 29,000 transistors
  • much more powerful
  • 16 bit
  • instruction cache that prefetches a few instructions
  • 8088 (8 bit external bus) used in first IBM PC
• 80286
  • 16 MByte memory addressable
  • up from 1 MByte
• 80386
  • 32 bit
  • Support for multitasking
• 80486
  • sophisticated powerful cache and instruction pipelining
  • built in maths co-processor
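The jump from 1 MByte to 16 MByte of addressable memory follows directly from the number of address bits. The sketch below assumes 20 address bits for the 8086 and 24 for the 80286; those widths are not stated on the slide and are added here as background, and the function name is hypothetical.

```python
MB = 2 ** 20


def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


print(addressable_bytes(20) // MB, "MByte")  # assumed 8086 address width
print(addressable_bytes(24) // MB, "MByte")  # assumed 80286 address width
```

Each extra address bit doubles the reachable memory, so four more bits give the sixteen-fold increase.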


x86 Evolution (2)

• Pentium
  • Superscalar
  • Multiple instructions executed in parallel
• Pentium Pro
  • Increased superscalar organization
  • Aggressive register renaming
  • branch prediction
  • data flow analysis
  • speculative execution
• Pentium II
  • MMX technology
  • graphics, video & audio processing
• Pentium III
  • Additional floating point instructions for 3D graphics


x86 Evolution (3)

• Pentium 4
  • Note Arabic rather than Roman numerals
  • Further floating point and multimedia enhancements
• Core
  • First x86 with dual core
• Core 2
  • 64 bit architecture
• Core 2 Quad – 3 GHz – 820 million transistors
  • Four processors on chip

• x86 architecture dominant outside embedded systems
• Organization and technology changed dramatically
• Instruction set architecture evolved with backwards compatibility
  • ~1 instruction per month added
  • 500 instructions available
• See Intel web pages for detailed information on processors


Embedded Systems and ARM

• Embedded system: a combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.

• ARM evolved from RISC design

• Used mainly in embedded systems

• Used within product

• Not general purpose computer

• Dedicated function


Embedded Systems Requirements

• Different sizes

• Different constraints, optimization, reuse

• Different requirements

• Safety, reliability, real-time, flexibility, legislation

• Lifespan

• Environmental conditions

• Static vs. dynamic loads

• Slow to fast speeds

• Computation vs. I/O intensive

• Discrete event vs. continuous dynamics


Possible Organization of an Embedded System


ARM Evolution

• Designed by ARM Inc., Cambridge, England

• Licensed to manufacturers

• High speed, small die, low power consumption

• PDAs, hand held games, phones

• E.g. iPod, iPhone

• Acorn produced ARM1 & ARM2 in 1985 and ARM3 in 1989

• Acorn, VLSI and Apple Computer founded ARM Ltd.


ARM Systems Categories

• Embedded real time

• Application platform

• Linux, Palm OS, Symbian OS, Windows mobile

• Secure applications


What do we measure?

Define performance….


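| Airplane         | Passengers | Range (mi) | Speed (mph) |
|------------------|------------|------------|-------------|
| Boeing 737-100   | 101        | 630        | 598         |
| Boeing 747       | 470        | 4150       | 610         |
| BAC/Sud Concorde | 132        | 4000       | 1350        |
| Douglas DC-8-50  | 146        | 8720       | 544         |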

Define performance….

• How much faster is the Concorde compared to the 747?

• How much bigger is the Boeing 747 than the Douglas DC-8?

• So which of these airplanes has the best performance?!

When trying to choose among different computers, performance is an

important attribute. Accurately measuring and comparing different

computers is critical to purchasers and therefore to designers.
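The airplane questions can be answered with simple ratios, and doing so shows that "best" depends on which metric is chosen. The following Python sketch uses the figures from the table; the dictionary layout and the helper name `ratio` are hypothetical.

```python
planes = {
    "Boeing 737-100":   {"passengers": 101, "range_mi": 630,  "speed_mph": 598},
    "Boeing 747":       {"passengers": 470, "range_mi": 4150, "speed_mph": 610},
    "BAC/Sud Concorde": {"passengers": 132, "range_mi": 4000, "speed_mph": 1350},
    "Douglas DC-8-50":  {"passengers": 146, "range_mi": 8720, "speed_mph": 544},
}


def ratio(plane_a, plane_b, metric):
    """How many times larger plane_a's metric is than plane_b's."""
    return planes[plane_a][metric] / planes[plane_b][metric]


speed_ratio = ratio("BAC/Sud Concorde", "Boeing 747", "speed_mph")
size_ratio = ratio("Boeing 747", "Douglas DC-8-50", "passengers")

# The "best" airplane differs for each metric.
best = {
    metric: max(planes, key=lambda name: planes[name][metric])
    for metric in ("passengers", "range_mi", "speed_mph")
}

print(round(speed_ratio, 2), round(size_ratio, 2))
print(best)
```

The Concorde is roughly 2.2 times faster than the 747, the 747 carries roughly 3.2 times as many passengers as the DC-8, and a different airplane wins on each of the three metrics.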


Defining Performance

We can define computer performance in several different ways.

• If you were running a program on two different desktop computers,

you’d say that the faster one is the desktop computer that gets the

job done first.

• If you were running a datacenter that had several servers running

jobs submitted by many users, you’d say that the faster computer

was the one that completed the most jobs during a day.


Defining Performance: TIME, TIME, TIME!!!

• Response time (elapsed time, latency): individual user concerns…
  • how long does it take for my job to run?
  • how long does it take to execute (start to finish) my job?
  • how long must I wait for the database query?
• Throughput: systems manager concerns…
  • how many jobs can the machine run at once?
  • what is the average execution rate?
  • how much work is getting done?
• If we upgrade a machine with a new processor, what do we improve?
• If we add a new machine to the lab, what do we improve?


Defining Performance

If we upgrade a machine with a new processor, what do we improve?

- Both response time and throughput are improved.

If we add a new machine to the lab, what do we improve?

- No single task gets its work done faster, so only throughput increases.

Thus, in many real computer systems, changing either execution time or throughput often affects the other.


Execution Time

• Elapsed Time

• counts everything (disk and memory accesses, waiting for I/O, running other

programs, etc.) from start to finish

• a useful number, but often not good for comparison purposes

elapsed time = CPU time + wait time (I/O, other programs, etc.)

• CPU time

• doesn't count waiting for I/O or time spent running other programs

• can be divided into user CPU time and system CPU time (OS calls)

CPU time = user CPU time + system CPU time

elapsed time = user CPU time + system CPU time + wait time

• Our focus: user CPU time (CPU execution time or, simply, execution time)

• time spent executing the lines of code that are in our program
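The split between elapsed time and CPU time can be observed directly. The following is a minimal Python sketch, assuming the standard-library calls `time.perf_counter()` for wall-clock time and `os.times()` for the process's user and system CPU time; the names `measure` and `workload` are hypothetical, and the workload is invented for illustration.

```python
import os
import time


def measure(fn):
    """Run fn() once and report elapsed, user CPU, and system CPU time in seconds."""
    start_wall = time.perf_counter()
    start_cpu = os.times()
    fn()
    end_cpu = os.times()
    end_wall = time.perf_counter()

    user = end_cpu.user - start_cpu.user        # time spent in our own code
    system = end_cpu.system - start_cpu.system  # time spent in OS calls on our behalf
    return {
        "elapsed": end_wall - start_wall,       # everything, including waiting
        "user": user,
        "system": system,
        "cpu": user + system,
    }


def workload():
    # Hypothetical workload: some computation followed by a short wait.
    sum(i * i for i in range(200_000))
    time.sleep(0.05)


result = measure(workload)
print(result)
```

Because the workload sleeps, the elapsed time is noticeably larger than the CPU time; the difference corresponds to the wait time in the equations above. Exact figures vary from run to run and between systems.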


Execution Time

• For some program running on machine X:

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

• X is n times faster than Y means:

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = n$$

• Equivalently, execution time on Y is n times longer than it is on X:

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$


Execution Time

Relative Performance

If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?

We know that A is n times faster than B if

$$\frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$

Thus the performance ratio is 15 / 10 = 1.5, and A is therefore 1.5 times faster than B.
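The same comparison as a small Python sketch; the function name `times_faster` is hypothetical, and the inputs are the 10-second and 15-second execution times from the example.

```python
def times_faster(time_x, time_y):
    """Return n such that X is n times faster than Y for the same program."""
    return time_y / time_x


print(times_faster(10, 15))  # A takes 10 s, B takes 15 s -> 1.5
```

Note that the slower machine's time goes in the numerator, because performance is the reciprocal of execution time.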


Clock Cycles

• Instead of reporting execution time in seconds, we often use cycles. In modern

computers hardware events progress cycle by cycle: in other words, each event,

e.g., multiplication, addition, etc., is a sequence of cycles

• Clock ticks indicate start and end of cycles:

• cycle time = time between ticks = seconds per cycle

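• clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec, 1 MHz = $10^6$ cycles/sec)

• clock period: the length of each clock cycle

• Execution time in seconds and in cycles are related by

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

• Example: a 200 MHz clock has a cycle time of

$$\frac{1}{200 \times 10^{6}} \times 10^{9} = 5 \text{ nanoseconds}$$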
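Since cycle time and clock rate are reciprocals, the 200 MHz example can be checked in a couple of lines. This is a minimal Python sketch; the function name `cycle_time_ns` is hypothetical.

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time in nanoseconds for a clock rate given in hertz."""
    return 1e9 / clock_rate_hz


print(cycle_time_ns(200e6))  # 200 MHz clock
```

The result is the 5 ns cycle time from the example; doubling the clock rate would halve it.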


Measuring Performance

Time is the measure of computer performance: the computer that

performs the same amount of work in the least time is the fastest.

Program execution time is measured in seconds per program.

The time can be defined in different ways, depending on what we count.

Wall clock time, response time, or elapsed time: these terms mean the total time to complete a task, including:

- disk accesses,

- memory accesses,

- input/output (I/O) activities,

- operating system overhead—everything.


Measuring Performance

When a computer is shared and the processor works on several programs at once, the system may try to optimize throughput rather than attempt to minimize the elapsed time for one program.

CPU execution time or simply CPU time is the actual time the CPU

spends computing for a specific task and does not include time spent

waiting for I/O or running other programs.

user CPU time is the CPU time spent in a program itself.

system CPU time is the CPU time spent in the operating system

performing tasks on behalf of the program.


Measuring Performance

Differentiating between system and user CPU time is difficult to do accurately, because it is often hard to assign responsibility for operating system activities to one user program rather than another, and because of the functionality differences among operating systems.

For consistency, we maintain a distinction between performance based

on elapsed time and that based on CPU execution time.

We will use the term system performance to refer to elapsed time on

an unloaded system and CPU performance to refer to user CPU time.


Understanding Program Performance

Different applications are sensitive to different aspects of the

performance of a computer system.

Many applications, especially those running on servers, depend as much on I/O performance as on the processor, and I/O performance in turn relies on both hardware and software.

For such applications, total elapsed time measured by a wall clock is the measurement of interest.

In some application environments, the user may care about throughput,

response time, or a complex combination.


Measuring Performance

To improve the performance of a program, one must have a clear definition of what performance metric matters.

Almost all computers are constructed using a clock that determines when

events take place in the hardware.

These discrete time intervals are called clock cycles.

clock cycle is the time for one clock period, usually of the processor

clock, which runs at a constant rate.

clock period: The length of each clock cycle.

clock rate: The number of clock cycles per second, i.e. the inverse of the clock period.


Performance Equation I

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

equivalently

$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

• So, to improve performance one can either:
  • reduce the number of cycles for a program, or
  • reduce the clock cycle time, or, equivalently,
  • increase the clock rate


CPU Performance and Its Factors

Alternatively, because clock rate and clock cycle time are inverses,

$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$


How many cycles are required for a program?

• Could assume that # of cycles = # of instructions

[Figure: timeline with the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions each occupying one clock cycle]

This assumption is incorrect, because different instructions take different amounts of time (cycles).

Why…?


How many cycles are required for a program?

• Multiplication takes more time than addition

• Floating point operations take longer than integer ones

• Accessing memory takes more time than accessing registers

• Important point: changing the cycle time often changes the

number of cycles required for various instructions because it

means changing the hardware design. More later…


Example

• Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock.

• We are trying to help a computer designer build a new machine B, that

will run this program in 6 seconds. The designer can use new (or

perhaps more expensive) technology to substantially increase the clock

rate, but has informed us that this increase will affect the rest of the CPU

design, causing machine B to require 1.2 times as many clock cycles as

machine A for the same program.

• What clock rate should we tell the designer to target?
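One way to work this example is to apply CPU execution time = clock cycles / clock rate twice: first to find how many cycles the program takes on A, then to solve for B's clock rate. The Python sketch below follows that reasoning; the function name `required_clock_rate` and its parameter names are hypothetical.

```python
def required_clock_rate(time_a_s, rate_a_hz, cycle_factor, target_time_b_s):
    """Clock rate B needs in order to finish in target_time_b_s seconds."""
    cycles_a = time_a_s * rate_a_hz        # cycles = execution time x clock rate
    cycles_b = cycle_factor * cycles_a     # B needs 1.2 times as many cycles
    return cycles_b / target_time_b_s      # clock rate = cycles / execution time


rate_b = required_clock_rate(10, 2e9, 1.2, 6)
print(rate_b / 1e9, "GHz")
```

Under these figures the program takes 20 billion cycles on A and 24 billion on B, so B needs a clock rate of about 4 GHz, twice that of A, to finish in 6 seconds.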


Terminology

• A given program will require:

some number of instructions (machine instructions)

some number of cycles

some number of seconds

• We have a vocabulary that relates these quantities:

• cycle time (seconds per cycle)

• clock rate (cycles per second)

• (average) CPI (cycles per instruction)

• a floating point intensive application might have a higher average CPI

• MIPS (millions of instructions per second)

• this would be higher for a program using simple instructions
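These quantities are tied together by simple arithmetic, sketched below in Python. The function names and the sample program (10 million instructions, average CPI of 2.0, 500 MHz clock) are hypothetical values chosen only to illustrate the relationships.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """Seconds = instructions x cycles per instruction / cycles per second."""
    cycles = instruction_count * cpi
    return cycles / clock_rate_hz


def mips(instruction_count, execution_time_s):
    """Millions of instructions executed per second."""
    return instruction_count / (execution_time_s * 1e6)


instructions = 10_000_000
seconds = execution_time(instructions, cpi=2.0, clock_rate_hz=500e6)
print(seconds, mips(instructions, seconds))
```

With these assumed values the program runs for about 0.04 seconds at about 250 MIPS. Lowering the average CPI raises the MIPS figure even though the clock rate is unchanged, which is one reason MIPS alone can mislead.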


Performance Measure

• Performance is determined by execution time

• Do any of these other variables equal performance?

• # of cycles to execute program?

• # of instructions in program?

• # of cycles per second?

• average # of cycles per instruction?

• average # of instructions per second?

• Common pitfall : thinking one of the variables is indicative of

performance when it really isn’t


Instruction Performance

Therefore, the number of clock cycles required for a program can be

written as

$$\text{CPU clock cycles} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}$$

The term clock cycles per instruction, which is the average number of

clock cycles each instruction takes to execute, is often abbreviated as CPI.

clock cycles per instruction (CPI)

Average number of clock cycles per instruction for a program

or program fragment.


Instruction Performance

Suppose we have two implementations of the same instruction set

architecture.

Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some

program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2

for the same program.

Which computer is faster for this program and by how much?


Instruction Performance

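We know that each computer executes the same number of instructions for the program; let’s call this number I. First, find the number of processor clock cycles for each computer:

CPU clock cycles_A = I × 2.0

CPU clock cycles_B = I × 1.2

Now compute the CPU time for each computer:

CPU time_A = CPU clock cycles_A × Clock cycle time_A = I × 2.0 × 250 ps = 500 × I ps

CPU time_B = CPU clock cycles_B × Clock cycle time_B = I × 1.2 × 500 ps = 600 × I ps

Computer A is faster. The ratio of the execution times gives the amount:

CPU time_B / CPU time_A = (600 × I ps) / (500 × I ps) = 1.2

We can conclude that computer A is 1.2 times as fast as computer B for this program.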
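The comparison can be checked with a short sketch. Because both computers execute the same number of instructions I, it is enough to compare the time per instruction (CPI × clock cycle time). The function name is an illustrative choice; the numbers are those given in the example.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time one instruction takes, in picoseconds."""
    return cpi * cycle_time_ps


# Values from the example: A has 250 ps cycles and CPI 2.0, B has 500 ps cycles and CPI 1.2.
time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)

# The instruction count I cancels in the ratio, so it never has to be known.
speedup_a_over_b = time_b / time_a

print(f"A: {time_a:.0f} ps per instruction, B: {time_b:.0f} ps per instruction")
print(f"A is {speedup_a_over_b:.2f} times as fast as B")
```

Note that B has the lower CPI and A the shorter cycle time; neither number alone decides the outcome.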


Performance Equation II

We can now write performance equation II in terms of instruction count, CPI, and clock cycle time:

CPU execution time for a program = Instruction count for a program × Average CPI × Clock cycle time

or, since the clock rate is the inverse of clock cycle time:

CPU execution time for a program = (Instruction count for a program × CPI) / Clock rate
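Both forms of the equation can be written as small functions. This is a minimal sketch; the function names and the sample workload (10 million instructions, CPI 2.0, a 4 GHz clock, i.e. a 250 ps cycle) are assumptions made for illustration.

```python
import math


def cpu_time_from_cycle_time(instruction_count: float, cpi: float,
                             cycle_time_s: float) -> float:
    """CPU execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


def cpu_time_from_clock_rate(instruction_count: float, cpi: float,
                             clock_rate_hz: float) -> float:
    """CPU execution time = (instruction count x CPI) / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical workload (assumed values, not taken from the slides).
INSTRUCTIONS = 10_000_000
CPI = 2.0
CLOCK_RATE_HZ = 4e9               # 4 GHz
CYCLE_TIME_S = 1 / CLOCK_RATE_HZ  # 250 ps

t1 = cpu_time_from_cycle_time(INSTRUCTIONS, CPI, CYCLE_TIME_S)
t2 = cpu_time_from_clock_rate(INSTRUCTIONS, CPI, CLOCK_RATE_HZ)

# The two forms agree up to floating-point rounding.
print(f"{t1 * 1e3:.3f} ms, {t2 * 1e3:.3f} ms, equal: {math.isclose(t1, t2)}")
```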


The Classic CPU Performance Equation

A compiler designer is trying to decide between two code sequences for a particular computer. The hardware designers have supplied the following facts:

Instruction class    A    B    C
CPI for the class    1    2    3

For a particular high-level language statement, the compiler writer is considering two code sequences that require the following instruction counts:

Code sequence    Class A    Class B    Class C
1                2          1          2
2                4          1          1


The Classic CPU Performance Equation

Which code sequence executes the most instructions?

Which will be faster?

What is the CPI for each sequence?


The Classic CPU Performance Equation

ANSWER

Sequence 1 executes 2 + 1 + 2 = 5 instructions.

Sequence 2 executes 4 + 1 + 1 = 6 instructions.

Therefore, sequence 1 executes fewer instructions.

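We can use the equation for CPU clock cycles based on instruction count and CPI to find the total number of clock cycles for each sequence:

CPU clock cycles = Σ (CPI_i × C_i), where C_i is the number of instructions of class i and CPI_i is the CPI for that class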


The Classic CPU Performance Equation

This yields

CPU clock cycles_1 = (2 × 1) + (1 × 2) + (2 × 3) = 2 + 2 + 6 = 10 cycles

CPU clock cycles_2 = (4 × 1) + (1 × 2) + (1 × 3) = 4 + 2 + 3 = 9 cycles

So code sequence 2 is faster, even though it executes one extra instruction.

Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI.
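The same arithmetic can be sketched in code. The per-class CPIs (1, 2, 3) and the instruction counts are read off the sums above. The class labels "A", "B", "C" and the function names are assumptions used only to key the dictionaries.

```python
# CPI of each instruction class, taken from the factors in the sums above.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two candidate code sequences.
SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}


def total_cycles(counts: dict) -> int:
    """CPU clock cycles = sum over classes of (CPI of class x count of class)."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Average CPI = CPU clock cycles / instruction count."""
    return total_cycles(counts) / sum(counts.values())


for name, seq in (("sequence 1", SEQUENCE_1), ("sequence 2", SEQUENCE_2)):
    print(f"{name}: {sum(seq.values())} instructions, "
          f"{total_cycles(seq)} cycles, CPI = {average_cpi(seq):.1f}")
```

Sequence 2 executes more instructions but needs fewer cycles, so its average CPI is lower.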


The Classic CPU Performance Equation

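The CPI values can be computed by

CPI = CPU clock cycles / Instruction count

CPI_1 = CPU clock cycles_1 / Instruction count_1 = 10 / 5 = 2.0

CPI_2 = CPU clock cycles_2 / Instruction count_2 = 9 / 6 = 1.5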


The Classic CPU Performance Equation

The following figure shows the basic measurements at different levels in the computer and what is being measured in each case.

We can see how these factors are combined to yield execution time measured in seconds per program:

Time = Seconds / Program = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)


The Classic CPU Performance Equation

The following table summarizes how the algorithm, the language, the compiler, the architecture, and the actual hardware affect the factors in the CPU performance equation.


Finally

I wish you good luck