Dr.Khaled Kh. Sharaf
Faculty Of Computers
And Information
Technology
Second Term
2019- 2020
Computer Architecture
Chapter 2:
Computer Evolution and Performance
LEARNING OBJECTIVES
1. A Brief History of Computers.
2. The Evolution of the Intel x86 Architecture
3. Embedded Systems and the ARM
4. Performance Assessment
1. A BRIEF HISTORY OF COMPUTERS
The First Generation: Vacuum Tubes
Electronic Numerical Integrator And Computer (ENIAC)
- Designed and constructed at the University of Pennsylvania; the world's first general-purpose electronic digital computer
- Started 1943 and finished 1946
- Decimal (not binary); 20 accumulators of 10 digits
- Programmed manually by switches; 18,000 vacuum tubes
- 30 tons; 15,000 square feet
- 140 kW power consumption; 5,000 additions per second
The First Generation: Vacuum Tubes
VON NEUMANN MACHINE
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing
• Input and output equipment operated by control unit
• Built at the Princeton Institute for Advanced Study (IAS)
• Completed 1952
Structure of von Neumann machine
Structure of the IAS Computer
IAS - details
• 1000 x 40 bit words
• Binary number
• 2 x 20 bit instructions
Set of registers (storage in CPU) 1
• Memory buffer register (MBR): Contains a word to be stored in
memory or sent to the I/O unit, or is used to receive a word from
memory or from the I/O unit.
• Memory address register (MAR): Specifies the address in
memory of the word to be written from or read into the MBR.
• Instruction register (IR): Contains the 8-bit opcode of the instruction
being executed.
IAS - details
Set of registers (storage in CPU) 2
• Instruction buffer register (IBR): Employed to hold temporarily the
right hand instruction from a word in memory.
• Program counter (PC): Contains the address of the next instruction
pair to be fetched from memory.
• Accumulator (AC) and multiplier quotient (MQ): Employed to hold
temporarily operands and results of ALU operations.
Commercial Computers
The 1950s saw the birth of the computer industry with two companies,
Sperry and IBM, dominating the marketplace
The UNIVAC I (Universal Automatic Computer)
was the first successful commercial computer. It was intended for both
scientific and commercial applications
• Late 1950s - UNIVAC II
• Faster
• More memory
IBM
• Punched-card processing equipment
• 1953 - the 701
• IBM’s first stored program computer
• Scientific calculations
• 1955 - the 702
• Business applications
• Led to 700/7000 series
The Second Generation: Transistors
• The second generation saw the introduction of more complex arithmetic
and logic units and control units
• Use of high-level programming languages
• Provision of system software with the computer
• System software provided the ability to:
• load programs,
• move data to peripherals, and
• use libraries to perform common computations, similar to what modern
OSes like Windows and Linux do.
• Literally - “small electronics”
• A computer is made up of gates, memory cells and interconnections
• These can be manufactured on a semiconductor
• e.g. silicon wafer
Transistors
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell Labs
• William Shockley et al.
Transistor Based Computers
• Second generation machines
• NCR & RCA produced small transistor machines
• IBM 7000
• DEC – 1957 “Digital Equipment Corporation”
• Produced PDP-1
Third Generation: Integrated Circuits
Microelectronics A single, self-contained transistor is called a discrete component.
Throughout the 1950s and early 1960s, electronic equipment was
composed largely of discrete components— transistors, resistors,
capacitors, and so on.
Microelectronics means, literally, “small electronics.” Since the
beginnings of digital electronics and the computer industry, there has been
a persistent and consistent trend toward the reduction in size of digital
electronic circuits.
Relationship among
Wafer, Chip, and Gate
Moore’s Law
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Number of transistors on a chip will double every year
• Since the 1970s development has slowed a little
• Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths, giving higher
performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increases reliability
Consequences of Moore's Law
1. The cost of a chip has remained virtually unchanged during this
period of rapid growth in density. This means that the cost of
computer logic and memory circuitry has fallen at a dramatic rate.
2. Because logic and memory elements are placed closer together on
more densely packed chips, the electrical path length is shortened,
increasing operating speed.
3. The computer becomes smaller, making it more convenient to
place in a variety of environments.
4. There is a reduction in power and cooling requirements.
5. The interconnections on the integrated circuit are much more
reliable than solder connections. With more circuitry on each chip,
there are fewer interchip connections.
Later Generations
• Beyond the third generation there is less general agreement on
defining generations of computers. Table 2.2 suggests that there have
been a number of later generations, based on advances in integrated
circuit technology.
• With the introduction of large-scale integration (LSI), more than 1000
components can be placed on a single integrated circuit chip.
• Very-large-scale integration (VLSI) achieved more than 10,000
components per chip, while current ultra-large-scale integration (ULSI)
chips can contain more than one billion components.
Later Generations
• SEMICONDUCTOR MEMORY The first application of integrated circuit
technology to computers was construction of the processor (the control
unit and the arithmetic and logic unit) out of integrated circuit chips. But
it was also found that this same technology could be used to construct
memories.
• Since 1970, semiconductor memory has been through 13 generations:
1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G, 4G, and, as of
this writing, 16 Gbits on a single chip (1K = 2^10, 1M = 2^20, 1G = 2^30).
Each generation has provided four times the storage density of the
previous generation, accompanied by declining cost per bit and
declining access time.
Later Generations
MICROPROCESSORS Just as the density of elements on memory chips
has continued to rise, so has the density of elements on processor chips.
As time went on, more and more elements were placed on each chip, so
that fewer and fewer chips were needed to construct a single computer
processor.
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965
• Up to 100 devices on a chip
• Medium scale integration - to 1971
• 100-3,000 devices on a chip
• Large scale integration - 1971-1977
• 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
• 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 -
• Over 100,000,000 devices on a chip
x86 Evolution (1)
• 8080
• first general purpose microprocessor
• 8 bit data path
• Used in first personal computer – Altair
• 8086 – 5MHz – 29,000 transistors
• much more powerful
• 16 bit
• instruction cache, prefetch few instructions
• 8088 (8 bit external bus) used in first IBM PC
• 80286
• 16 Mbyte memory addressable
• up from 1Mb
• 80386
• 32 bit
• Support for multitasking
• 80486
• sophisticated powerful cache and instruction pipelining
• built-in math co-processor
x86 Evolution (2)
• Pentium
• Superscalar
• Multiple instructions executed in parallel
• Pentium Pro
• Increased superscalar organization
• Aggressive register renaming
• branch prediction
• data flow analysis
• speculative execution
• Pentium II
• MMX technology
• graphics, video & audio processing
• Pentium III
• Additional floating point instructions for 3D graphics
x86 Evolution (3)
• Pentium 4
• Note Arabic rather than Roman numerals
• Further floating point and multimedia enhancements
• Core
• First x86 with dual core
• Core 2
• 64 bit architecture
• Core 2 Quad – 3GHz – 820 million transistors
• Four processors on chip
• x86 architecture dominant outside embedded systems
• Organization and technology changed dramatically
• Instruction set architecture evolved with backwards compatibility
• ~1 instruction per month added
• 500 instructions available
• See Intel web pages for detailed information on processors
Embedded Systems ARM
• Embedded system: a combination of computer hardware and software,
and perhaps additional mechanical or other parts, designed to perform a
dedicated function. In many cases, embedded systems are part of a
larger system or product, as in the case of an antilock braking system
in a car.
• ARM evolved from RISC design
• Used mainly in embedded systems
• Used within product
• Not general purpose computer
• Dedicated function
Embedded Systems Requirements
• Different sizes
• Different constraints, optimization, reuse
• Different requirements
• Safety, reliability, real-time, flexibility, legislation
• Lifespan
• Environmental conditions
• Static v dynamic loads
• Slow to fast speeds
• Computation v I/O intensive
• Discrete event v continuous dynamics
Possible Organization of an Embedded System
ARM Evolution
• Designed by ARM Inc., Cambridge, England
• Licensed to manufacturers
• High speed, small die, low power consumption
• PDAs, hand held games, phones
• E.g. iPod, iPhone
• Acorn produced ARM1 & ARM2 in 1985 and ARM3 in 1989
• Acorn, VLSI and Apple Computer founded ARM Ltd.
ARM Systems Categories
• Embedded real time
• Application platform
• Linux, Palm OS, Symbian OS, Windows mobile
• Secure applications
What do we measure?
Define performance….
Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100         101           630          598
Boeing 747             470          4150          610
BAC/Sud Concorde       132          4000         1350
Douglas DC-8-50        146          8720          544
Define performance….
• How much faster is the Concorde compared to the 747?
• How much bigger is the Boeing 747 than the Douglas DC-8?
• So which of these airplanes has the best performance?!
When trying to choose among different computers, performance is an
important attribute. Accurately measuring and comparing different
computers is critical to purchasers and therefore to designers.
Defining Performance
We can define computer performance in several different ways.
• If you were running a program on two different desktop computers,
you’d say that the faster one is the desktop computer that gets the
job done first.
• If you were running a datacenter that had several servers running
jobs submitted by many users, you’d say that the faster computer
was the one that completed the most jobs during a day.
Defining Performance: TIME, TIME, TIME!!!
• Response time (elapsed time, latency): the individual user's concern
• how long does it take for my job to run?
• how long does it take to execute (start to finish) my job?
• how long must I wait for the database query?
• Throughput: the systems manager's concern
• how many jobs can the machine run at once?
• what is the average execution rate?
• how much work is getting done?
• If we upgrade a machine with a new processor, what do we increase?
• If we add a new machine to the lab, what do we increase?
Defining Performance
If we upgrade a machine with a new processor, what do we increase?
- Both response time and throughput are improved.
If we add a new machine to the lab, what do we increase?
- No single task gets its work done faster, so only throughput
increases.
Thus, in many real computer systems, changing either execution time or
throughput often affects the other.
Execution Time
• Elapsed Time
• counts everything (disk and memory accesses, waiting for I/O, running other
programs, etc.) from start to finish
• a useful number, but often not good for comparison purposes
elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
• doesn't count waiting for I/O or time spent running other programs
• can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
elapsed time = user CPU time + system CPU time + wait time
• Our focus: user CPU time (CPU execution time or, simply, execution time)
• time spent executing the lines of code that are in our program
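The difference between elapsed time and CPU time can be observed directly. A minimal Python sketch (the workload and the half-second sleep are illustrative choices of ours, not from the slides):

```python
import time

def busy_work(n=2_000_000):
    # Pure computation: consumes CPU time as well as elapsed time.
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()   # wall clock (elapsed) time
cpu_start = time.process_time()    # user + system CPU time of this process

busy_work()
time.sleep(0.5)                    # waiting: adds elapsed time but almost no CPU time

elapsed = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

print(f"elapsed time: {elapsed:.2f} s")
print(f"CPU time:     {cpu:.2f} s")
```

Because the sleep is pure wait time, elapsed time exceeds CPU time by roughly half a second, mirroring the decomposition elapsed time = CPU time + wait time.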
Execution Time
• For some program running on machine X:
Performance_X = 1 / Execution time_X
• X is n times faster than Y means:
Performance_X / Performance_Y = n
• execution time on Y is n times longer than it is on X:
Execution time_Y / Execution time_X = n
Execution Time
Relative Performance
If computer A runs a program in 10 seconds and computer B runs the
same program in 15 seconds, how much faster is A than B?
We know that A is n times faster than B if
Execution time_B / Execution time_A = n
Thus the performance ratio is 15 / 10 = 1.5
A is therefore 1.5 times faster than B.
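The relative-performance calculation can be written as a few lines of Python (variable names are ours):

```python
def performance(execution_time):
    # Performance is the reciprocal of execution time.
    return 1.0 / execution_time

time_a = 10.0  # seconds, computer A
time_b = 15.0  # seconds, computer B

# Performance ratio n = Performance_A / Performance_B = time_b / time_a
n = performance(time_a) / performance(time_b)
print(f"A is {n:.1f} times faster than B")  # prints "A is 1.5 times faster than B"
```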
Clock Cycles
• Instead of reporting execution time in seconds, we often use cycles. In modern
computers hardware events progress cycle by cycle: in other words, each event,
e.g., multiplication, addition, etc., is a sequence of cycles
• Clock ticks indicate start and end of cycles:
• cycle time = time between ticks = seconds per cycle
• clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec, 1 MHz = 10^6 cycles/sec)
• Example: a 200 MHz clock has a cycle time of
1 / (200 × 10^6) seconds = 5 nanoseconds
• clock period: the length of each clock cycle
Measuring Performance
Time is the measure of computer performance: the computer that
performs the same amount of work in the least time is the fastest.
Program execution time is measured in seconds per program.
The time can be defined in different ways, depending on what we count.
Wall clock time, response time, or elapsed time: these terms mean
the total time to complete a task, including:
- disk accesses,
- memory accesses,
- input/output (I/O) activities,
- operating system overhead—everything.
Measuring Performance
When many programs run concurrently, the system may try to optimize
throughput rather than attempt to minimize the elapsed time for one program.
CPU execution time or simply CPU time is the actual time the CPU
spends computing for a specific task and does not include time spent
waiting for I/O or running other programs.
user CPU time is the CPU time spent in a program itself.
system CPU time is the CPU time spent in the operating system
performing tasks on behalf of the program.
Measuring Performance
Because it is often hard to assign responsibility for operating system
activities to one user program rather than another, and because of the
functionality differences among operating systems, distinguishing
between system and user CPU time is difficult to do accurately.
For consistency, we maintain a distinction between performance based
on elapsed time and that based on CPU execution time.
We will use the term system performance to refer to elapsed time on
an unloaded system and CPU performance to refer to user CPU time.
Understanding Program Performance
Different applications are sensitive to different aspects of the
performance of a computer system.
Many applications, especially those running on servers, depend as much
on I/O performance as on CPU performance; I/O performance, in turn,
relies on both hardware and software.
Total elapsed time measured by a wall clock is the measurement of
interest.
In some application environments, the user may care about throughput,
response time, or a complex combination.
Measuring Performance
To improve the performance of a program, one must have a clear
definition of which performance metric to use.
Almost all computers are constructed using a clock that determines when
events take place in the hardware.
These discrete time intervals are called clock cycles.
Clock cycle: the time for one clock period, usually of the processor
clock, which runs at a constant rate.
Clock period: the length of each clock cycle.
Clock rate: the number of clock cycles per second; the inverse of the clock period.
Performance Equation I
• So, to improve performance one can either:
• reduce the number of cycles for a program, or
• reduce the clock cycle time, or, equivalently,
• increase the clock rate
CPU execution time for a program = CPU clock cycles for a program × Clock cycle time

or, equivalently:

seconds / program = (cycles / program) × (seconds / cycle)
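A quick numeric sketch of this equation in Python, reusing the 200 MHz clock from the earlier example (the cycle count for the program is a made-up illustration):

```python
# seconds/program = cycles/program x seconds/cycle
clock_rate = 200e6                 # 200 MHz, as in the earlier cycle-time example
cycle_time = 1.0 / clock_rate      # 5 nanoseconds per cycle
cpu_clock_cycles = 1_000_000       # hypothetical cycle count for some program

cpu_time = cpu_clock_cycles * cycle_time
print(f"cycle time = {cycle_time * 1e9:.0f} ns")  # 5 ns
print(f"CPU time   = {cpu_time * 1e3:.0f} ms")    # 5 ms
```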
CPU Performance and Its Factors
Alternatively, because clock rate and clock cycle time are inverses,
CPU execution time for a program = CPU clock cycles for a program / Clock rate
How many cycles are required for a program?
• Could assume that # of cycles = # of instructions
[Timeline figure: instructions 1 through 6 executing one after another, one cycle each, along a time axis.]
This assumption is incorrect! Because:
Different instructions take different amounts of time (cycles)
Why…?
How many cycles are required for a program?
• Multiplication takes more time than addition
• Floating point operations take longer than integer ones
• Accessing memory takes more time than accessing registers
• Important point: changing the cycle time often changes the
number of cycles required for various instructions because it
means changing the hardware design. More later…
Example
• Our favorite program runs in 10 seconds on computer A, which has a
2 GHz clock.
• We are trying to help a computer designer build a new machine B, that
will run this program in 6 seconds. The designer can use new (or
perhaps more expensive) technology to substantially increase the clock
rate, but has informed us that this increase will affect the rest of the CPU
design, causing machine B to require 1.2 times as many clock cycles as
machine A for the same program.
• What clock rate should we tell the designer to target?
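The answer can be worked out with the performance equation: cycles on A = time × clock rate, B needs 1.2 times as many cycles, and B's rate must cover those cycles in 6 seconds. A Python sketch (variable names are ours):

```python
# Computer A: 10 s at 2 GHz
time_a = 10.0
rate_a = 2e9
cycles_a = time_a * rate_a     # 20 x 10^9 clock cycles

# Machine B needs 1.2x as many cycles but must finish in 6 s
cycles_b = 1.2 * cycles_a      # 24 x 10^9 clock cycles
time_b = 6.0
rate_b = cycles_b / time_b     # required clock rate for B

print(f"target clock rate for B: {rate_b / 1e9:.0f} GHz")  # 4 GHz
```

So the designer should target a 4 GHz clock: double A's rate, even though B only needs to be 10/6 times faster, because of the extra cycles.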
CPU Performance and Its Factors
Terminology
• A given program will require:
some number of instructions (machine instructions)
some number of cycles
some number of seconds
• We have a vocabulary that relates these quantities:
• cycle time (seconds per cycle)
• clock rate (cycles per second)
• (average) CPI (cycles per instruction)
• a floating point intensive application might have a higher average CPI
• MIPS (millions of instructions per second)
• this would be higher for a program using simple instructions
Performance Measure
• Performance is determined by execution time
• Do any of these other variables equal performance?
• # of cycles to execute program?
• # of instructions in program?
• # of cycles per second?
• average # of cycles per instruction?
• average # of instructions per second?
• Common pitfall : thinking one of the variables is indicative of
performance when it really isn’t
Instruction Performance
Therefore, the number of clock cycles required for a program can be
written as
CPU clock cycles = Instructions for a program × Average clock cycles
per instruction
The term clock cycles per instruction, which is the average number of
clock cycles each instruction takes to execute, is often abbreviated as CPI.
clock cycles per instruction (CPI)
Average number of clock cycles per instruction for a program
or program fragment.
Instruction Performance
Suppose we have two implementations of the same instruction set
architecture.
Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some
program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2
for the same program.
Which computer is faster for this program and by how much?
Instruction Performance
We know that each computer executes the same number of instructions for
the program; let’s call this number I. First, find the number of processor
clock cycles for each computer:
We can conclude
that computer A is
1.2 times as fast as
computer B for this
program.
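The comparison can be checked with a short Python sketch; the instruction count I cancels out of the ratio, so any value works:

```python
I = 1_000_000  # same instruction count on both machines (any value works)

# Computer A: 250 ps clock cycle time, CPI 2.0
cycles_a = I * 2.0
time_a = cycles_a * 250e-12    # = I x 500 ps

# Computer B: 500 ps clock cycle time, CPI 1.2
cycles_b = I * 1.2
time_b = cycles_b * 500e-12    # = I x 600 ps

ratio = time_b / time_a        # 600 / 500
print(f"A is {ratio:.1f} times as fast as B")
```

Each instruction effectively costs 500 ps on A and 600 ps on B, giving the 1.2x result stated above.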
Performance Equation II
We can now write Performance Equation II in terms of instruction count,
CPI, and clock cycle time:

CPU execution time for a program = Instruction count for the program × Average CPI × Clock cycle time

or, since the clock rate is the inverse of clock cycle time:

CPU execution time for a program = (Instruction count for the program × Average CPI) / Clock rate
The Classic CPU Performance Equation
A compiler designer is trying to decide between two code sequences for a
particular computer. The hardware designers have supplied the following
facts:

Instruction class    A    B    C
CPI for this class   1    2    3

For a particular high-level language statement, the compiler writer is
considering two code sequences that require the following instruction
counts:

                     Instruction counts per class
Code sequence         A    B    C
1                     2    1    2
2                     4    1    1
The Classic CPU Performance Equation
Which code sequence executes the most instructions?
Which will be faster?
What is the CPI for each sequence?
The Classic CPU Performance Equation
ANSWER
Sequence 1 executes 2 + 1 + 2 = 5 instructions.
Sequence 2 executes 4 + 1 + 1 = 6 instructions.
Therefore, sequence 1 executes fewer instructions.
We can use the equation for CPU clock cycles based on instruction
count and CPI to find the
total number of clock cycles for each sequence:
The Classic CPU Performance Equation
This yields
CPU clock cycles1 = (2 × 1) + (1 × 2) + (2 × 3)
= 2 + 2 + 6 = 10 cycles
CPU clock cycles2 = (4 × 1) + (1 × 2) + (1 × 3)
= 4 + 2 + 3 = 9 cycles
So code sequence 2 is faster, even though it executes one extra
instruction.
Since code sequence 2 takes fewer overall clock cycles but has more
instructions, it must have a lower CPI.
The Classic CPU Performance Equation
The CPI values can be computed by dividing each sequence's total clock
cycles by its instruction count:
CPI_1 = 10 / 5 = 2.0
CPI_2 = 9 / 6 = 1.5
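The whole example can be verified in a few lines of Python (the dictionary layout is our own way of encoding the tables):

```python
# CPI per instruction class, from the example: A=1, B=2, C=3
cpi = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each candidate code sequence
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

def clock_cycles(seq):
    # Total cycles = sum over classes of (instruction count x class CPI)
    return sum(count * cpi[cls] for cls, count in seq.items())

def average_cpi(seq):
    # Average CPI = total cycles / total instruction count
    return clock_cycles(seq) / sum(seq.values())

print(clock_cycles(seq1), average_cpi(seq1))  # 10 cycles, CPI 2.0
print(clock_cycles(seq2), average_cpi(seq2))  # 9 cycles, CPI 1.5
```

Sequence 2 wins on total cycles despite executing one more instruction, which is exactly why instruction count alone is a poor performance metric.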
The Classic CPU Performance Equation
The following figure shows the basic measurements at different levels in the
computer and what is being measured in each case.
We can see how these factors are combined to yield execution time
measured in seconds per program:
The Classic CPU Performance Equation
The following table summarizes how the algorithm, the language, the compiler, the
architecture, and the actual hardware affect the factors in the CPU performance
equation.
Finally
I wish you good luck