Comp Arch Ch1 Ch2 Ch3 Ch4



    William Stallings

Computer Organization and Architecture, 8th Edition

    CHAPTER 1

    INTRODUCTION


    Architecture and Organization

Architecture is the set of attributes visible to the programmer: instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques. e.g., Is there a multiply instruction?

Organization is how those features are implemented: control signals, interfaces, memory technology. e.g., Is there a hardware multiply unit, or is multiplication done by repeated addition?
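As a concrete illustration of the architecture/organization split, the sketch below (mine, not from the slides) multiplies two unsigned integers by repeated addition, the way a machine whose organization lacks a hardware multiplier might realize the architecture's multiply instruction.

    #include <stdint.h>
    #include <stdio.h>

    /* Multiply by repeated addition: one way a MUL instruction defined by
       the architecture could be implemented when the organization has no
       hardware multiplier. (Illustrative sketch only.) */
    static uint32_t mul_by_addition(uint32_t a, uint32_t b)
    {
        uint32_t product = 0;
        while (b--) {              /* add 'a' to the product 'b' times */
            product += a;
        }
        return product;
    }

    int main(void)
    {
        printf("6 x 7 = %u\n", mul_by_addition(6, 7));   /* prints 42 */
        return 0;
    }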


    Family Concept

All members of the Intel x86 family share the same basic architecture

The IBM System/370 family shares the same basic architecture

This gives code compatibility (at least backwards)

Organization differs between different versions

    Architecture and Organization


A computer is a complex system: how can we design and describe it?

Treat it as a hierarchical system: a set of interrelated subsystems, each subsystem itself hierarchic in structure, until some lowest level of elementary subsystems is reached.

At each level of the system, the designer is concerned with structure and function.

    Structure and Function


    Structure and Function

Structure is the way in which components relate to each other

Function is the operation of individual components as part of the structure


    Function

General computer functions:

Data processing

Data storage

Data movement

Control


    Operations

    Data movement

    Ex., keyboard to

    screen

    Functional View of the Computer


    Operations

    Storage

    Ex., Internet

    download to disk

    Playing an mp3 file

    stored in memory

    to earphones attached

    to the same PC.


    Operations

    Processing from/

    to storage

    Any number-crunching

    application that takes

    data from memory and

    stores the result back in

    memory.

    ex., updating bank

    statement


    Operations

    Processing from

    storage to I/O

    Receiving packets over a

    network interface,

    verifying their CRC,

    then storing them

    in memory.

    ex., printing a bank

    statement


    Structure

    Four main structural components

    CPU

    Main Memory

    I/O Devices

    System Interconnection


    Structure

    Four main structural components

    1. Central Processing Unit (CPU)

Controls the operation of the computer and performs its data processing functions; often simply referred to as the processor.

    2. Main Memory

    Stores data


    Structure

Four main structural components

3. I/O

Moves data between the computer and its external environment.

4. System Interconnection

Some mechanism that provides for communication among CPU, main memory, and I/O. A common example of system interconnection is a system bus, consisting of a number of wires to which all the other components attach.


Structure: Top Level

[Figure: top-level structure of the computer - central processing unit, main memory, input/output, and the system interconnection, with peripherals and communication lines attached externally.]


Structure: The CPU

[Figure: major components of the CPU - registers, arithmetic and logic unit, control unit, and internal CPU interconnection; the CPU connects to memory and I/O over the system bus.]


Structure: The Control Unit

[Figure: internal structure of the control unit - sequencing logic, control unit registers and decoders, and control memory, linked to the ALU and registers by the internal bus.]


Computer Evolution and Performance

    CHAPTER 2


    Brief History of Computers

    The First Generation: Vacuum Tubes

    ENIAC

o Electronic Numerical Integrator And Computer

o World's first general-purpose electronic digital computer

o Designed and built by John Mauchly and John Eckert

o It weighed 30 tons, occupied 1,500 square feet of floor space, and contained more than 18,000 vacuum tubes.


    Brief History of Computers

    The First Generation: Vacuum Tubes

    Von Neumann/Turing

o Stored program concept: main memory stores both programs and data

o Attributed to John von Neumann, one of the ENIAC designers; Alan Turing developed the idea at around the same time

o Input and output equipment operated by the control unit


o In 1946, von Neumann and his colleagues began the design of a new stored-program computer, referred to as the IAS computer.

o The IAS computer, although not completed until 1952, is the prototype of all subsequent general-purpose computers.


    Brief History of Computers

The IAS computer consists of:

o A main memory, which stores both data and instructions

o An arithmetic and logic unit (ALU) capable of operating on binary data

o A control unit, which interprets the instructions in memory and causes them to be executed

o Input and output (I/O) equipment operated by the control unit


    Structure of the IAS computer


    John von Neumann and the IAS machine, 1952


    UNIVAC

o UNIVAC I (Universal Automatic Computer)

o 1947 - Eckert-Mauchly Computer Corporation

o First successful commercial computer; it was intended for both scientific and commercial applications

o Used by the US Bureau of the Census for its 1950 calculations

o Became part of Sperry-Rand Corporation

o Late 1950s - UNIVAC II

- Faster

- More memory


    IBM

o Punched-card processing equipment

o 1953 - the 701

o IBM's first stored-program computer

o Scientific calculations

o 1955 - the 702

o Business applications

o Led to the 700/7000 series


    Brief History of Computers

    The Second Generation: Transistors

Transistor

o is smaller, cheaper, and dissipates less heat than a vacuum tube, but can be used in the same way as a vacuum tube to construct computers

o invented at Bell Labs in 1947 by John Bardeen, Walter Brattain, and William Shockley

o IBM 7000

o DEC (Digital Equipment Corporation) was founded in 1957

o Produced the PDP-1 in the same year


    Brief History of Computers

    The Third Generation: Integrated Circuits

o A computer is made up of gates, memory cells, and interconnections

o A single, self-contained transistor is called a discrete component

o All of these can be manufactured either separately (as discrete components) or on the same piece of semiconductor


    Brief History of Computers

    Generations of Computers

o Vacuum tube - 1946-1957

o Transistor - 1958-1964

o Small scale integration - 1965 on

- Up to 100 devices on a chip

o Medium scale integration - to 1971

- 100-3,000 devices on a chip

o Large scale integration - 1971-1977

- 3,000-100,000 devices on a chip

o Very large scale integration - 1978-1991

- 100,000-100,000,000 devices on a chip

o Ultra large scale integration - 1991-

- Over 100,000,000 devices on a chip


Moore's Law

Increased density of components on chip

Gordon Moore, co-founder of Intel, observed that the number of transistors on a chip doubles every year

Since the 1970s the pace has slowed a little: the number of transistors doubles every 18 months

Cost of a chip has remained almost unchanged

Higher packing density means shorter electrical paths, giving higher performance

Smaller size gives increased flexibility

Reduced power and cooling requirements

Fewer interconnections increase reliability


    Growth in CPU Transistor Count


    IBM 360 Series

    first planned family of computers.

    Similar or identical O/S

    Increasing speed

Increasing number of I/O ports (i.e., more terminals)

    Increased memory size

    Increased cost

    Multiplexed switch structure


    DEC PDP - 8 1964

    First minicomputer (after miniskirt!)

    Did not need air conditioned room

    Small enough to sit on a lab bench

    $16,000

    -$100k+ for IBM 360

    Embedded applications & OEM

    BUS STRUCTURE


    DEC-PDP 8 Bus Structure


    Semiconductor Memory

1970 - Fairchild

Size of a single core

- i.e., 1 bit of magnetic core storage

Holds 256 bits

Non-destructive read

Much faster than core

Capacity approximately doubles each year


Microprocessors - Intel

1971 - 4004

First microprocessor

All CPU components on a single chip

4 bit

Multiplication by repeated addition, no hardware multiplier!

Followed in 1972 by the 8008

8 bit

Both designed for specific applications

1974 - 8080

Intel's first general-purpose microprocessor


    1970s Processors


    1980s Processors


    1990s Processors


    Recent Processors


    Designing for Performance

Year by year, the cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically.

The basic building blocks for today's computer miracles are virtually the same as those of the IAS computer from over 50 years ago, while on the other hand, the techniques for squeezing the last iota of performance out of the materials at hand have become increasingly sophisticated.


    Designing for Performance

Many techniques have been invented to improve performance. Some of the main ones are the following:

Pipelining

On-board cache

On-board L1 and L2 cache

Branch Prediction - The processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next. If the processor guesses right most of the time, it can prefetch the correct instructions and buffer them so that the processor is kept busy.
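The slides describe branch prediction only in prose; the sketch below is a minimal, hypothetical 2-bit saturating-counter predictor (a common textbook scheme, not something specified on the slide) to make the "guesses right most of the time" idea concrete.

    #include <stdio.h>
    #include <stdbool.h>

    /* Minimal 2-bit saturating-counter branch predictor (illustrative).
       States 0..1 predict "not taken", states 2..3 predict "taken". */
    typedef struct { unsigned state; } predictor_t;

    static bool predict(const predictor_t *p) { return p->state >= 2; }

    static void train(predictor_t *p, bool taken)   /* update after the real outcome */
    {
        if (taken  && p->state < 3) p->state++;
        if (!taken && p->state > 0) p->state--;
    }

    int main(void)
    {
        predictor_t p = { 2 };                      /* start weakly "taken" */
        bool history[] = { true, true, true, false, true, true };
        int correct = 0, n = (int)(sizeof history / sizeof history[0]);

        for (int i = 0; i < n; i++) {
            if (predict(&p) == history[i]) correct++;
            train(&p, history[i]);
        }
        printf("predicted %d of %d branches correctly\n", correct, n);
        return 0;
    }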


    Designing for Performance

Data Flow Analysis - The processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.

Speculative Execution - Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations. This enables the processor to keep its execution engines as busy as possible by executing instructions that are likely to be needed.


    Performance Balance

While processor power has raced ahead at breakneck speed, other critical components of the computer have not kept up. The result is a need to look for performance balance: an adjusting of the organization and architecture to compensate for the mismatch among the capabilities of the various components.

Processor speed increased

Memory capacity increased


    Logic and Memory Performance Gap


While processor speed has grown rapidly, the speed with which data can be transferred between main memory and the processor has lagged badly. The interface between processor and main memory is the most crucial pathway in the entire computer, because it is responsible for carrying a constant flow of program instructions and data between memory chips and the processor. If memory or the pathway fails to keep pace with the processor's insistent demands, the processor stalls in a wait state, and valuable processing time is lost.


    Solutions

Increase the number of bits retrieved at one time

- Make DRAM wider rather than deeper

Change the DRAM interface

- Cache

Reduce the frequency of memory access

- More complex cache and cache on chip

Increase interconnection bandwidth

- High speed buses

- Hierarchy of buses


    I/O Devices

As computers become faster and more capable, more sophisticated applications are developed that support the use of peripherals with intensive I/O demands.

Solutions:

Caching

Buffering

Higher-speed interconnection buses

More elaborate bus structures

Multiple processor configurations


    Typical I/O Device Data Rates


    The key is balance among:

Processor components

Main memory

    I/O Devices

    Interconnection structures


    The evolution of the Intel X86 Architecture

8080: The world's first general-purpose microprocessor. This was an 8-bit machine, with an 8-bit data path to memory. The 8080 was used in the first personal computer, the Altair.

8086: A far more powerful, 16-bit machine. In addition to a wider data path and larger registers, the 8086 sported an instruction cache, or queue, that prefetches a few instructions before they are executed. A variant of this processor, the 8088, was used in IBM's first personal computer, securing the success of Intel. The 8086 is the first appearance of the x86 architecture.


    The evolution of the Intel X86 Architecture

80286: This extension of the 8086 enabled addressing a 16-MByte memory instead of just 1 MByte.

80386: Intel's first 32-bit machine, and a major overhaul of the product. With a 32-bit architecture, the 80386 rivaled the complexity and power of minicomputers and mainframes introduced just a few years earlier. This was the first Intel processor to support multitasking, meaning it could run multiple programs at the same time.


    The evolution of the Intel X86 Architecture

80486: The 80486 introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining. The 80486 also offered a built-in math coprocessor, offloading complex math operations from the main CPU.

Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.


    The evolution of the Intel X86 Architecture

Pentium Pro: The Pentium Pro continued the move into superscalar organization begun with the Pentium, with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution.

Pentium II: The Pentium II incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently.

Pentium III: The Pentium III incorporates additional floating-point instructions to support 3D graphics software.


    The evolution of the Intel X86 Architecture

Pentium 4: The Pentium 4 includes additional floating-point and other enhancements for multimedia.

Core: This is the first Intel x86 microprocessor with a dual core, referring to the implementation of two processors on a single chip.

Core 2: The Core 2 extends the architecture to 64 bits. The Core 2 Quad provides four processors on a single chip.

    Embedded Systems and ARM


Embedded Systems and ARM

The ARM architecture refers to a processor architecture that has evolved from RISC design principles and is used in embedded systems.

The term embedded system refers to the use of electronics and software within a product, as opposed to a general-purpose computer, such as a laptop or desktop system.

Embedded system: a combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.

Embedded Systems and ARM


    Embedded Systems and ARM

Embedded Systems Requirements:

Small to large systems, implying very different cost constraints, thus different needs for optimization and reuse

Relaxed to very strict requirements and combinations of different quality requirements, for example, with respect to safety, reliability, real-time, flexibility, and legislation

Short to long life times

Different environmental conditions in terms of, for example, radiation, vibrations, and humidity


    Possible Organization of an Embedded System


    ARM Evolution


ARM processors are designed to meet the needs of three system categories:

Embedded real-time systems: systems for storage, automotive body and power-train, industrial, and networking applications

Application platforms: devices running open operating systems including Linux, Palm OS, Symbian OS, and Windows CE in wireless, consumer entertainment, and digital imaging applications

Secure applications: smart cards, SIM cards, and payment terminals


Performance Assessment

In evaluating processor hardware and setting requirements for new systems, performance is one of the key parameters to consider, along with cost, size, security, reliability, and in some cases power consumption.

System clock speed

Operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on, are governed by a system clock.

The speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or Hertz (Hz).


Performance Assessment

Clock signals are generated by a quartz crystal, which generates a constant signal wave while power is applied. This wave is converted into a digital voltage pulse stream that is provided in a constant flow to the processor circuitry.

The rate of pulses is known as the clock rate, or clock speed. One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick. The time between pulses is the cycle time.


    System Clock


Instruction execution takes place in discrete steps

- Fetch, decode, load and store, arithmetic or logical

Usually requires multiple clock cycles per instruction

Pipelining allows simultaneous execution of instructions

Conclusion: clock speed is not the whole story about performance


Instruction execution rate

Let CPI_i be the number of cycles required for instruction type i, and I_i be the number of executed instructions of type i for a given program. Then we can calculate an overall (average) CPI as follows:

CPI = (sum over i of CPI_i x I_i) / I_c

where I_c is the total number of instructions executed by the program.


    Instruction execution rate

    Millions of instructions per second (MIPS)

Millions of floating-point instructions per second (MFLOPS)

    Heavily dependent on:

    instruction set

    compiler design

    processor implementation

    cache & memory hierarchy

We can express the MIPS rate in terms of the clock rate f and CPI as follows:

MIPS rate = f / (CPI x 10^6)


The average CPI when the program is executed on a uniprocessor with the above trace results is

CPI = (1 x 0.6) + (2 x 0.18) + (4 x 0.12) + (8 x 0.1) = 2.24

With a 400-MHz clock, the corresponding MIPS rate is

(400 x 10^6) / (2.24 x 10^6) = 178

Floating-point performance is expressed as millions of floating-point operations per second (MFLOPS), defined as follows:

MFLOPS rate = (number of executed floating-point operations in a program) / (execution time x 10^6)
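A small sketch (mine, not from the slides) that reproduces the CPI and MIPS arithmetic above; the four instruction types, their cycle counts, the mix fractions, and the 400-MHz clock are taken from the worked example.

    #include <stdio.h>

    /* Reproduce the worked CPI/MIPS example: four instruction types with
       their cycles per instruction (CPI_i) and their fraction of the mix. */
    int main(void)
    {
        const double cpi[]  = { 1.0, 2.0, 4.0, 8.0 };     /* cycles per type */
        const double frac[] = { 0.60, 0.18, 0.12, 0.10 }; /* fraction of mix */
        const double clock_hz = 400e6;                    /* 400-MHz clock   */

        double avg_cpi = 0.0;
        for (int i = 0; i < 4; i++)
            avg_cpi += cpi[i] * frac[i];                  /* weighted average */

        double mips = clock_hz / (avg_cpi * 1e6);         /* MIPS = f/(CPI*10^6) */
        printf("average CPI = %.2f, MIPS rate = %.1f\n", avg_cpi, mips);
        /* prints 2.24 and 178.6 (the slide truncates the latter to 178) */
        return 0;
    }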


    Benchmarks

Programs designed to test performance

A benchmark suite is a collection of programs, defined in a high-level language, that together attempt to provide a representative test of a computer in a particular application or system programming area.

The System Performance Evaluation Corporation (SPEC) has defined and maintains the best-known collection of benchmark suites.

    Averaging Results


    Averaging Results

To obtain a reliable comparison of the performance of various computers, it is preferable to run a number of different benchmark programs on each machine and then average the results. For example, with m different benchmark programs, a simple arithmetic mean can be calculated as follows:

R_A = (1/m) x sum over i of R_i

where R_i is the high-level language instruction execution rate for the ith benchmark program.

Alternative: the harmonic mean,

R_H = m / (sum over i of 1/R_i)
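A brief sketch computing both averages for a set of benchmark rates; the three rates below are made-up illustrative numbers, not measurements from the slides.

    #include <stdio.h>

    /* Arithmetic mean R_A and harmonic mean R_H of benchmark rates R_i. */
    int main(void)
    {
        const double rate[] = { 100.0, 200.0, 400.0 };   /* illustrative rates */
        const int m = (int)(sizeof rate / sizeof rate[0]);

        double sum = 0.0, inv_sum = 0.0;
        for (int i = 0; i < m; i++) {
            sum     += rate[i];
            inv_sum += 1.0 / rate[i];
        }
        printf("arithmetic mean = %.1f\n", sum / m);      /* 233.3 */
        printf("harmonic mean   = %.1f\n", m / inv_sum);  /* 171.4 */
        return 0;
    }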

Amdahl's Law


Amdahl's Law

Gene Amdahl

Potential speed-up of a program using multiple processors

Concluded that:

Code needs to be parallelizable

Speed-up is bounded, giving diminishing returns for more processors

Task dependent

Servers gain by maintaining multiple connections on multiple processors

Databases can be split into parallel tasks


Let T be the total execution time of the program using a single processor, and let f be the fraction of the code that can be parallelized. Then the speedup using a parallel processor with N processors that fully exploits the parallel portion of the program is as follows:

Speedup = T / ((1 - f)T + fT/N) = 1 / ((1 - f) + f/N)

Two important conclusions can be drawn:

1. When f is small, the use of parallel processors has little effect.

2. As N approaches infinity, speedup is bounded by 1/(1 - f), so that there are diminishing returns for using more processors.
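A minimal sketch of the speedup formula above; the parallel fraction f = 0.9 and the processor counts are illustrative values I chose, not figures from the slides.

    #include <stdio.h>

    /* Amdahl's law: speedup with N processors when a fraction f of the
       program is parallelizable. */
    static double amdahl_speedup(double f, int n)
    {
        return 1.0 / ((1.0 - f) + f / n);
    }

    int main(void)
    {
        const double f = 0.9;                      /* assumed parallel fraction */
        const int n_list[] = { 2, 4, 16, 1024 };

        for (int i = 0; i < 4; i++)
            printf("N = %4d -> speedup = %.2f\n",
                   n_list[i], amdahl_speedup(f, n_list[i]));
        printf("upper bound 1/(1-f) = %.2f\n", 1.0 / (1.0 - f));   /* 10.00 */
        return 0;
    }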


Speedup

Suppose that a feature of the system is used during execution a fraction of the time f (before enhancement), and that the speedup of that feature after enhancement is SUf. Then the overall speedup of the system is

Speedup = 1 / ((1 - f) + f/SUf)


For example, suppose that a task makes extensive use of floating-point operations, with 40% of the execution time consumed by floating-point operations. With a new hardware design, the floating-point module is speeded up by a factor of K. Then the overall speedup is

Speedup = 1 / (0.6 + 0.4/K)

Thus, independent of K, the maximum speedup is 1/0.6, approximately 1.67.
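A quick sketch evaluating the floating-point example for a few values of K (the K values are arbitrary illustrations):

    #include <stdio.h>

    /* Overall speedup when 40% of the time is spent in a module that is
       sped up by a factor K: speedup = 1 / (0.6 + 0.4/K). */
    int main(void)
    {
        const double k_list[] = { 2.0, 10.0, 1000.0 };
        for (int i = 0; i < 3; i++) {
            double k = k_list[i];
            printf("K = %6.0f -> overall speedup = %.3f\n",
                   k, 1.0 / (0.6 + 0.4 / k));
        }
        /* As K grows, the speedup approaches 1/0.6 = 1.667. */
        return 0;
    }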


Top-Level View of Computer Function and Interconnection

    CHAPTER 3


    Computer Components

The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit

An instruction interpreter and a module of general-purpose arithmetic and logic functions

Data and instructions must be put into the system

Taken together, these are referred to as I/O components

Memory/Main Memory

A place to store temporarily both instructions and data

    Top-Level View Components


    Top-Level View Components

    Top Level View


Top-Level View

The CPU exchanges data with memory. For this purpose, it typically makes use of two internal (to the CPU) registers: a memory address register (MAR), which specifies the address in memory for the next read or write, and a memory buffer register (MBR), which contains the data to be written into memory or receives the data read from memory. Similarly, an I/O address register (I/OAR) specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the CPU.

An I/O module transfers data from external devices to the CPU and memory, and vice versa. It contains internal buffers for temporarily holding these data until they can be sent on.


    Computer Function

The basic function performed by a computer is execution of a program.

The processor does the actual work by executing instructions specified in the program.

Instruction processing consists of two steps: the processor reads (fetches) instructions from memory one at a time and executes each instruction.

Program execution consists of repeating the process of instruction fetch and instruction execution.


    Instruction Fetch and Execute

Fetch Cycle

Program Counter (PC) holds the address of the next instruction to fetch

Processor fetches the instruction from the memory location pointed to by the PC

Increment PC

- Unless told otherwise

Instruction loaded into the Instruction Register (IR)

Processor interprets the instruction and performs the required actions


    Computer Function

Instruction Cycle

The processing required for a single instruction

The two steps are referred to as the fetch cycle and the execute cycle. Program execution halts only if the machine is turned off, some sort of unrecoverable error occurs, or a program instruction that halts the computer is encountered.


    Instruction Fetch and Execute

Execute Cycle

Processor-memory

- Data transfer between CPU and main memory

Processor-I/O

- Data transfer between CPU and I/O module

Data processing

- Some arithmetic or logical operation on data

Control

- Alteration of the sequence of operations, e.g., jump

Combination of the above


    Example of a Program Execution

    Instruction Fetch and Execute


    Instruction Fetch and Execute

In this example, three instruction cycles, each consisting of a fetch cycle and an execute cycle, are needed to add the contents of location 940 to the contents of location 941.

With a more complex set of instructions, fewer cycles would be needed. Some older processors, for example, included instructions that contain more than one memory address. Thus the execution cycle for a particular instruction on such a processor could involve more than one reference to memory. Also, instead of memory references, an instruction may specify an I/O operation.
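To make the fetch-execute cycle concrete, here is a minimal sketch of a hypothetical accumulator machine in the spirit of the example: a 16-bit word with a 4-bit opcode and 12-bit address, and opcodes 1 = load, 5 = add, 2 = store. These encodings are assumptions for illustration, mirroring the textbook's hypothetical machine, not a real instruction set.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical 16-bit accumulator machine: 4-bit opcode, 12-bit address.
       Assumed opcodes: 1 = load AC, 2 = store AC, 5 = add to AC. */
    #define OP(word)   ((word) >> 12)
    #define ADDR(word) ((word) & 0x0FFF)

    int main(void)
    {
        static uint16_t mem[4096];
        uint16_t pc = 0x300, ir, ac = 0;

        mem[0x300] = 0x1940;   /* load AC from location 940  */
        mem[0x301] = 0x5941;   /* add contents of 941 to AC  */
        mem[0x302] = 0x2941;   /* store AC into location 941 */
        mem[0x940] = 3;
        mem[0x941] = 2;

        for (int cycle = 0; cycle < 3; cycle++) {
            ir = mem[pc++];                       /* fetch, then increment PC */
            switch (OP(ir)) {                     /* decode and execute       */
            case 1: ac = mem[ADDR(ir)];                  break;
            case 2: mem[ADDR(ir)] = ac;                  break;
            case 5: ac = (uint16_t)(ac + mem[ADDR(ir)]); break;
            }
        }
        printf("location 941 now holds %u\n", mem[0x941]);   /* prints 5 */
        return 0;
    }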

    Instruction Cycle State Diagram



    Instruction Cycle State Diagram

States in the upper part of the diagram involve an exchange between the processor and either memory or an I/O module. States in the lower part of the diagram involve only internal processor operations. The OAC state appears twice, because an instruction may involve a read, a write, or both. However, the action performed during that state is fundamentally the same in both cases, and so only a single state identifier is needed.

    Instruction Cycle State


    Instruction Cycle State

The states can be described as follows:

Instruction address calculation (IAC): Determine the address of the next instruction to be executed.

Instruction fetch (IF): Read the instruction from its memory location into the processor.

Instruction operation decoding (IOD): Analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.

Operand address calculation (OAC): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.

    Instruction Cycle State


    Instruction Cycle State

    The states can be described as follows:

Operand fetch (OF): Fetch the operand from memory or read it in from I/O.

Data operation (DO): Perform the operation indicated in the instruction.

Operand store (OS): Write the result into memory or out to I/O.

    Interrupts


    Interrupts

Mechanism by which other modules (e.g., I/O) may interrupt the normal sequence of processing

Program

- e.g., overflow, division by zero

Timer

- Generated by an internal processor timer

I/O

- From an I/O controller

Hardware failure

- e.g., memory parity error

    Program Flow Control


    Program Flow Control

    Interrupt Cycle


    Interrupt Cycle

Added to the instruction cycle

Processor checks for an interrupt

- Indicated by an interrupt signal

If no interrupt, fetch the next instruction

If an interrupt is pending:

- Suspend execution of the current program

- Save context

- Set PC to the start address of the interrupt handler routine

- Process the interrupt

- Restore context and continue the interrupted program
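A schematic sketch of how the interrupt check described above bolts onto the fetch-execute loop; the toy "program", "context", and "timer interrupt" are stand-ins I invented for illustration.

    #include <stdio.h>
    #include <stdbool.h>

    /* Schematic model of the instruction cycle with an interrupt check. */
    static int pc = 0;                      /* program counter of the user program */
    static int saved_pc;                    /* saved context                       */
    static bool timer_interrupt = false;

    static void execute_next_instruction(void)
    {
        printf("executing user instruction at PC=%d\n", pc);
        pc++;
        if (pc == 3) timer_interrupt = true;        /* pretend a timer fires */
    }

    static void interrupt_handler(void)
    {
        printf("  interrupt handler runs\n");
        timer_interrupt = false;                    /* acknowledge */
    }

    int main(void)
    {
        for (int i = 0; i < 5; i++) {               /* a few instruction cycles */
            execute_next_instruction();             /* fetch + execute          */
            if (timer_interrupt) {                  /* interrupt check          */
                saved_pc = pc;                      /* save context             */
                interrupt_handler();                /* process interrupt        */
                pc = saved_pc;                      /* restore context, resume  */
            }
        }
        return 0;
    }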

    Transfer of Control via Interrupts



    Transfer of Control via Interrupts

From the point of view of the user program, an interrupt is just that: an interruption of the normal sequence of execution. When the interrupt processing is completed, execution resumes.

Thus, the user program does not have to contain any special code to accommodate interrupts; the processor and the operating system are responsible for suspending the user program and then resuming it at the same point.

    Interrupt Cycle


    Interrupt Cycle

Added to the instruction cycle

Processor checks for an interrupt

- Indicated by an interrupt signal

If no interrupt, fetch the next instruction

If an interrupt is pending:

- Suspend execution of the current program

- Save context

- Set PC to the start address of the interrupt handler routine

- Process the interrupt

- Restore context and continue the interrupted program

    Instruction Cycle with Interrupts



    Instruction Cycle with Interrupts

The processor now proceeds to the fetch cycle and fetches the first instruction in the interrupt handler program, which will service the interrupt. The interrupt handler program is generally part of the operating system. Typically, this program determines the nature of the interrupt and performs whatever actions are needed. In the example we have been using, the handler determines which I/O module generated the interrupt and may branch to a program that will write more data out to that I/O module. When the interrupt handler routine is completed, the processor can resume execution of the user program at the point of interruption.

    Instruction Cycle with Interrupts


    Instruction Cycle with Interrupts

In the interrupt cycle, the processor checks to see if any interrupts have occurred, indicated by the presence of an interrupt signal. If no interrupts are pending, the processor proceeds to the fetch cycle and fetches the next instruction of the current program. If an interrupt is pending, the processor does the following:

It suspends execution of the current program being executed and saves its context. This means saving the address of the next instruction to be executed (current contents of the program counter) and any other data relevant to the processor's current activity.

It sets the program counter to the starting address of an interrupt handler routine.

    Program Timing Short I/O Wait


    Program Timing Short I/O Wait

    Program Timing Long I/O Wait


    Program Timing Long I/O Wait

Instruction Cycle State Diagram with Interrupts

    Multiple Interrupts


Multiple Interrupts

Disable interrupts

- Processor will ignore further interrupts whilst processing one interrupt

- Interrupts remain pending and are checked after the first interrupt has been processed

- Interrupts are handled in sequence as they occur

Define priorities

- Low-priority interrupts can be interrupted by higher-priority interrupts

- When the higher-priority interrupt has been processed, the processor returns to the previous interrupt

    Multiple Interrupts - Nested


    Multiple Interrupts Nested

    Multiple Interrupts - Sequential


    Multiple Interrupts Sequential

    Interconnection Structures


    Interconnection Structures

The collection of paths connecting the various modules is called the interconnection structure.

The design of this structure will depend on the exchanges that must be made among the modules.

    Interconnection Structures


    Interconnection Structures

The types of exchanges that are needed can be seen by indicating the major forms of input and output for each module type:

Memory: Typically, a memory module will consist of N words of equal length. Each word is assigned a unique numerical address (0, 1, ..., N - 1). A word of data can be read from or written into the memory.

I/O module: From an internal (to the computer system) point of view, I/O is functionally similar to memory. There are two operations, read and write. Further, an I/O module may control more than one external device. We can refer to each of the interfaces to an external device as a port and give each a unique address (e.g., 0, 1, ..., M - 1).

    Interconnection Structures


Interconnection Structures

Processor: The processor reads in instructions and data, writes out data after processing, and uses control signals to control the overall operation of the system. It also receives interrupt signals.

    Computer Module

    Memory Connection


    Memory Connection

    Receives and sends data

    Receives addresses (of locations)

    Receives control signals

    Read

    Write

    Timing

    Input / Output Connection


    Input / Output Connection

Similar to memory from the computer's viewpoint

Output

- Receive data from computer

- Send data to peripheral

Input

- Receive data from peripheral

- Send data to computer

    Input / Output Connection


    Input / Output Connection

    Receive control signals from computer

    Send control signals to peripherals

    Ex. Spin disk

    Receive addresses from computer

    Send interrupt signals (control)

    CPU Connection


    CPU Connection

    Reads instruction and data

    Writes out data (after processing)

    Sends control signals to other units

    Receives (& acts on) interrupts

    Bus Interconnection


A bus is a communication pathway connecting two or more devices.

Multiple devices connect to the bus, and a signal transmitted by any one device is available for reception by all other devices attached to the bus.

A bus that connects major computer components (processor, memory, I/O) is called a system bus.

    Bus Structure


On any bus, the lines can be classified into three functional groups:

The data lines provide a path for moving data among system modules. These lines, collectively, are called the data bus.

The address lines are used to designate the source or destination of the data on the data bus.

The control lines are used to control the access to and the use of the data and address lines.

    Bus Structure


The operation of the bus is as follows. If one module wishes to send data to another, it must do two things: (1) obtain the use of the bus, and (2) transfer data via the bus. If one module wishes to request data from another module, it must (1) obtain the use of the bus, and (2) transfer a request to the other module over the appropriate control and address lines. It must then wait for that second module to send the data.

    Bus Structure


    Typical Physical Realization of a Bus Architecture


    Traditional ISA with Cache


    High Performance Bus


    Bus Types


Dedicated

- Separate data and address lines

Multiplexed

- Shared lines

- Address-valid or data-valid control line

- Advantage: fewer lines

- Disadvantages: more complex control, possible reduction in ultimate performance

    Bus Arbitration


More than one module may control the bus

- Ex. CPU and DMA controller

Only one module may control the bus at one time

Arbitration may be centralised or distributed

    Centralized and Distributed Arbitration


Centralised

- A single hardware device controls bus access: the bus controller or arbiter

- May be part of the CPU or a separate module

Distributed

- Each module may claim the bus

- Control logic on all modules

    Timing


Co-ordination of events on the bus

Synchronous

- Events determined by clock signals

- Control bus includes a clock line

- A single 1-0 transition is a bus cycle

- All devices can read the clock line

- Usually sync on the leading edge

- Usually a single cycle for an event

    Synchronous Timing Diagram


    Asynchronous Timing Read Diagram


    Asynchronous Timing Write Diagram


    PCI Bus


Peripheral Component Interconnect (PCI)

Developed by Intel and released to the public domain

32 or 64 bit

    PCI Bus Lines (required)


System lines

- Including clock and reset

Address and data

- 32 time-multiplexed lines for address/data

- Interrupt and validate lines

Interface control

Arbitration

- Not shared; direct connection to the PCI bus arbiter

Error lines

    PCI Bus Lines (optional)


Interrupt lines

- Not shared

Cache support

64-bit bus extension

- Additional 32 lines

- Time multiplexed

- 2 lines to enable devices to agree to use 64-bit transfer

JTAG/Boundary Scan

- For testing procedures

    PCI Commands


    Transaction between initiator (master) and target

    Master claims bus

    Determine type of transaction

    Ex. I/O read/write

    Address phase

    One or more data phases

    PCI Read Timing Diagram


    PCI Bus Arbiter


    PCI Bus Arbitration


    Cache Memory

    Chapter 4

    Terminology


Capacity: the amount of information that can be contained in a memory unit

- usually in terms of words or bytes

Word: the natural unit of organization in the memory, typically the number of bits used to represent a number

Addressable unit: the fundamental data element size that can be addressed in the memory

- typically either the word size or individual bytes

Unit of transfer: the number of data elements transferred at a time

- usually bits in main memory and blocks in secondary memory

Transfer rate: rate at which data is transferred to/from the memory device

    Terminology


Access time:

- For RAM, the time to address the unit and perform the transfer

- For non-random-access memory, the time to position the R/W head over the desired location

Memory cycle time: access time plus any other time required before a second access can be started

Access technique: how memory contents are accessed

    Memory Hierarchy


Major design objective of any memory system:

To provide adequate storage capacity at an acceptable level of performance and at a reasonable cost

Four interrelated ways to meet this goal:

- Use a hierarchy of storage devices

- Develop automatic space allocation methods for efficient use of the memory

- Through the use of virtual memory techniques, free the user from memory management tasks

- Design the memory and its related interconnection structure so that the processor can operate at or near its maximum rate

Memory Hierarchy


Basis of the memory hierarchy

- Registers internal to the CPU for temporary data storage (small in number but very fast)

- External storage for data and programs (relatively large and fast)

- External permanent storage (much larger and much slower)

Characteristics of the memory hierarchy

- Consists of distinct levels of memory components

- Each level characterized by its size, access time, and cost per bit

- Each increasing level in the hierarchy consists of modules of larger capacity, slower access time, and lower cost/bit

Goal of the memory hierarchy

- Try to match the processor speed with the rate of information transfer from the lowest element in the hierarchy

    Memory Hierarchy Diagram


    Hierarchy List


    Registers

    L1 Cache

    L2 Cache

    Main memory

    Disk cache

    Disk

Optical

Tape

    Cache Memory


Cache memory is a critical component of the memory hierarchy

Compared to the size of main memory, cache is relatively small

Operates at or near the speed of the processor

Very expensive compared to main memory

Cache contains copies of sections of main memory

    Cache Memory


Small amount of fast memory

Sits between normal main memory and the CPU

May be located on the CPU chip or module

    Cache and Main Memory


    Cache/Main Memory Structure


    Cache Operation - Overview


CPU requests the contents of a memory location

Check the cache for this data

If present, get it from the cache (fast)

If not present, read the required block from main memory into the cache

Then deliver from the cache to the CPU

The cache includes tags to identify which block of main memory is in each cache slot
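A minimal sketch of that lookup sequence for a direct-mapped cache; the line count, block size, toy memory size, and helper names are assumptions chosen for illustration, not parameters from the slides.

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>
    #include <stdio.h>

    #define NUM_LINES  64          /* assumed cache size: 64 lines */
    #define BLOCK_SIZE 16          /* assumed block size: 16 bytes */

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[BLOCK_SIZE];
    } cache_line_t;

    static cache_line_t cache[NUM_LINES];
    static uint8_t main_memory[1 << 16];          /* toy 64-KByte main memory */

    /* Read one byte, going to main memory only on a miss. */
    static uint8_t cache_read(uint32_t addr)
    {
        uint32_t block  = addr / BLOCK_SIZE;
        uint32_t line   = block % NUM_LINES;      /* direct mapping           */
        uint32_t tag    = block / NUM_LINES;
        uint32_t offset = addr % BLOCK_SIZE;

        cache_line_t *l = &cache[line];
        if (!l->valid || l->tag != tag) {          /* miss: fetch whole block */
            memcpy(l->data, &main_memory[block * BLOCK_SIZE], BLOCK_SIZE);
            l->valid = true;
            l->tag   = tag;
            printf("miss at address 0x%04x\n", addr);
        }
        return l->data[offset];                    /* deliver to the CPU */
    }

    int main(void)
    {
        main_memory[0x1234] = 42;
        printf("first read:  %u\n", cache_read(0x1234));   /* miss, then 42 */
        printf("second read: %u\n", cache_read(0x1234));   /* hit, 42       */
        return 0;
    }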

    Locality of Reference


The cache memory works because of locality of reference

Memory references made by the processor, for both instructions and data, tend to cluster together

- Instruction loops, subroutines

- Data arrays, tables

Keep these clusters in high-speed memory to reduce the average delay in accessing data

Over time, the clusters being referenced will change -- memory management must deal with this


    Cache Design


    Addressing

    Size

    Mapping Function

    Replacement Algorithm

    Write Policy

    Block Size

    Number of Caches

    Cache Addressing


Where does the cache sit?

- Between the processor and the virtual memory management unit (MMU)

- Between the MMU and main memory

Logical cache (virtual cache) stores data using virtual addresses

- Processor accesses the cache directly, not through the MMU

- Cache access is faster, before MMU address translation

- Virtual addresses use the same address space for different applications

- Must flush the cache on each context switch

Physical cache stores data using main memory physical addresses

    Mapping Function


Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines.

The choice of the mapping function dictates how the cache is organized.

3 techniques: direct, associative, and set-associative.
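A small sketch of how a main-memory address is decomposed under the three mapping functions; the cache geometry (64 lines, 16-byte blocks, 2-way sets) is an assumption for illustration, not taken from the slides.

    #include <stdio.h>
    #include <stdint.h>

    /* How a main-memory address maps onto cache locations under direct,
       set-associative, and fully associative mapping. */
    #define BLOCK_SIZE 16
    #define NUM_LINES  64
    #define WAYS        2
    #define NUM_SETS   (NUM_LINES / WAYS)

    int main(void)
    {
        uint32_t addr   = 0xABCD;
        uint32_t block  = addr / BLOCK_SIZE;   /* block number in main memory */
        uint32_t offset = addr % BLOCK_SIZE;   /* word/byte within the block  */

        /* Direct mapping: each block can go into exactly one line. */
        printf("direct:           line %u, tag %u, offset %u\n",
               block % NUM_LINES, block / NUM_LINES, offset);

        /* Set-associative: each block maps to one set, any way within it. */
        printf("2-way set-assoc.: set %u, tag %u, offset %u\n",
               block % NUM_SETS, block / NUM_SETS, offset);

        /* Fully associative: the block may go into any line; the whole
           block number serves as the tag. */
        printf("fully assoc.:     tag %u, offset %u\n", block, offset);
        return 0;
    }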


    Direct Mapping



    Set Associative Mapping


Set-associative mapping is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.

    Set Associative Mapping



    Fully Associative Mapping


Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache.

    Fully Associative Mapping


    Write Policy


Must not overwrite a cache block unless main memory is up to date

Multiple CPUs may have individual caches

I/O may address main memory directly

    Write Through


All writes go to main memory as well as the cache

Multiple CPUs can monitor main memory traffic to keep the local (to CPU) cache up to date

Lots of traffic

Slows down writes

Remember bogus write-through caches!

    Write Back


Updates are initially made in the cache only

The update (dirty) bit for the cache slot is set when an update occurs

If a block is to be replaced, write it to main memory only if the update bit is set

Other caches can get out of sync

I/O must access main memory through the cache

N.B. 15% of memory references are writes
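A short sketch of the write-back policy described above, using a dirty ("update") bit per line; the single-line toy "cache" and the helper names are illustrative assumptions, not a real cache design.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Write-back policy on one toy cache line: writes set a dirty bit,
       and main memory is updated only when the line is evicted dirty. */
    typedef struct {
        bool     valid, dirty;
        uint32_t tag;
        uint8_t  data;
    } line_t;

    static uint8_t main_memory[256];
    static line_t  line;                      /* one cache line, for illustration */

    static void cache_write(uint32_t addr, uint8_t value)
    {
        if (line.valid && line.tag != addr) { /* evicting a different block */
            if (line.dirty)
                main_memory[line.tag] = line.data;   /* write back only if dirty */
            line.dirty = false;
        }
        line.valid = true;
        line.tag   = addr;
        line.data  = value;
        line.dirty = true;                    /* main memory is now stale */
    }

    int main(void)
    {
        cache_write(10, 7);                   /* cached, memory[10] still 0  */
        cache_write(10, 8);                   /* rewritten in cache only     */
        cache_write(20, 9);                   /* evicts block 10: write-back */
        printf("memory[10] = %u (written back on eviction)\n", main_memory[10]);
        return 0;
    }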