Cache Caching


  • 8/12/2019 Cache Caching


    ELECTRICAL AND ELECTRONIC ENGINEERING

    Cache and Caching

    Caching refers to an important optimization technique used to reduce the Von Neumann bottleneck (time spent performing memory accesses, which can limit overall performance) and to improve the performance of any hardware or software system that retrieves information. A cache acts as an intermediary between the requester and the data store.


    Characteristics of Cache

    Small, active, transparent and automatic

    Small: Most caches are 10% of the main memory size and hold an equal percentage of the data.

    Active: Has an active mechanism that examines each request and decides how to respond: whether the item is available or not and, if it is not available, how to retrieve a copy of the item from the data store. It also decides which items to keep in the cache.

    Transparent: A cache can be inserted without making changes to the requester or the data store. The interface the cache presents to the requester is the same as the interface the data store presents, and vice versa.

    Automatic: The cache mechanism does not receive instructions on how to act or which data items to store in the cache storage. Instead it implements an algorithm that examines the sequence of requests and uses the requests to determine how to manage the cache.

    Importance

    Flexibility in usage:

    o Hardware, software and a combination of the two
    o Small, medium and large data items
    o Generic data items
    o Application type of data
    o Textual and non-textual
    o Variety of computers
    o Systems designed to retrieve data (web) or those that store (physical memories)

    Cache terminologies

    There are many terminologies depending on the application:

    o Memory system: the data store is called the backing store
    o Caching of web pages: the requester is a browser; the server is called the origin server
    o Database lookups: client for the requester; database server for the system that handles requests

    Hit: Request that can be satisfied without any need to access the underlying data store

    Miss: Request that cannot be satisfied from the cache

    High locality of reference: Sequence containing repetitions of the same request

    \[ \text{Hit ratio} = r = \frac{\text{number of requests that are hits}}{\text{total number of requests}} \]

    \[ \text{Cost} = r\,C_h + (1 - r)\,C_m \]

    where \(r\) is the hit ratio and \(C_h\) and \(C_m\) are the costs of accessing the cache and the data store respectively.

    \[ \text{Miss ratio} = 1 - \text{hit ratio} \]
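    As a minimal sketch of the hit ratio and cost formulas, the following Python computes both for one hypothetical workload. The function names and the sample costs (10 and 100 time units) are illustrative assumptions, not values from these notes.

```python
def hit_ratio(hits: int, total_requests: int) -> float:
    """Fraction of requests satisfied by the cache."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return hits / total_requests


def access_cost(r: float, cost_hit: float, cost_miss: float) -> float:
    """Average cost per request: r*C_h + (1 - r)*C_m."""
    return r * cost_hit + (1.0 - r) * cost_miss


# Hypothetical workload: 90 hits out of 100 requests.
r = hit_ratio(hits=90, total_requests=100)
miss_ratio = 1.0 - r
average = access_cost(r, cost_hit=10, cost_miss=100)  # roughly 19 time units
```

    Even with a 90% hit ratio, about half of the average cost in this example comes from the misses, which is why raising the hit ratio matters so much.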

    Replacement policy

    Need: To increase the hit ratio:

    1. The policy should retain those items that will be referenced most frequently.

    2. It should be inexpensive to implement.

    3. The LRU (least recently used) method is preferred.
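    One possible way to sketch an LRU replacement policy in software is with Python's `collections.OrderedDict`. The class name `LRUCache`, its attributes and the dictionary standing in for the data store are assumptions made for illustration only.

```python
from collections import OrderedDict


class LRUCache:
    """Small cache that evicts the least recently used item when full."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.store = backing_store      # any mapping standing in for the data store
        self.items = OrderedDict()      # ordered from oldest to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)         # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.store[key]                 # miss: fetch from the data store
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)      # evict the least recently used item
        self.items[key] = value
        return value


store = {n: n * n for n in range(10)}
cache = LRUCache(capacity=2, backing_store=store)
for key in [1, 2, 1, 3, 2]:
    cache.get(key)
```

    In this run only the second request for key 1 hits; the request for key 3 evicts key 2, so the final request for key 2 misses again.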

    Multi level cache

    More than one cache is used along the path from requester to data store. The cost of accessing the new cache is lower than the cost of accessing the original cache.

    \[ \text{Cost} = r_1 C_{h1} + r_2 C_{h2} + (1 - r_1 - r_2)\,C_m \]

    where \(r_1\) and \(r_2\) are the fractions of requests satisfied by the new and the original cache, and \(C_{h1}\) and \(C_{h2}\) are the costs of accessing them.

    Multi-level caches generally operate by checking the smallest Level 1 (L1) cache first; if it hits, the processor proceeds at high speed. If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked.
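    The two-level cost formula and the lookup order can be sketched as follows. The function names, the dictionaries standing in for caches, and the choice to fill every level on the way back from memory are assumptions for illustration; real hardware may fill levels differently.

```python
def two_level_cost(r1, r2, c_h1, c_h2, c_m):
    """Cost = r1*C_h1 + r2*C_h2 + (1 - r1 - r2)*C_m."""
    if r1 < 0 or r2 < 0 or r1 + r2 > 1:
        raise ValueError("r1 and r2 must be non-negative and sum to at most 1")
    return r1 * c_h1 + r2 * c_h2 + (1 - r1 - r2) * c_m


def lookup(address, levels, memory):
    """Check L1 first, then L2, and so on, before going to memory.

    Returns (value, name of the level that answered)."""
    for name, cache in levels:
        if address in cache:
            return cache[address], name
    value = memory[address]
    for _, cache in levels:          # assumption: fill every level on a miss
        cache[address] = value
    return value, "memory"


levels = [("L1", {}), ("L2", {})]
memory = {0x10: 42}
first = lookup(0x10, levels, memory)     # cold: answered by memory
second = lookup(0x10, levels, memory)    # now answered by L1

# Hypothetical ratios and costs: 80% L1 hits, 15% L2 hits.
cost = two_level_cost(0.8, 0.15, c_h1=1, c_h2=10, c_m=100)
```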

    Preloading Caches

    During start-up the hit ratio is very low, since the cache has to fetch items from the store. This can be improved by preloading the cache with:

    o Anticipated (repeated) requests

    o Frequently used pages
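    To illustrate the effect of preloading, the sketch below replays the same request sequence against a cold cache and against one warmed with two pages. The page names, the `run` helper and the unbounded cache are hypothetical simplifications.

```python
def run(requests, store, preload=()):
    """Serve requests through an unbounded cache and return the hit ratio."""
    cache = {key: store[key] for key in preload}   # optional warm-up
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
        else:
            cache[key] = store[key]                # miss: fetch from the store
    return hits / len(requests)


store = {page: f"contents of {page}" for page in ["home", "news", "about"]}
requests = ["home", "news", "home", "about", "home"]

cold = run(requests, store)                             # cache starts empty
warm = run(requests, store, preload=["home", "news"])   # frequently used pages
```

    With these inputs the cold cache hits on 2 of 5 requests, while the preloaded cache hits on 4 of 5.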

    Pre-fetch related data

    If a processor accesses a byte of memory, the cache fetches 64 bytes. Thus if the processor then fetches the next byte, the value will come from the cache. Modern computer systems employ multiple caches. Caching is used with both virtual and physical memory as well as secondary memory.
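    A rough way to see why fetching a whole 64-byte line helps is to count hits for sequential and for widely spaced accesses. The helper `count_hits` and the unbounded set of cached lines are assumptions made to keep the sketch short.

```python
LINE_SIZE = 64  # bytes fetched per miss, as in the text


def count_hits(addresses, line_size=LINE_SIZE):
    """Count hits when every miss loads the whole line containing the byte."""
    cached_lines = set()
    hits = 0
    for address in addresses:
        line = address // line_size      # index of the line holding this byte
        if line in cached_lines:
            hits += 1
        else:
            cached_lines.add(line)       # miss: the full line is loaded
    return hits


sequential = range(0, 256)               # 256 consecutive byte reads
strided = range(0, 256 * 64, 64)         # 256 reads, each 64 bytes apart

sequential_hits = count_hits(sequential)
strided_hits = count_hits(strided)
```

    The sequential reads miss only once per line (4 misses, 252 hits), whereas the strided reads touch a new line every time and never hit.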

    Translation Lookaside Buffer (TLB) contains digital circuits that move values into a Content Addressable Memory (CAM) at high speed.

    Cache can be viewed as the main memory while data store as the external storage.


    Caches in multiprocessors

    Write through and write back

    Write through

    This is the method of writing to memory where the cache keeps a copy and forwards the write operation to the underlying memory.

    Write back scheme

    The cache keeps the data item locally and only writes the value to memory if necessary. This is the case if the value reaches the end of the LRU list and must be replaced. To determine whether the value is to be written back, a bit termed the dirty bit is kept by the cache.
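    The difference between the two write schemes can be sketched with two small classes. The class and method names are hypothetical, a dictionary stands in for main memory, and eviction is triggered by hand rather than by an LRU list.

```python
class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, address, value):
        self.lines[address] = value       # keep a copy in the cache
        self.memory[address] = value      # forward the write immediately


class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                   # address -> value
        self.dirty = set()                # addresses whose dirty bit is set

    def write(self, address, value):
        self.lines[address] = value       # update the local copy only
        self.dirty.add(address)           # memory is now stale

    def evict(self, address):
        value = self.lines.pop(address)
        if address in self.dirty:         # write back only if modified
            self.memory[address] = value
            self.dirty.discard(address)


wt_memory = {0: 0}
WriteThroughCache(wt_memory).write(0, 7)   # memory updated at once

wb_memory = {0: 0}
wb = WriteBackCache(wb_memory)
wb.write(0, 7)
stale = wb_memory[0]                       # still the old value
wb.evict(0)                                # dirty bit forces the write back
```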

    Cache Coherence

    Performance can be optimized by using a write back scheme rather than a write through scheme. Performance can also be optimized by giving each processor its own cache. Unfortunately the two methods (write back and multiple caches) conflict during READ and WRITE operations to the same address. To avoid conflicts, all devices that access memory must follow a cache coherence protocol that coordinates the values. Each processor must inform the other processors of its operations so that no processor works with a stale value for an address.
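    The notes do not name a particular coherence protocol. As one hedged illustration, the sketch below uses a simple write-invalidate scheme with write-through caches: before writing, a cache tells the others to drop their copy. The `Bus` and `CoherentCache` classes are hypothetical.

```python
class Bus:
    """Shared memory plus the list of caches that must stay coherent."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast_invalidate(self, sender, address):
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(address, None)   # drop the stale copy


class CoherentCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:            # miss: load from memory
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.bus.broadcast_invalidate(self, address)  # inform other processors
        self.lines[address] = value
        self.bus.memory[address] = value              # write-through, for simplicity


bus = Bus({0: 1})
cache_a, cache_b = CoherentCache(bus), CoherentCache(bus)
before = cache_b.read(0)      # B caches the old value
cache_a.write(0, 2)           # A's write invalidates B's copy
after = cache_b.read(0)       # B misses and reloads the new value
```

    Without the invalidation step, the second read by `cache_b` would return the stale value.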

    Physical memory cache

    Demand paging as a form of cache

    The cache behaves like physical memory, with the data store as external memory

    Page replacement policy as cache replacement policy

    A cache inserted between processor and memory needs to understand physical addresses. We can imagine the cache receiving a read request, checking whether the request can be answered from the cache and, if the item is not present, passing the request to the underlying memory.


    Instructions and Data caches

    Should all memory references pass through a single cache? To understand the question, imagine instructions being executed and data being accessed.

    Instruction fetch tends to show high locality, since in many cases the next instruction is found at an adjacent memory address. If loops are used, they are small routines that can fit into a cache.

    Data fetches may be random and hence not necessarily at adjacent memory addresses. Also, any time memory is referenced, the cache keeps a copy even though the value may not be needed again.

    When both share one cache, the overall performance of the cache is reduced. Architects vary in their choice between separate caches and one large cache that allows intermixing.

    Virtual memory caching and cache flush

    When the

    Any application swap, the

    TLB

    The register file in the CPU is accessible by both the integer and the floating point units, or each unit may have its own specialized registers. The out-of-order execution units are intelligent enough to know the original order of the instructions in the program and re-impose program order when the results are to be committed ("retired") to their final destination registers.


    Exclusive versus inclusive cache

    Multi-level caches introduce new design decisions. For instance, in some processors, all data in the L1 cache must also be somewhere in the L2 cache. These caches are called strictly inclusive.

    Then comes an enormous Level 3 cache memory (8 MB) for managing communications between cores. That means that if a core tries to access a data item and it's not present in the Level 3 cache, there's no need to look in the other cores' private caches: the data item won't be there either. Conversely, if the data are present, four bits associated with each line of the cache memory (one bit per core) show whether or not the data are potentially present (potentially, but not with certainty) in the lower-level cache of another core, and which one.
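    The per-core presence bits of an inclusive Level 3 cache can be modelled roughly as below. The class `InclusiveL3`, its method names and the four-core count are assumptions for illustration; the bits only say that a core may hold the line, not that it certainly does.

```python
NUM_CORES = 4


class InclusiveL3:
    """L3 directory: each line carries one 'may be present' bit per core."""

    def __init__(self):
        self.core_bits = {}                   # address -> list of NUM_CORES bools

    def record_fill(self, address, core):
        bits = self.core_bits.setdefault(address, [False] * NUM_CORES)
        bits[core] = True                     # this core may now hold the line

    def cores_to_snoop(self, address, requester):
        """Return the cores whose private caches are worth checking."""
        bits = self.core_bits.get(address)
        if bits is None:
            return []                         # L3 miss: no private cache has it
        return [c for c in range(NUM_CORES) if bits[c] and c != requester]


l3 = InclusiveL3()
on_miss = l3.cores_to_snoop(0x40, requester=0)     # nothing to check
l3.record_fill(0x40, core=2)
on_hit = l3.cores_to_snoop(0x40, requester=0)      # only core 2 is checked
```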

    Pipelining in Microprocessors

    Modern microprocessors are structured and hence they contain many internal processing units. Each unit performs a particular task. In a real sense each of these processing units is actually a special purpose microprocessor. The processor can process several instructions simultaneously at various stages of execution. This ability is called pipelining. The Intel 8086 was the first processor to make use of idle memory time by fetching the next instruction while executing the current one. This process accelerates the overall execution time of a program.

    Figure 9 shows how an Intel i486 executes instructions in a pipelined fashion. While one instruction is being fetched, a second is being decoded, a third is being executed and a fourth is being written back. All these activities take place within the same time duration, thus giving an overall execution rate of one instruction per clock cycle. Compared with the conventional approach, which requires 4 clock cycles to fetch, decode, execute and write back one instruction, the pipelining approach is much superior. If the start and end times of the operation are considered, the overall (average) rate of processing comes out to be nearly one instruction per clock (slightly more than one clock per instruction).
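    The claim of nearly one instruction per clock can be checked with a short calculation, assuming an ideal four-stage pipeline (fetch, decode, execute, write back) with no stalls. The function names and the instruction count of 100 are illustrative assumptions.

```python
def non_pipelined_cycles(instructions, stages=4):
    """Each instruction runs through all stages before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages=4):
    """The first instruction takes `stages` cycles; each later one adds one."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


n = 100
cycles_per_instruction = pipelined_cycles(n) / n      # slightly above 1
speedup = non_pipelined_cycles(n) / pipelined_cycles(n)
```

    For 100 instructions the ideal pipeline needs 103 cycles instead of 400, so the average is just over one clock per instruction; the extra three cycles are the time needed to fill the pipeline.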

    [Figure 9: timing diagram comparing non-pipelined execution (8085), where each instruction is fetched, decoded and executed in turn while the bus alternates between BUSY and IDLE, with pipelined execution, where the bus unit issues Fetch 1 to Fetch 4 back to back.]


    Additional Notes

    o 40 MHz = 25 ns cycle time
    o DRAM chips: access time 60 to 100 ns
    o SRAM: access time 15 to 25 ns
    o SRAM (ECL): access time 12 ns, BUT expensive
    o Assume an aircraft moving at 850 km/h: distance moved in 12 ns = 1/10 of the diameter of a hair
    o Cache: attempts to combine the advantage of quick SRAM with the cheapness of DRAMs to achieve the most effective memory system.
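    The timing figures in these notes can be reproduced with two small helpers. The function names are hypothetical, and the aircraft speed of 850 km/h is an assumed illustrative value.

```python
def cycle_time_ns(frequency_mhz):
    """Clock period in nanoseconds: 1 / f."""
    return 1_000.0 / frequency_mhz


def distance_um(speed_kmh, time_ns):
    """Distance travelled in micrometres at a constant speed."""
    speed_m_per_s = speed_kmh * 1_000.0 / 3_600.0
    return speed_m_per_s * time_ns * 1e-9 * 1e6


period = cycle_time_ns(40)          # 40 MHz clock: 25 ns per cycle
travelled = distance_um(850, 12)    # assumed 850 km/h during a 12 ns access
```

    At that speed the aircraft covers only a few micrometres in 12 ns, which gives a feel for how short an SRAM access is.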


    TAG: Element of the cache directory; determines whether an access is a hit or a miss.

    Valid bit: Indicates a valid cache line.

    Flush: Resets the valid bits of the cache lines.

    Write protect: No overwrite.

    SET: The tags of the corresponding cache lines are the elements of the set.

    Way: For a given set address, the tag addresses of all ways are simultaneously compared with the tag part of the address issued by the CPU to decide between a hit and a miss.

    Capacity: 4 ways × sets × cache line size = 16K.

    A miss will check the LRU bits for replacement.

    Algorithms

    Direct mapping: A cache line can be in only one position.

    Associative: A cache line can be anywhere within the four ways.


    32-bit microprocessor: The 4 GB address space is divided into 2^20 cache pages of 4 KB each, and each page is further divided into 256 sets of one 16-byte cache line.
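    Under the organization described here (32-bit addresses, 256 sets, 16-byte cache lines, four ways), an address splits into a 20-bit tag, an 8-bit set index and a 4-bit byte offset. The sketch below shows the split; the constant and function names and the sample address are assumptions for illustration.

```python
LINE_SIZE = 16      # bytes per cache line
NUM_SETS = 256      # sets per way
NUM_WAYS = 4

OFFSET_BITS = LINE_SIZE.bit_length() - 1     # 4 bits select the byte in a line
SET_BITS = NUM_SETS.bit_length() - 1         # 8 bits select the set


def split_address(address):
    """Split a 32-bit physical address into (tag, set index, byte offset)."""
    offset = address & (LINE_SIZE - 1)
    set_index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + SET_BITS)   # remaining 20 bits: the cache page
    return tag, set_index, offset


capacity = NUM_WAYS * NUM_SETS * LINE_SIZE       # total data capacity in bytes
tag, set_index, offset = split_address(0x12345678)
```

    The capacity works out to 16384 bytes (16K), and the 20-bit tag corresponds to one of the 2^20 cache pages of 4 KB.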

    12KB in a 2-way organization: a cache line is taken as 64 bytes and the sets are 192.

    Large caches can be implemented with external SRAMs. Tags may be held in SRAMs with a short access time (15 ns), while the data are held in SRAMs with an access time of 20 ns, which can be external.

    Replacement strategies

    The cache controller uses the LRU bits assigned to a set of cache lines for marking the last addressed (most recently accessed) way of the set.

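    Replacement policy

    o All lines valid? No: replace the invalid line.
    o All lines valid? Yes: replace a cache entry according to the LRU bits:
        o B0 = 0 and B1 = 0: replace way 0
        o B0 = 0 and B1 = 1: replace way 1
        o B0 = 1 and B2 = 0: replace way 2
        o B0 = 1 and B2 = 1: replace way 3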


    Random replacement is also possible. Comprehensive statistical analyses have shown that there is very little difference between the efficiency of the LRU and random replacement algorithms. The choice of replacement policy rests solely with the cache designer.

    Access and addressing

    If the last access was to way 0 or 1, the controller sets LRU bit B0; if it was to way 2 or 3, B0 is cleared. For an access to way 0, bit B1 is set; addressing way 1 clears B1. For an access to way 2, LRU bit B2 is set; addressing way 3 clears B2.
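    A three-bit pseudo-LRU scheme for a four-way set can be sketched as follows. The sketch assumes that an access to way 1 clears B1, that an access to way 3 clears B2, and that an access to way 2 or 3 clears B0, since the replacement decision only works if each bit distinguishes the two ways it covers. The class name `PseudoLRU` and its methods are hypothetical.

```python
class PseudoLRU:
    """Three LRU bits (B0, B1, B2) tracking the four ways of one set."""

    def __init__(self):
        self.b0 = self.b1 = self.b2 = 0

    def touch(self, way):
        """Update the bits after an access to `way` (0 to 3)."""
        if way in (0, 1):
            self.b0 = 1                    # last access was in the pair (0, 1)
            self.b1 = 1 if way == 0 else 0
        elif way in (2, 3):
            self.b0 = 0                    # last access was in the pair (2, 3)
            self.b2 = 1 if way == 2 else 0
        else:
            raise ValueError("way must be 0, 1, 2 or 3")

    def victim(self):
        """Pick the way to replace when all four lines are valid."""
        if self.b0 == 0:                   # pair (0, 1) was used less recently
            return 0 if self.b1 == 0 else 1
        return 2 if self.b2 == 0 else 3


lru = PseudoLRU()
for way in [0, 1, 2, 3]:
    lru.touch(way)
first_victim = lru.victim()     # way 0, the oldest access
lru.touch(0)
second_victim = lru.victim()    # way 2: an approximation, true LRU would pick way 1
```

    Three bits cannot record the full access order of four ways, so the scheme only approximates LRU, as the second replacement in the example shows.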
