Goodman SnoopyProtocol

  • 8/10/2019 Goodman SnoopyProtocol

    USING CACHE MEMORY TO REDUCE PROCESSOR-MEMORY TRAFFIC

    James R. Goodman

    Department of Computer Sciences

    University of Wisconsin-Madison

    Madison, WI 53706

    ABSTRACT - The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches which depend heavily on spatial locality (look-ahead) for their performance are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind) is then proposed and demonstrated to be effective in an environment where process switches are infrequent. We argue that such an environment is possible if the traffic to backing store is small enough that many processors can share a common memory and if the cache data consistency problem is solved. We demonstrate that such a cache can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.

    1. Introduction

    Because there are straightforward ways to construct powerful, cost-effective systems using random access memories and single-chip microprocessors, semiconductor technology has, until now, had the greatest impact through these components. High-performance processors, however, are still beyond the capability of a single-chip implementation and are not easily partitioned in a way which can effectively exploit the technology and economies of VLSI. An interesting phenomenon has occurred in the previous decade as a result of this disparity. Memory costs have dropped radically and consistently for computer systems of all sizes. While the component cost of a CPU (single-chip implementations excluded) has declined significantly over the same period, the reduction has been less dramatic. A result is that the amount of memory thought to be appropriate for a given speed processor has grown dramatically in recent years. Today small minicomputers have memory as large as that of the most expensive machines of a decade ago.

    Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

    The impact of VLSI has been very different in microprocessor applications. Here memory is still regarded as an expensive component in the system, and those familiar primarily with a minicomputer or mainframe environment are often scornful of the trouble to which microprocessor users go to conserve memory. The reason, of course, is that even the small memory in a microprocessor is a much larger portion of the total system cost than the much larger memory on a typical mainframe system. This results from the fact that memory and processors are implemented in the same technology.

    1.1. A Super CPU

    With the advances in VLSI occurring now and continuing over the next few years, it will become possible to fabricate circuits that are one to two orders of magnitude more complex than currently available microprocessors. It will soon be possible to fabricate an extremely high-performance CPU on a single chip. If the entire chip is devoted to the CPU, however, it is not a good idea. Extrapolating historical trends to predict future component densities, we might expect that within a few years we should be able to purchase a single-chip processor containing at least ten times as many transistors as occur in, say, the MC68000. For the empirical rule known as Grosch's law [Grosch53], $P = kC^g$, where $P$ is some measure of performance, $C$ is the cost, and $k$ and $g$ are constants, Knight [Knight66] concluded that $g$ is at least 2, and Solomon [Solomon66] has suggested that $g \approx 1.47$. For the IBM System/370 family, Siewiorek determined that $g \approx 1.8$ [Siewiorek82]. While Grosch's law breaks down in the comparison of processors using different technology or architectures, it is realistic for predicting improvements within a single technology. Siewiorek in fact suggests that it holds by definition.

    Assuming $g = 1.5$ and using processor-memory bandwidth as our measure of performance, Grosch's law predicts that a processor containing 10 times as many transistors as a current microprocessor would require 30 times the memory bandwidth.¹ The Motorola MC68000, running at 10 MHz, accesses data from memory at a maximum rate of 5 million bytes per second, using more than half its pins to achieve this rate. Although packaging technology is rapidly increasing the pins available to a chip, it is unlikely that the increase will be 30-fold (the 68000 has 64 pins). We would suggest a factor of two is realistic. Although some techniques are clearly possible to increase the transfer rate into and out of the 68000, supplying such a processor with data as fast as needed is a severe constraint. One of the designers of the 68000 has stated that all modern microprocessors - the 68000

    ¹This is a conservative estimate, in fact, because it ignores predictable decreases in gate delays.

    1983 ACM 0149-7111/83/0600/0124 $01.00
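The 30-fold bandwidth figure quoted from Grosch's law can be checked with a few lines of arithmetic. The following is a minimal Python sketch, assuming transistor count stands in for cost C and that the constant k cancels when two processors of the same technology are compared; the function name `grosch_speedup` is illustrative and does not come from the paper.

```python
def grosch_speedup(cost_ratio: float, g: float = 1.5) -> float:
    """Performance ratio P2/P1 predicted by P = k * C**g when the cost
    (here assumed proportional to transistor count) grows by cost_ratio.
    The constant k cancels in the ratio."""
    return cost_ratio ** g


# Ten times the transistors with g = 1.5 gives roughly a 30-fold
# increase in the memory bandwidth the processor would need.
ratio = grosch_speedup(10.0)
print(f"{ratio:.1f}")  # 31.6

# Scaled from the MC68000's 5 million bytes per second.
print(f"{5e6 * ratio / 1e6:.0f} million bytes/s")  # 158 million bytes/s
```

With the suggested factor-of-two growth in pins, the gap between the bandwidth required and the bandwidth available is what motivates the on-chip cache.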

    But write-back has more severe coherency problems than write-through, since even main memory does not always contain the current version of a particular memory location.

    3.2. A New Write Strategy: Write-Once

    We propose a new write strategy which solves the stale data problem and produces minimal bus traffic. The replacement technique requires the following structure. Associated with each block in the cache are two bits defining one of four states for the associated data:

    Invalid    There is no data in the block.

    Valid      There is data in the block which has been read from backing store and has not been modified.

    Reserved   The data in the block has been locally modified exactly once since it was brought into the cache and the change has been transmitted to backing store.

    Dirty      The data in the block has been locally modified more than once since it was brought into the cache and the latest change has not been transmitted to backing store.

    Write-once requires rapid access to the address tags and state bit pairs concurrently with accesses to the address tags by the CPU. This can most easily be achieved by creating two (identical) copies of the tag memory. Censier [Censier78] claims that duplication is the usual way out for resolving collisions between cache invalidation requests and normal cache references. This is not a large cost, since a single chip design of this part of the cache, using present technology, is quite feasible. Further, we have discovered a way to reduce substantially the number of tags required. In addition, the same chip type could be used for both instances. This is a natural way to partition the cache in VLSI because it results in a maximal logic-to-pin ratio. We have designed and submitted for fabrication such a chip [Ravishankar83].

    The two copies always contain exactly the same address data, because they are always written simultaneously. While one unit is used in the conventional way to support accesses by the CPU, a second monitors all accesses to memory via the Multibus. For each such operation, it checks for the address in the local cache. If a match is found on a write operation, it notifies the cache controller, and the appropriate block in the cache is marked invalid. If a match is found on a read operation, nothing is done unless the block has been modified, i.e., its state is reserved or dirty. If it is just reserved, the state is changed to valid. If it is dirty, the local system inhibits the backing store from supplying the data. It then supplies the data itself.⁴ On the same bus access or immediately following it, the data must be written to backing store. In addition, for either reserved or dirty data, the state is changed to valid.

    This scheme achieves coherency in the following way. Initially write-through is employed. However, an additional goal is achieved upon writing. All other caches are purged of the block being written, so the cache writing through the bus now is guaranteed the only copy except for backing store. It is so identified by being marked reserved. If it is purged at this point, no write is necessary to backing store, so this is essentially write-through. If another write occurs, the block is marked dirty. Now write-back is employed and, on purging, the data must be rewritten to backing store.
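The state transitions described above can be summarized as a small state machine. The sketch below is an illustrative Python rendering, not the authors' hardware design: the function names are hypothetical, and a write to an invalid block is treated as an error only because this excerpt does not describe how write misses are handled.

```python
from enum import Enum


class State(Enum):
    INVALID = 0   # no data in the block
    VALID = 1     # read from backing store, not modified
    RESERVED = 2  # modified exactly once, change written through
    DIRTY = 3     # modified more than once, backing store is stale


def on_cpu_write(state: State) -> tuple[State, bool]:
    """Local CPU write to a cached block.
    Returns (new_state, write_goes_on_bus)."""
    if state is State.VALID:
        # First write: written through, which purges all other copies.
        return State.RESERVED, True
    if state in (State.RESERVED, State.DIRTY):
        # Later writes stay local; the block is now handled as write-back.
        return State.DIRTY, False
    # Assumption: write misses are outside the scope of this sketch.
    raise ValueError("write miss: block must be fetched first")


def on_bus_write(state: State) -> State:
    """Another unit wrote this address over the bus: drop the local copy."""
    return State.INVALID


def on_bus_read(state: State) -> tuple[State, bool]:
    """Another unit read this address over the bus.
    Returns (new_state, local_cache_supplies_data)."""
    if state is State.DIRTY:
        # Inhibit backing store, supply the data, and update backing store.
        return State.VALID, True
    if state is State.RESERVED:
        return State.VALID, False
    return state, False


def needs_writeback_on_purge(state: State) -> bool:
    """Only a dirty block must be rewritten to backing store when purged."""
    return state is State.DIRTY
```

Starting from `State.VALID`, two successive calls to `on_cpu_write` give `RESERVED` (with a bus write) and then `DIRTY` (without one), which mirrors the shift from write-through to write-back behavior.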

    ⁴There is a mechanism in Multibus which allows this capability. Unfortunately, it is rarely used, not well-defined, and requires that local caches respond very rapidly. Versabus has a much cleaner mechanism by which this end can be accomplished.

    Write-once has the desirable feature that units accessing backing store need not have a cache, and need not know whether others do or not. A cache is responsible for maintaining consistency exactly for those cases where it might create a violation, i.e., whenever it writes to a location. Thus it is possible to mix in an arbitrary way systems which employ a cache and those which do not; the latter would probably be I/O devices. Considerable care must be exercised, however, when a write operation over the bus modifies less than an entire block.

    4. Simulation

    We designed a cache memory system to work on Multibus. To validate our design before building it we did extensive simulation using memory trace data. To date we have performed extensive simulations for six traces, all running under UNIX:⁵

    EDC       The UNIX editor ed running a script.

    ROFFAS    The old UNIX text processor program roff.

    TRACE     The program, written in assembly language, which generated the above traces for the PDP-11.

    NROFF     The program nroff interpreting the Berkeley macro package -me.

    CACHE     The trace-driven cache simulator program.

    COMPACT   A program using an on-line algorithm which compresses files using an adaptive Huffman code.

    The first three traces are for a PDP-11, while the latter three are for a VAX. While the PDP-11 does not run on Multibus, its instruction set is similar to many microprocessors which do, and the programs used for tracing were of the kind we envision for such a system. The PDP-11 is similar in many ways to the MC68000, and has in common with the 8086 a limited addressing capability. While the VAX also does not run on Multibus, it is an example of a modern instruction set and therefore is a reasonable example of the kind of processor likely to appear in a single-chip CPU in the future. It also has a larger address space which, as shown in section 4.3, is significant. We are actually using virtual addresses, but all of the programs we ran are small enough to fit into main memory. Since we are tracing only a single process, we conclude that there is no significant difference between virtual and real addresses.

    In addition to cache parameters, miss ratios vary greatly depending on the program running. For each of the above traces, a wide and unpredictable variation occurred as we varied a single parameter. Thus plotting parameters for the individual traces was often not enlightening. Averaging over the three traces in each category gave much more revealing results, providing data that suggested a continuous function for many of the variables studied. Thus all our results are actually the average of three programs, each running alone.
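The measurement procedure described above, computing a miss ratio per trace and then averaging over three traces, can be sketched as follows. This is a minimal Python illustration only: the direct-mapped organization, the default sizes, and the three tiny address lists are assumptions made for the example and are not the paper's simulator or its trace data.

```python
def miss_ratio(trace, cache_bytes=64, block_bytes=4):
    """Cold-start miss ratio of a direct-mapped cache (an assumed
    organization) for a list of byte addresses."""
    n_blocks = cache_bytes // block_bytes
    tags = [None] * n_blocks          # one stored block number per cache line
    misses = 0
    for addr in trace:
        block = addr // block_bytes   # which memory block is referenced
        index = block % n_blocks      # which cache line it maps to
        if tags[index] != block:
            misses += 1
            tags[index] = block       # fetch the block on a miss
    return misses / len(trace)


def average_miss_ratio(traces, **params):
    """Average of the per-trace miss ratios, each trace run alone."""
    return sum(miss_ratio(t, **params) for t in traces) / len(traces)


# Made-up toy traces standing in for three programs.
toy_traces = [
    [0, 4, 0, 4, 8, 0],   # small loop: half the references miss
    [0, 64, 0, 64],       # two blocks colliding on one line: all miss
    [0, 1, 2, 3],         # four bytes of one block: one miss
]
print([miss_ratio(t) for t in toy_traces])       # [0.5, 1.0, 0.25]
print(round(average_miss_ratio(toy_traces), 3))  # 0.583
```

Even in this toy setting the per-trace values differ widely, which is the kind of variation that averaging is meant to smooth.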

    4.1. Effect of Write Strategy on Bus Traffic

    Although write-through normally generates more bus traffic than write-back, the latter can be worse if the hit ratio is low and the block size is large. Under write-back, when a dirty block is purged, the entire block must be written out. With write-through, only that portion which was modified must be written. We found that write-back is decisively superior to write-through except (1) when cache blocks are very large, or (2) when the cache size is very small.
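The trade-off between the two strategies comes down to how many bytes cross the bus while a block is resident. The following minimal Python sketch counts write traffic for a single cached block under simplifying assumptions that are not from the paper: the block stays in the cache for the whole sequence of writes and is then purged, and read traffic is ignored. The function names are illustrative.

```python
def write_through_bytes(write_sizes):
    """Write-through: every write sends only the modified bytes
    over the bus."""
    return sum(write_sizes)


def write_back_bytes(write_sizes, block_bytes):
    """Write-back: writes stay in the cache, and a dirty block is
    written out whole when it is purged."""
    return block_bytes if write_sizes else 0


# Ten 2-byte writes to a 4-byte block: write-back moves far less data.
print(write_through_bytes([2] * 10), write_back_bytes([2] * 10, 4))  # 20 4

# One 2-byte write to a 64-byte block before it is purged:
# write-through moves far less data.
print(write_through_bytes([2]), write_back_bytes([2], 64))  # 2 64
```

The second case corresponds to a low hit ratio combined with a large block, where each purge of a dirty block costs a full block transfer.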

    ⁵UNIX and NROFF are trademarks of Bell Laboratories.

    multiple processors through a common bus such as Ethernet.

    Clearly there are many environments for which this model is inappropriate: response to individual tasks may be unpredictable, for example. However, we believe that such a configuration has many potential applications and can be exploited economically if the appropriate VLSI components are designed. We have investigated the design of such components and believe that they are both feasible and well-suited for VLSI [Ravishankar83].

    Our analysis indicates that the cache approach is reasonable for a system where bandwidth between the CPU and most of its memory is severely limited. We have demonstrated through simulation of real programs that a cache memory can be used to significantly reduce the amount of communication a processor requires. While we were interested in this for a single-chip microcomputer of the future, we have also demonstrated that such an approach is feasible for one or more currently popular commercial markets.

    6. Acknowledgements

    This material is based upon work supported by the

    National Science Foundation under Grant MCS-8202952.

    We thank Dr. A. J. Smith for providing the PDP-11 trace tapes upon which much of our early work depended. We also wish to thank T.-H. Yang for developing the VAX trace facility. P. Vitale and T. Doyle contributed much through discussions and by commenting on an early draft of the manuscript.

    7. References

    [Amdahl82] C. Amdahl, private communication, March 82.

    [Bell74] J. Bell, D. Casasent, and C. G. Bell, An investigation of alternative cache organizations, IEEE Trans. on Computers Vol. C-23, No. 4, April 1974, pp. 346-351.

    [Bell78] C. Bell, J. Judge, J. McNamara, Computer engineering: a DEC view of hardware system design, Digital Press, Bedford, Mass., 1978.

    [Censier78] L. M. Censier and P. Feautrier, A new solution to coherence problems in multicache systems, IEEE Trans. on Computers Vol. C-27, No. 12, December 1978, pp. 1112-1118.

    [Easton78] M. C. Easton and R. Fagin, Cold-start vs. warm-start miss ratios, CACM Vol. 21, No. 10, October 1978, pp. 866-872.

    [Grosch53] H. A. Grosch, High Speed Arithmetic: the Digital Computer as a Research Tool, Journal of the Optical Society of America Vol. 43, No. 4, (April 1953).

    [Hoogendoorn77] C. H. Hoogendoorn, Reduction of memory interference in multiprocessor systems, Proc. 4th Annual Symp. Comput. Arch. 1977, pp. 179-183.

    [IBM74] System/370 model 155 theory of operation/diagrams manual (volume 5): buffer control unit, IBM System Products Division, Poughkeepsie, N.Y., 1974.

    [IBM76] System/370 model 168 theory of operation/diagrams manual (volume 1), Document No. SY22-6931-3, IBM System Products Division, Poughkeepsie, N.Y., 1976.

    [IEEE80] Proposed microcomputer system bus standard (P796 bus), IEEE Computer Society Subcommittee Microcomputer System Bus Group, October 1980.

    [Kaplan73] K. R. Kaplan and R. O. Winder, Cache-based computer systems, Computer, March 1973, pp. 30-36.

    [Knight66] J. R. Knight, Changes in computer performance, Datamation Vol. 12, No. 9, September 1966, pp. 40-54.

    [Lindsay81] Cache Memory for Microprocessors, Computer Architecture News, ACM-SIGARCH Vol. 9, No. 5, (August 1981), pp. 6-13.

    [Liptay68] J. S. Liptay, Structural aspects of the System/360 Model 85, Part II: the cache, IBM Syst. J. Vol. 7, No. 1, 1968, pp. 15-21.

    [Norton82] R. L. Norton and J. L. Abraham, Using write back cache to improve performance of multiuser multiprocessors, 1982 Int. Conf. on Par. Proc., IEEE cat. no. 82CH1794-7, 1982, pp. 326-331.

    [Patel82] J. H. Patel, Analysis of multiprocessors with private cache memories, IEEE Trans. on Computers Vol. C-31, No. 4, April 1982, pp. 296-304.

    [Rao78] G. S. Rao, Performance Analysis of Cache Memories, Journal of the ACM Vol. 25, July 1978, pp. 378-395.

    [Ravishankar83] C. V. Ravishankar and J. Goodman, Cache implementation for multiple microprocessors, Digest of Papers, Spring COMPCON 83, IEEE Computer Society Press, March 1983.

    [Siewiorek82] D. P. Siewiorek, C. G. Bell, and A. Newell, Computer Structures: Principles and Examples, McGraw-Hill, New York, N.Y., 1982.

    [Smith82] A. J. Smith, Cache memories, Computing Surveys Vol. 14, No. 3, September 1982, pp. 473-530.

    [Smith83] J. E. Smith and J. R. Goodman, A study of instruction cache organizations and replacement policies, Tenth Annual Symposium on Computer Architecture, June 1983.

    [Solomon66] M. B. Solomon, Jr., Economies of Scale and the IBM System/360, CACM Vol. 9, No. 6, June 1966, pp. 435-440.

    [Tang76] C. K. Tang, Cache system design in the tightly coupled multiprocessor system, AFIPS Proc., NCC Vol. 45, pp. 749-753, 1976.

    [TI82] Texas Instruments MOS Memory Data Book, Texas Instruments, Inc., Memory Division, Houston, Texas, pp. 106-111, 1982.

    [Tredennick82] N. Tredennick, The IBM micro/370 project, public lecture for Distinguished Lecturer Series, Computer Sciences Department, University of Wisconsin-Madison, March 31, 1982.

    [Widdoes79] L. C. Widdoes, S-1 Multiprocessor architecture (MULT-2), 1979 Annual Report - the S-1 Project, Volume 1: Architecture, Lawrence Livermore Laboratories, Tech. Report UCID 18819, 1979.

    [Yen82] W. C. Yen and K. S. Fu, Coherence problem in a multicache system, 1982 Int. Conf. on Par. Proc., IEEE cat. no. 82CH1794-7, 1982, pp. 332-339.

    Fig. 1. Bus Transfer and Miss Ratios vs. Cache Size; blocks are 4 bytes; PDP-11 traces. The bus transfer ratio is the number of transfers between cache and main store relative to those necessary if there were no cache. (Plot not reproduced; x-axis: Cache Size (bytes).)

    Fig. 2. Bus Transfer and Miss Ratios vs. Cache Size; 4-byte blocks; VAX-11 traces. (Plot not reproduced; x-axis: Cache Size (bytes).)

    Fig. 3. Miss Ratio vs. Block Size for warm and cold starts; PDP-11 traces. (Plot not reproduced; x-axis: Block Size (bytes).)

    Fig. 4. Miss Ratio vs. Block Size for warm and cold starts; VAX-11 traces. (Plot not reproduced; x-axis: Block Size (bytes).)

    Fig. 5. Bus Transfer Ratio vs. Block Size for warm and cold starts; PDP-11 traces. (Plot not reproduced; x-axis: Block Size (bytes).)

    Fig. 6. Bus Transfer Ratio vs. Block Size for warm and cold starts; VAX-11 traces. (Plot not reproduced; x-axis: Block Size (bytes).)

    Fig. 7. Miss Ratio vs. Address Block Size for warm and cold starts. (Plot not reproduced; x-axis: Address Block Size (bytes).)

    Fig. 8. Bus Transfer Ratio vs. Address Block Size for warm and cold starts; WOA: address blocks are reserved. WOT: transfer blocks are reserved. (Plot not reproduced; x-axis: Address Block Size (bytes).)
