8/10/2019 Goodman SnoopyProtocol
1/8
USING CACHE MEMORY TO REDUCE PROCESSOR-MEMORY TRAFFIC
James R. Goodman
Department of Computer Sciences
University of Wisconsin-Madison
Madison, WI 53706
ABSTRACT - The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches, which depend heavily on spatial locality (look-ahead) for their performance, are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind) is then proposed and demonstrated to be effective in an environment where process switches are infrequent. We argue that such an environment is possible if the traffic to backing store is small enough that many processors can share a common memory and if the cache data consistency problem is solved. We demonstrate that such a cache can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.
1. Introduction
Because there are straightforward ways to construct powerful, cost-effective systems using random access memories and single-chip microprocessors, semiconductor technology has, until now, had the greatest impact through these components. High-performance processors, however, are still beyond the capability of a single-chip implementation and are not easily partitioned in a way which can effectively exploit the technology and economies of VLSI. An interesting phenomenon has occurred in the previous decade as a result of this disparity. Memory costs have dropped radically and consistently for computer systems of all sizes. While the component cost of a CPU (single-chip implementations excluded) has declined significantly over the same period, the reduction has been less dramatic. As a result, the amount of memory thought to be appropriate for a processor of a given speed has grown dramatically in recent years. Today small minicomputers have memory as large as that of the most expensive machines of a decade ago.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
The impact of VLSI has been very different in microprocessor applications. Here memory is still regarded as an expensive component in the system, and those familiar primarily with a minicomputer or mainframe environment are often scornful of the trouble to which microprocessor users go to conserve memory. The reason, of course, is that even the small memory in a microprocessor is a much larger portion of the total system cost than the much larger memory on a typical mainframe system. This results from the fact that memory and processors are implemented in the same technology.
1.1. A Super CPU
With the advances in VLSI occurring now and continuing over the next few years, it will become possible to fabricate circuits that are one to two orders of magnitude more complex than currently available microprocessors. It will soon be possible to fabricate an extremely high-performance CPU on a single chip. Devoting the entire chip to the CPU, however, is not a good idea. Extrapolating historical trends to predict future component densities, we might expect that within a few years we should be able to purchase a single-chip processor containing at least ten times as many transistors as occur in, say, the MC68000. For the empirical rule known as Grosch's law [Grosch53], $P = kC^{g}$, where $P$ is some measure of performance, $C$ is the cost, and $k$ and $g$ are constants, Knight [Knight66] concluded that $g$ is at least 2, and Solomon [Solomon66] has suggested that $g \approx 1.47$. For the IBM System/370 family, Siewiorek determined that $g \approx 1.8$ [Siewiorek82]. While Grosch's law breaks down in the comparison of processors using different technologies or architectures, it is realistic for predicting improvements within a single technology. Siewiorek in fact suggests that it holds by definition.
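The following is a quick numerical check of the exponents above. In Grosch's law $P = kC^{g}$ the constant $k$ cancels in a ratio, so multiplying cost by a factor $r$ multiplies performance by $r^{g}$. The sketch evaluates that ratio for a tenfold increase and treats transistor count as a stand-in for cost, as the next paragraph does. The function name `performance_ratio` is illustrative. The 5 million bytes/s figure is the MC68000 peak rate quoted in the text.

```python
def performance_ratio(cost_ratio: float, g: float) -> float:
    """Grosch's law P = k * C**g: k cancels in P2 / P1, so scaling the
    cost by cost_ratio scales performance by cost_ratio ** g."""
    return cost_ratio ** g


# Exponents quoted in the text: Solomon (about 1.47), the value assumed
# in the next paragraph (1.5), Siewiorek for System/370 (about 1.8),
# and Knight's lower bound (2).
for g in (1.47, 1.5, 1.8, 2.0):
    print(f"g = {g}: 10x cost -> {performance_ratio(10, g):.1f}x performance")

# Bandwidth estimate: scale the MC68000's peak rate (10 MHz, about
# 5 million bytes/s) by the g = 1.5 ratio.
mc68000_bytes_per_s = 5e6
needed = mc68000_bytes_per_s * performance_ratio(10, 1.5)
print(f"required bandwidth: about {needed / 1e6:.0f} million bytes/s")
```

With $g = 1.5$ the ratio is $10^{1.5} \approx 31.6$, which the text rounds to 30. That implies roughly 150 to 160 million bytes per second of memory bandwidth.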
Assuming $g = 1.5$ and using processor-memory bandwidth as our measure of performance, Grosch's law predicts that a processor containing 10 times as many transistors as a current microprocessor would require 30 times the memory bandwidth.¹ The Motorola MC68000, running at 10 MHz, accesses data from memory at a maximum rate of 5 million bytes per second, using more than half its pins to achieve this rate. Although packaging technology is rapidly increasing the pins available to a chip, it is unlikely that the increase will be 30-fold (the 68000 has 64 pins). We would suggest that a factor of two is realistic. Although some techniques for increasing the transfer rate into and out of the 68000 are clearly possible, supplying such a processor with data as fast as needed is a severe constraint. One of the designers of the 68000 has stated that all modern microprocessors - the 68000
¹This is a conservative estimate, in fact, because it ignores predictable decreases in gate delays.
© 1983 ACM 0149-7111/83/0600/0124$01.00
But write-back has more severe coherency problems than write-through, since even main memory does not always contain the current version of a particular memory location.
3.2. A New Write Strategy: Write-Once
We propose a new write strategy which solves the stale data problem and produces minimal bus traffic. The replacement technique requires the following structure. Associated with each block in the cache are two bits defining one of four states for the associated data:
Invalid: There is no data in the block.
Valid: There is data in the block which has been read from backing store and has not been modified.
Reserved: The data in the block has been locally modified exactly once since it was brought into the cache, and the change has been transmitted to backing store.
Dirty: The data in the block has been locally modified more than once since it was brought into the cache, and the latest change has not been transmitted to backing store.
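The four states and the processor-side behaviour they imply can be sketched as a small transition function. This is an illustrative model, not the authors' controller. The names `State`, `cpu_read` and `cpu_write` and the action strings are hypothetical. The handling of a write to an invalid block (fetch it, then treat the write as a first write) is also an assumption, because this section does not spell out write misses.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"    # no data in the block
    VALID = "valid"        # read from backing store, not modified
    RESERVED = "reserved"  # modified exactly once; change already in backing store
    DIRTY = "dirty"        # modified more than once; latest change not in backing store


def cpu_read(state: State) -> tuple[State, str]:
    """Processor read: a miss fetches the block; a hit changes nothing."""
    if state is State.INVALID:
        return State.VALID, "fetch"
    return state, "none"


def cpu_write(state: State) -> tuple[State, str]:
    """Processor write: returns (next state, bus action)."""
    if state is State.INVALID:
        # Assumed write-miss handling: fetch the block, then write through.
        return State.RESERVED, "fetch + write-through"
    if state is State.VALID:
        # First write: write through to backing store (other caches purge).
        return State.RESERVED, "write-through"
    # RESERVED or DIRTY: keep the change local, i.e. write-back from now on.
    return State.DIRTY, "none"


state = State.VALID
for _ in range(3):
    state, action = cpu_write(state)
    print(state.value, action)
```

Starting from a valid block, three successive writes print `reserved write-through` and then `dirty none` twice. Only the first write reaches the bus.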
Write-once requires rapid access to the address tags and state bit pairs from the bus side, concurrently with accesses to the address tags by the CPU. This can most easily be achieved by creating two (identical) copies of the tag memory. Censier [Censier78] claims that duplication is the usual way of resolving collisions between cache invalidation requests and normal cache references. This is not a large cost, since a single-chip design of this part of the cache, using present technology, is quite feasible. Further, we have discovered a way to reduce substantially the number of tags required. In addition, the same chip type could be used for both instances. This is a natural way to partition the cache in VLSI because it results in a maximal logic-to-pin ratio. We have designed and submitted for fabrication such a chip [Ravishankar83].
The two copies always contain exactly the same address data, because they are always written simultaneously. While one unit is used in the conventional way to support accesses by the CPU, a second monitors all accesses to memory via the Multibus. For each such operation, it checks for the address in the local cache. If a match is found on a write operation, it notifies the cache controller, and the appropriate block in the cache is marked invalid. If a match is found on a read operation, nothing is done unless the block has been modified, i.e., its state is reserved or dirty. If it is just reserved, the state is changed to valid. If it is dirty, the local system inhibits the backing store from supplying the data and supplies the data itself.⁴ On the same bus access or immediately following it, the data must be written to backing store. In addition, for either reserved or dirty data, the state is changed to valid.
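The bus-side monitor described above reduces to a small transition function. It is applied when another unit's bus operation matches a tag in the duplicate tag store. This is a sketch: the operation names `"read"` and `"write"`, the function `snoop` and the boolean supply flag are illustrative. It abstracts away the timing of inhibiting backing store and the Multibus mechanism mentioned in the footnote.

```python
from enum import Enum


class State(Enum):
    INVALID = 0
    VALID = 1
    RESERVED = 2
    DIRTY = 3


def snoop(state: State, bus_op: str) -> tuple[State, bool]:
    """Bus-side monitor for one block whose tag matched a bus operation
    issued by some other unit.

    Returns (next state, supply), where supply=True means this cache must
    inhibit backing store, supply the data itself, and write it back.
    """
    if state is State.INVALID:
        return state, False              # no live copy here: nothing to do
    if bus_op == "write":
        return State.INVALID, False      # another unit wrote: purge our copy
    if bus_op == "read":
        if state is State.DIRTY:
            return State.VALID, True     # we hold the only current copy
        return State.VALID, False        # RESERVED drops to VALID; VALID stays
    raise ValueError(f"unknown bus operation: {bus_op!r}")


# A dirty block read by another processor: supply, write back, end VALID.
next_state, supply = snoop(State.DIRTY, "read")
print(next_state.name, supply)           # prints: VALID True
```

Only the dirty case generates extra bus activity. A reserved block is already current in backing store, so it is simply downgraded to valid.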
This scheme achieves coherency in the following way. Initially, write-through is employed. However, the write also accomplishes an additional goal: all other caches are purged of the block being written, so the cache writing through the bus is now guaranteed to hold the only copy apart from backing store. It is so identified by being marked reserved. If it is purged at this point, no write to backing store is necessary, so this is essentially write-through. If another write occurs, the block is marked dirty. Now write-back is employed and, on purging, the data must be rewritten to backing store.
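The coherency argument can be exercised end to end with a toy two-cache system. The sketch is a hypothetical model, not the authors' hardware. `Bus`, `Cache`, the one-word blocks and the choice to count a dirty supply plus its write-back as a single bus operation are all simplifying assumptions. It follows the rules stated in this section:

- A first write goes through to backing store and purges other copies (reserved).
- A second write stays local (dirty).
- A dirty block answers a later bus read itself.

```python
from enum import Enum


class State(Enum):
    INVALID = 0
    VALID = 1
    RESERVED = 2
    DIRTY = 3


class Bus:
    """Shared bus joining backing store and any number of snooping caches."""

    def __init__(self):
        self.memory = {}      # backing store: address -> value
        self.caches = []
        self.transfers = 0    # bus operations issued

    def read(self, addr, requester):
        self.transfers += 1
        for cache in self.caches:
            if cache is not requester:
                data = cache.snoop_read(addr)
                if data is not None:          # a dirty copy answered...
                    self.memory[addr] = data  # ...and is written back
                    return data
        return self.memory.get(addr, 0)

    def write(self, addr, value, requester):
        self.transfers += 1
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_write(addr)


class Cache:
    """Write-once cache; each block holds a single word (a simplification)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}       # address -> [State, value]
        bus.caches.append(self)

    def read(self, addr):
        line = self.lines.get(addr)
        if line is None or line[0] is State.INVALID:
            line = self.lines[addr] = [State.VALID, self.bus.read(addr, self)]
        return line[1]

    def write(self, addr, value):
        line = self.lines.get(addr)
        if line is None or line[0] in (State.INVALID, State.VALID):
            # First write: write through; every other cache purges the block.
            self.bus.write(addr, value, self)
            self.lines[addr] = [State.RESERVED, value]
        else:
            # RESERVED or DIRTY: ours is the only cached copy, so stay local.
            self.lines[addr] = [State.DIRTY, value]

    def snoop_read(self, addr):
        line = self.lines.get(addr)
        if line is None or line[0] is State.INVALID:
            return None
        data = line[1] if line[0] is State.DIRTY else None
        line[0] = State.VALID
        return data

    def snoop_write(self, addr):
        if addr in self.lines:
            self.lines[addr][0] = State.INVALID


bus = Bus()
a, b = Cache(bus), Cache(bus)
bus.memory[100] = 7
a.read(100)
b.read(100)             # both caches hold address 100 as VALID
a.write(100, 8)         # write-through: memory becomes 8, b's copy purged
a.write(100, 9)         # second write stays in a, which is now DIRTY
print(bus.memory[100], a.lines[100][0].name, b.lines[100][0].name)
print(b.read(100))      # a supplies 9 and writes it back
print(bus.memory[100], a.lines[100][0].name, b.lines[100][0].name)
```

The run prints `8 DIRTY INVALID`, then `9`, then `9 VALID VALID`. Cache `b` never sees the stale value 8, and the second write by `a` costs no bus operation.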
⁴There is a mechanism in Multibus which allows this capability. Unfortunately, it is rarely used, not well-defined, and requires that local caches respond very rapidly. Versabus has a much cleaner mechanism by which this end can be accomplished.
Write-once has the desirable feature that units accessing backing store need not have a cache, and need not know whether others do or not. A cache is responsible for maintaining consistency exactly for those cases where it might create a violation, i.e., whenever it writes to a location. Thus it is possible to mix in an arbitrary way systems which employ a cache and those which do not; the latter would probably be I/O devices. Considerable care must be exercised, however, when a write operation over the bus modifies less than an entire block.
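The closing caution about writes that modify less than an entire block can be made concrete with a hypothetical four-word block. In this sketch the names, block size and values are all illustrative. The cache's first write, to word 1, went through to backing store. A later write, to word 2, stayed local, so the block is dirty. An uncached I/O device then writes only word 3. If the snooping cache simply invalidates its copy, the update to word 2 is lost. The `flush_first` path shows one conceivable remedy: write the dirty block back before the device's word is applied. This remedy is an assumption for illustration, not a mechanism described in this section.

```python
def partial_write(memory, dirty_block, base, offset, value, flush_first):
    """Backing store after an uncached unit writes ONE word at base + offset
    while a snooping cache holds the block starting at `base` as DIRTY.
    The snooper marks its copy invalid in either case."""
    memory = dict(memory)                 # leave the caller's dict untouched
    if flush_first:
        # Hypothetical remedy: write the whole dirty block back first, so the
        # device's word overwrites only its own location.
        for i, word in enumerate(dirty_block):
            memory[base + i] = word
    memory[base + offset] = value         # the device's single-word bus write
    return memory


# Word 1 was written once (through to memory); word 2 was then written
# locally, which made the block DIRTY.
backing = {0: 10, 1: 21, 2: 12, 3: 13}
cached = [10, 21, 22, 13]
print(partial_write(backing, cached, 0, 3, 99, flush_first=False))
print(partial_write(backing, cached, 0, 3, 99, flush_first=True))
```

The naive path leaves `{0: 10, 1: 21, 2: 12, 3: 99}`, so the cached value 22 is gone once the block is invalidated. The flush-first path leaves `{0: 10, 1: 21, 2: 22, 3: 99}`.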
4. Simulation
We designed a cache memory system to work on Multibus. To validate our design before building it, we did extensive simulation using memory trace data. To date we have performed extensive simulations for six traces, all running under UNIX:⁵
EDC: The UNIX editor ed running a script.
ROFFAS: The old UNIX text processor program roff.
TRACE: The program, written in assembly language, which generated the above traces for the PDP-11.
NROFF: The program nroff interpreting the Berkeley macro package -me.
CACHE: The trace-driven cache simulator program.
COMPACT: A program using an on-line algorithm which compresses files using an adaptive Huffman code.
The first three traces are for a PDP-11, while the latter three are for a VAX. While the PDP-11 does not run on Multibus, its instruction set is similar to that of many microprocessors which do, and the programs used for tracing were of the kind we envision for such a system. The PDP-11 is similar in many ways to the MC68000, and has in common with the 8086 a limited addressing capability. While the VAX also does not run on Multibus, it is an example of a modern instruction set and, therefore, is a reasonable example of the kind of processor likely to appear in a single-chip CPU in the future. It also has a larger address space which, as shown in section 4.3, is significant. We are actually using virtual addresses, but all of the programs we ran are small enough to fit into main memory. Since we are tracing only a single process, we conclude that there is no significant difference between virtual and real addresses.
Miss ratios vary greatly not only with cache parameters but also with the program running. For each of the above traces, a wide and unpredictable variation occurred as we varied a single parameter. Thus plotting parameters for the individual traces was often not enlightening. Averaging over the three traces in each category gave much more revealing results, providing data that suggested a continuous function for many of the variables studied. Thus all our results are actually the average of three programs, each running alone.
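The trace-driven methodology and the averaging step can be sketched as follows. Several choices here are assumptions for illustration, not details given in the text: the cache organization (fully associative with LRU replacement), the function names and the toy traces.

```python
from collections import OrderedDict
from statistics import mean


def miss_ratio(trace, cache_bytes, block_bytes):
    """Replay a list of byte addresses through a fully associative LRU cache
    (an assumed organization) and return misses / references."""
    capacity = cache_bytes // block_bytes   # blocks the cache can hold
    cache = OrderedDict()                   # block number -> None, LRU first
    misses = 0
    for addr in trace:
        block = addr // block_bytes
        if block in cache:
            cache.move_to_end(block)        # hit: now most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used
    return misses / len(trace)


def averaged_miss_ratio(traces, cache_bytes, block_bytes):
    """Mean of the per-program miss ratios, as done for each group of traces."""
    return mean(miss_ratio(t, cache_bytes, block_bytes) for t in traces)


# Three toy traces standing in for one category (not the paper's data).
traces = [[0, 4, 8, 0, 4, 8], [0, 0, 4, 4], [0, 0, 0, 0]]
for size in (8, 12):
    print(size, "bytes:", round(averaged_miss_ratio(traces, size, 4), 3))
```

Growing the toy cache from 8 to 12 bytes drops the averaged miss ratio from 0.583 to 0.417. Almost all of that change comes from the first trace, which stops thrashing. Averaging smooths over exactly this kind of program-specific jump.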
4.1. Effect of Write Strategy on Bus Traffic
Although write-back normally generates less bus traffic than write-through, write-back can be worse if the hit ratio is low and the block size is large. Under write-back, when a dirty block is purged, the entire block must be written out. With write-through, only the portion which was modified must be written. We found that write-back is decisively superior to write-through except (1) when cache blocks are very large, or (2) when the cache size is very small.
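The traffic trade-off can be reproduced with a small counting model. It is an illustration built on stated assumptions, not the simulator used for these results. The organization (fully associative, LRU), fetch-on-miss for writes under both policies, and one word per bus transfer are all chosen for simplicity. `bus_words` is a hypothetical name.

```python
from collections import OrderedDict


def bus_words(trace, cache_blocks, block_words, policy):
    """Words moved over the bus for a trace of (op, word_address) pairs.

    Assumed model: fully associative LRU; any miss fetches the whole block
    under both policies; one word per bus transfer.
      write-through: every write also sends the one modified word.
      write-back: purging a dirty block sends all block_words words.
    Blocks still dirty when the trace ends are not counted.
    """
    cache = OrderedDict()                   # block number -> dirty flag
    words = 0
    for op, addr in trace:
        block = addr // block_words
        if block in cache:
            cache.move_to_end(block)
        else:
            words += block_words            # fetch the block on a miss
            cache[block] = False
            if len(cache) > cache_blocks:
                _, dirty = cache.popitem(last=False)
                if dirty:
                    words += block_words    # write out the entire block
        if op == "w":
            if policy == "write-through":
                words += 1                  # only the modified word
            else:
                cache[block] = True         # defer until the block is purged
    return words


thrash = [("w", 0), ("w", 16)] * 2   # two blocks fighting for one 16-word slot
reuse = [("w", 0)] * 10              # repeated writes to one small block
for name, trace, blocks, bw in [("thrash", thrash, 1, 16), ("reuse", reuse, 4, 4)]:
    wt = bus_words(trace, blocks, bw, "write-through")
    wb = bus_words(trace, blocks, bw, "write-back")
    print(f"{name}: write-through {wt} words, write-back {wb} words")
```

The thrashing case (tiny cache, large blocks, every access a miss) gives 68 words for write-through against 112 for write-back. The reuse case gives 14 against 4. This mirrors the two exceptions named above.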
⁵UNIX and NROFF are trademarks of Bell Laboratories.
multiple processors through a common bus such as Ethernet.
Clearly there are many environments for which this model is inappropriate; response to individual tasks may be unpredictable, for example. However, we believe that such a configuration has many potential applications and can be exploited economically if the appropriate VLSI components are designed. We have investigated the design of such components and believe that they are both feasible and well-suited for VLSI [Ravishankar83].
Our analysis indicates that the cache approach is reasonable for a system where bandwidth between the CPU and most of its memory is severely limited. We have demonstrated through simulation of real programs that a cache memory can be used to significantly reduce the amount of communication a processor requires. While we were interested in this for a single-chip microcomputer of the future, we have also demonstrated that such an approach is feasible for one or more currently popular commercial markets.
6. Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant MCS-8202952. We thank Dr. A. J. Smith for providing the PDP-11 trace tapes upon which much of our early work depended. We also wish to thank T.-H. Yang for developing the VAX trace facility. P. Vitale and T. Doyle contributed much through discussions and by commenting on an early draft of the manuscript.
7. References
[Amdahl82] C. Amdahl, private communication, March 1982.
[Bell74] J. Bell, D. Casasent, and C. G. Bell, An investigation of alternative cache organizations, IEEE Trans. on Computers, Vol. C-23, No. 4, April 1974, pp. 348-351.
[Bell78] C. Bell, J. Judge, and J. McNamara, Computer Engineering: A DEC View of Hardware System Design, Digital Press, Bedford, Mass., 1978.
[Censier78] L. M. Censier and P. Feautrier, A new solution to coherence problems in multicache systems, IEEE Trans. on Computers, Vol. C-27, No. 12, December 1978, pp. 1112-1118.
[Easton78] M. C. Easton and R. Fagin, Cold-start vs. warm-start miss ratios, CACM, Vol. 21, No. 10, October 1978, pp. 866-872.
[Grosch53] H. A. Grosch, High Speed Arithmetic: the Digital Computer as a Research Tool, Journal of the Optical Society of America, Vol. 43, No. 4, April 1953.
[Hoogendoorn77] C. H. Hoogendoorn, Reduction of memory interference in multiprocessor systems, Proc. 4th Annual Symp. Comput. Arch., 1977, pp. 179-183.
[IBM74] System/370 model 155 theory of operation/diagrams manual (volume 5): buffer control unit, IBM System Products Division, Poughkeepsie, N.Y., 1974.
[IBM76] System/370 model 168 theory of operation/diagrams manual (volume 1), Document No. SY22-6931-3, IBM System Products Division, Poughkeepsie, N.Y., 1978.
[IEEE80] Proposed microcomputer system bus standard (P796 bus), IEEE Computer Society Subcommittee Microcomputer System Group, October 1980.
[Kaplan73] K. R. Kaplan and R. O. Winder, Cache-based computer systems, Computer, March 1973, pp. 30-36.
[Knight66] J. R. Knight, Changes in computer performance, Datamation, Vol. 12, No. 9, September 1966, pp. 40-54.
[Lindsay81] Cache Memory for Microprocessors, Computer Architecture News, ACM SIGARCH, Vol. 9, No. 5, August 1981, pp. 6-13.
[Liptay68] J. S. Liptay, Structural aspects of the System/360 Model 85, Part II: the cache, IBM Syst. J., Vol. 7, No. 1, 1968, pp. 15-21.
[Norton82] R. L. Norton and J. L. Abraham, Using write back cache to improve performance of multiuser multiprocessors, 1982 Int. Conf. on Par. Proc., IEEE cat. no. 82CH1794-7, 1982, pp. 326-331.
[Patel82] J. H. Patel, Analysis of multiprocessors with private cache memories, IEEE Trans. on Computers, Vol. C-31, No. 4, April 1982, pp. 296-304.
[Rao78] G. S. Rao, Performance Analysis of Cache Memories, Journal of the ACM, Vol. 25, July 1978, pp. 378-395.
[Ravishankar83] C. V. Ravishankar and J. Goodman, Cache implementation for multiple microprocessors, Digest of Papers, Spring COMPCON 83, IEEE Computer Society Press, March 1983.
[Siewiorek82] D. P. Siewiorek, C. G. Bell, and A. Newell, Computer Structures: Principles and Examples, McGraw-Hill, New York, N.Y., 1982.
[Smith82] A. J. Smith, Cache memories, Computing Surveys, Vol. 14, No. 3, September 1982, pp. 473-530.
[Smith83] J. E. Smith and J. R. Goodman, A study of instruction cache organizations and replacement policies, Tenth Annual Symposium on Computer Architecture, June 1983.
[Solomon66] M. B. Solomon, Jr., Economies of Scale and the IBM System/360, CACM, Vol. 9, No. 6, June 1966, pp. 435-440.
[Tang76] C. K. Tang, Cache system design in the tightly coupled multiprocessor system, AFIPS Proc., NCC, Vol. 45, 1976, pp. 749-753.
[TI82] Texas Instruments MOS Memory Data Book, Texas Instruments, Inc., Memory Division, Houston, Texas, 1982, pp. 106-111.
[Tredennick82] N. Tredennick, The IBM micro/370 project, public lecture for Distinguished Lecturer Series, Computer Sciences Department, University of Wisconsin-Madison, March 31, 1982.
[Widdoes79] L. C. Widdoes, S-1 Multiprocessor architecture (MULT-2), 1979 Annual Report - the S-1 Project, Volume 1: Architecture, Lawrence Livermore Laboratories, Tech. Report UCID 18819, 1979.
[Yen82] W. C. Yen and K. S. Fu, Coherence problem in a multicache system, 1982 Int. Conf. on Par. Proc., IEEE cat. no. 82CH1794-7, 1982, pp. 332-339.
[Plots for Figs. 1-8 omitted; captions follow.]
Fig. 1. Bus Transfer and Miss Ratios vs. Cache Size (bytes); blocks are 4 bytes; PDP-11 traces. The bus transfer ratio is the number of transfers between cache and main store relative to those necessary if there were no cache.
Fig. 2. Bus Transfer and Miss Ratios vs. Cache Size (bytes); 4-byte blocks; VAX-11 traces.
Fig. 3. Miss Ratio vs. Block Size (bytes) for warm and cold starts; PDP-11 traces.
Fig. 4. Miss Ratio vs. Block Size (bytes) for warm and cold starts; VAX-11 traces.
Fig. 5. Bus Transfer Ratio vs. Block Size (bytes) for warm and cold starts; PDP-11 traces.
Fig. 6. Bus Transfer Ratio vs. Block Size (bytes) for warm and cold starts; VAX-11 traces.
Fig. 7. Miss Ratio vs. Address Block Size (bytes) for warm and cold starts.
Fig. 8. Bus Transfer Ratio vs. Address Block Size (bytes) for warm and cold starts; WOA: address blocks are reserved. WOT: transfer blocks are reserved.