
UNSUPERVISED WORD SENSE DISAMBIGUATION RIVALING SUPERVISED METHODS

David Yarowsky

Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA

yarowsky@unagi.cis.upenn.edu

Abstract

This paper presents an unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations. The algorithm is based on two powerful constraints - that words tend to have one sense per discourse and one sense per collocation - exploited in an iterative bootstrapping procedure. Tested accuracy exceeds 96%.

1 Introduction

This paper presents an unsupervised algorithm that can accurately disambiguate word senses in a large, completely untagged corpus.¹ The algorithm avoids the need for costly hand-tagged training data by exploiting two powerful properties of human language:

1. One sense per collocation:² Nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order and syntactic relationship.

2. One sense per discourse: The sense of a target word is highly consistent within any given document.

Moreover, language is highly redundant, so that the sense of a word is effectively overdetermined by (1) and (2) above. The algorithm uses these properties to incrementally identify collocations for target senses of a word, given a few seed collocations for each sense.

¹Note that the problem here is sense disambiguation: assigning each instance of a word to established sense definitions (such as in a dictionary). This differs from sense induction: using distributional similarity to partition word instances into clusters that may have no relation to standard sense partitions.

²Here I use the traditional dictionary definition of collocation - "appearing in the same location; a juxtaposition of words". No idiomatic or non-compositional interpretation is implied.

This procedure is robust and self-correcting, and exhibits many strengths of supervised approaches, including sensitivity to word-order information lost in earlier unsupervised algorithms.

2 One Sense Per Discourse

The observation that words strongly tend to exhibit only one sense in a given discourse or document was stated and quantified in Gale, Church and Yarowsky (1992). Yet to date, the full power of this property has not been exploited for sense disambiguation.

The work reported here is the first to take advantage of this regularity in conjunction with separate models of local context for each word. Importantly, I do not use one-sense-per-discourse as a hard constraint; it affects the classification probabilistically and can be overridden when local evidence is strong.

In this current work, the one-sense-per-discourse hypothesis was tested on a set of 37,232 examples

(hand-tagged over a period of 3 years), the same data studied in the disambiguation experiments. For these words, the table below measures the claim's accuracy (when the word occurs more than once in a discourse, how often it takes on the majority sense for the discourse) and applicability (how often the word does occur more than once in a discourse).

The one-sense-per-discourse hypothesis:

Word     Senses               Accuracy   Applicability
plant    living/factory         99.8 %      72.8 %
tank     vehicle/container      99.6 %      50.5 %
poach    steal/boil            100.0 %      44.4 %
palm     tree/hand              99.8 %      38.5 %
axes     grid/tools            100.0 %      35.5 %
sake     benefit/drink         100.0 %      33.7 %
bass     fish/music            100.0 %      58.8 %
space    volume/outer           99.2 %      67.7 %
motion   legal/physical         99.9 %      49.8 %
crane    bird/machine          100.0 %      49.1 %
Average                         99.8 %      50.1 %

Clearly, the claim holds with very high reliability for these words, and may be confidently exploited as another source of evidence in sense tagging.³
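Both columns of the table above can be computed directly from hand-tagged records. The following is a minimal sketch of one way to measure them, assuming hypothetical (discourse id, sense) records for a single target word; the function name, record format and toy data are illustrative, not from the paper.

```python
from collections import Counter, defaultdict

def discourse_stats(records):
    """records: iterable of (discourse_id, sense) pairs for one word.
    applicability = fraction of tokens whose discourse contains the
                    word more than once;
    accuracy      = among those tokens, fraction carrying the majority
                    sense of their discourse."""
    by_disc = defaultdict(list)
    for disc, sense in records:
        by_disc[disc].append(sense)
    multi = [senses for senses in by_disc.values() if len(senses) > 1]
    n_total = sum(len(s) for s in by_disc.values())
    n_multi = sum(len(s) for s in multi)
    n_major = sum(Counter(s).most_common(1)[0][1] for s in multi)
    applicability = n_multi / n_total
    accuracy = n_major / n_multi if n_multi else float("nan")
    return accuracy, applicability

# Hypothetical toy data (three discourses, six tokens):
records = [("d1", "A"), ("d1", "A"), ("d1", "B"),
           ("d2", "B"), ("d3", "A"), ("d3", "A")]
print(discourse_stats(records))   # -> (0.8, 0.8333...)
```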



3 One Sense Per Collocation

The strong tendency for words to exhibit only one sense in a given collocation was observed and quantified in (Yarowsky, 1993). This effect varies depending on the type of collocation. It is strongest for immediately adjacent collocations, and weakens with distance. It is much stronger for words in a predicate-argument relationship than for arbitrary associations at equivalent distance. It is very much stronger for collocations with content words than those with function words.⁴ In general, the high reliability of this behavior (in excess of 97% for adjacent content words, for example) makes it an extremely useful property for sense disambiguation.

A supervised algorithm based on this property is given in (Yarowsky, 1994). Using a decision list control structure based on (Rivest, 1987), this algorithm integrates a wide diversity of potential evidence sources (lemmas, inflected forms, parts of speech and arbitrary word classes) in a wide diversity of positional relationships (including local and distant collocations, trigram sequences, and predicate-argument associations). The training procedure computes the word-sense probability distributions for all such collocations, and orders them by the log-likelihood ratio

    Log( Pr(Sense_A | Collocation_i) / Pr(Sense_B | Collocation_i) ),⁵

with optional steps for interpolation and pruning. New data are classified by using the single most predictive piece of disambiguating evidence that appears in the target context. By not combining probabilities, this decision-list approach avoids the problematic complex modeling of statistical dependencies

³It is interesting to speculate on the reasons for this phenomenon. Most of the tendency is statistical: two distinct arbitrary terms of moderate corpus frequency are quite unlikely to co-occur in the same discourse whether they are homographs or not. This is particularly true for content words, which exhibit a "bursty" distribution. However, it appears that human writers also have some active tendency to avoid mixing senses within a discourse. In a small study, homograph pairs were observed to co-occur roughly 5 times less often than arbitrary word pairs of comparable frequency. Regardless of origin, this phenomenon is strong enough to be of significant practical use as an additional probabilistic disambiguation constraint.

⁴This latter effect is actually a continuous function conditional on the burstiness of the word (the tendency of a word to deviate from a constant Poisson distribution in a corpus).

⁵As most ratios involve a 0 for some observed value, smoothing is crucial. The process employed here is sensitive to variables including the type of collocation (adjacent bigrams or wider context), collocational distance, type of word (content word vs. function word) and the expected amount of noise in the training data. Details are provided in (Yarowsky, to appear).

encountered in other frameworks. The algorithm is especially well suited for utilizing a large set of highly non-independent evidence such as found here. In general, the decision-list algorithm is well suited for the task of sense disambiguation and will be used as a component of the unsupervised algorithm below.
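As a concrete illustration, here is a minimal sketch of such a decision-list component in Python, reused by the bootstrapping sketches later in this paper. It substitutes simple add-alpha smoothing for the collocation-sensitive smoothing the paper defers to (Yarowsky, to appear), and reduces the evidence sources to three collocation types (left-adjacent word, right-adjacent word, ±k-word window); all names and parameter values are illustrative assumptions.

```python
import math
from collections import Counter

def features(context, target, k=10):
    """Extract collocational features for one occurrence of `target`
    in a tokenized context (a drastic simplification of the paper's
    full feature inventory)."""
    i = context.index(target)        # context is assumed to contain target
    feats = set()
    if i > 0:
        feats.add(("left", context[i - 1]))
    if i + 1 < len(context):
        feats.add(("right", context[i + 1]))
    for w in context[max(0, i - k):i] + context[i + 1:i + 1 + k]:
        feats.add(("window", w))
    return feats

def train_decision_list(tagged, target, alpha=0.1):
    """tagged: list of (context_tokens, sense) pairs, sense in {"A","B"}.
    Rank features by the absolute smoothed log-likelihood ratio
    log(Pr(A|f)/Pr(B|f)), as described in the text."""
    counts = {"A": Counter(), "B": Counter()}
    for context, sense in tagged:
        counts[sense].update(features(context, target))
    dlist = []
    for f in set(counts["A"]) | set(counts["B"]):
        llr = math.log((counts["A"][f] + alpha) / (counts["B"][f] + alpha))
        dlist.append((abs(llr), f, "A" if llr > 0 else "B"))
    dlist.sort(reverse=True)         # most reliable evidence first
    return dlist

def classify(dlist, context, target):
    """Classify with the single highest-ranked matching feature only;
    probabilities are never combined.  Returns (sense, score)."""
    feats = features(context, target)
    for score, f, sense in dlist:
        if f in feats:
            return sense, score
    return None, 0.0
```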

4 Unsupervised Learning Algorithm

Words not only tend to occur in collocations that reliably indicate their sense, they tend to occur in multiple such collocations. This provides a mechanism for bootstrapping a sense tagger. If one begins with a small set of seed examples representative of two senses of a word, one can incrementally augment these seed examples with additional examples of each sense, using a combination of the one-sense-per-collocation and one-sense-per-discourse tendencies.

Although several algorithms can accomplish similar ends,⁶ the following approach has the advantages of simplicity and the ability to build on an existing supervised classification algorithm without modification.⁷ As shown empirically, it also exhibits considerable effectiveness.

The algorithm will be illustrated by the disambiguation of 7538 instances of the polysemous word plant in a previously untagged corpus.

STEP 1:

In a large corpus, identify all examples of the given polysemous word, storing their contexts as lines in an initially untagged training set. For example:

Sense  Training Examples (Keyword in Context)
  ?    ... company said the plant is still operating ...
  ?    ... Although thousands of plant and animal species ...
  ?    ... zonal distribution of plant life ...
  ?    ... to strain microscopic plant life from the ...
  ?    ... vinyl chloride monomer plant, which is ...
  ?    ... and Golgi apparatus of plant and animal cells ...
  ?    ... computer disk drive plant located in ...
  ?    ... divide life into plant and animal kingdom ...
  ?    ... close-up studies of plant life and natural ...
  ?    ... Nissan car and truck plant in Japan is ...
  ?    ... keep a manufacturing plant profitable without ...
  ?    ... molecules found in plant and animal tissue ...
  ?    ... union responses to plant closures ...
  ?    ... animal rather than plant tissues can be ...
  ?    ... many dangers to plant and animal life ...
  ?    ... company manufacturing plant is in Orlando ...
  ?    ... growth of aquatic plant life in water ...
  ?    ... automated manufacturing plant in Fremont ...
  ?    ... Animal and plant life are delicately ...
  ?    ... discovered at a St. Louis plant manufacturing ...
  ?    ... computer manufacturing plant and adjacent ...
  ?    ... the proliferation of plant and animal life ...

⁶Including variants of the EM algorithm (Baum, 1972; Dempster et al., 1977), especially as applied in Gale, Church and Yarowsky (1994).

⁷Indeed, any supervised classification algorithm that returns probabilities with its classifications may potentially be used here. These include Bayesian classifiers (Mosteller and Wallace, 1964) and some implementations of neural nets, but not Brill rules (Brill, 1993).
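As a concrete illustration of Step 1, a sketch continuing the decision-list code from Section 3; the tokenization, window size and record fields are placeholder assumptions, not specified by the paper at this point.

```python
def extract_contexts(corpus_lines, target="plant", window=25):
    """STEP 1: collect every occurrence of the target word together
    with its surrounding context, each initially tagged '?'."""
    examples = []
    for doc_id, line in enumerate(corpus_lines):
        tokens = line.lower().split()     # naive whitespace tokenization
        for i, tok in enumerate(tokens):
            if tok == target:
                ctx = tokens[max(0, i - window):i + window + 1]
                # 'discourse' supports the one-sense-per-discourse steps;
                # here each input line stands in for one discourse.
                examples.append({"context": ctx, "sense": "?",
                                 "discourse": doc_id, "score": 0.0})
    return examples
```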



STEP 2:
For each possible sense of the word, identify a relatively small number of training examples representative of that sense.⁸ This could be accomplished by hand tagging a subset of the training sentences. However, I avoid this laborious procedure by identifying a small number of seed collocations representative of each sense and then tagging all training examples containing the seed collocates with the seed's sense label. The remainder of the examples (typically 85-98%) constitute an untagged residual. Several strategies for identifying seeds that require minimal or no human participation are discussed in Section 5.

In the example below, the words life and manufacturing are used as seed collocations for the two major senses of plant (labeled A and B respectively). This partitions the training set into 82 examples of living plants (1%), 106 examples of manufacturing plants (1%), and 7350 residual examples (98%).

Sense  Training Examples (Keyword in Context)
  A    used to strain microscopic plant life from the ...
  A    ... zonal distribution of plant life ...
  A    close-up studies of plant life and natural ...
  A    too rapid growth of aquatic plant life in water ...
  A    ... the proliferation of plant and animal life ...
  A    establishment phase of the plant virus life cycle ...
  A    ... that divide life into plant and animal kingdom
  A    ... many dangers to plant and animal life ...
  A    mammals. Animal and plant life are delicately
  A    beds too salty to support plant life. River ...
  A    heavy seas, damage, and plant life growing on ...

  ?    ... vinyl chloride monomer plant, which is ...
  ?    ... molecules found in plant and animal tissue
  ?    ... Nissan car and truck plant in Japan is ...
  ?    ... and Golgi apparatus of plant and animal cells ...
  ?    ... union responses to plant closures ...
  ?    ... cell types found in the plant kingdom are ...
  ?    ... company said the plant is still operating ...
  ?    ... Although thousands of plant and animal species
  ?    ... animal rather than plant tissues can be ...
  ?    ... computer disk drive plant located in ...

  B    automated manufacturing plant in Fremont ...
  B    ... vast manufacturing plant and distribution ...
  B    chemical manufacturing plant, producing viscose
  B    ... keep a manufacturing plant profitable without
  B    computer manufacturing plant and adjacent ...
  B    discovered at a St. Louis plant manufacturing
  B    ... copper manufacturing plant found that they
  B    copper wire manufacturing plant, for example ...
  B    's cement manufacturing plant in Alpena ...
  B    polystyrene manufacturing plant at its Dew ...
  B    company manufacturing plant is in Orlando ...

It is useful to visualize the process of seed development graphically. The following figure illustrates this sample initial state. Circled regions are the training examples that contain either an A or B seed collocate. The bulk of the sample points "?" constitute the untagged residual.

⁸For the purposes of exposition, I will assume a binary sense partition. It is straightforward to extend this to k senses using k sets of seeds.

[Figure omitted: a scatter plot of the sample initial state.]

Figure 1: Sample Initial State
A = SENSE-A training example
B = SENSE-B training example
? = currently unclassified training example
[Life] = Set of training examples containing the collocation "life"
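In terms of the earlier extraction sketch, Step 2's seed tagging might look as follows; the seed dictionary matches the life/manufacturing example, and the field names are the hypothetical ones introduced above.

```python
def apply_seeds(examples, seeds):
    """STEP 2: tag every untagged example whose context contains a
    seed collocate; everything else remains in the '?' residual.
    seeds: dict mapping collocate -> sense label, e.g.
           {"life": "A", "manufacturing": "B"}."""
    for ex in examples:
        hits = {seeds[w] for w in ex["context"] if w in seeds}
        if ex["sense"] == "?" and len(hits) == 1:
            ex["sense"] = hits.pop()   # skip contexts matching both senses
    return examples
```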

STEP 3a:
Train the supervised classification algorithm on the SENSE-A/SENSE-B seed sets. The decision-list algorithm used here (Yarowsky, 1994) identifies other collocations that reliably partition the seed training data, ranked by the purity of the distribution. Below is an abbreviated example of the decision list trained on the plant seed data.⁹

Initial decision list for plant (abbreviated)

LogL   Collocation                          Sense
8.10   plant life                           => A
7.58   manufacturing plant                  => B
7.39   life (within ±2-10 words)            => A
7.20   manufacturing (within ±2-10 words)   => B
6.27   animal (within ±2-10 words)          => A
4.70   equipment (within ±2-10 words)       => B
4.39   employee (within ±2-10 words)        => B
4.30   assembly plant                       => B
4.10   plant closure                        => B
3.52   plant species                        => A
3.48   automate (within ±2-10 words)        => B
3.45   microscopic plant                    => A

⁹Note that a given collocate such as life may appear multiple times in the list in different collocational relationships, including left-adjacent, right-adjacent, co-occurrence at other positions in a ±k-word window and various other syntactic associations. Different positions often yield substantially different likelihood ratios and in cases such as pesticide plant vs. plant pesticide indicate entirely different classifications.



STEP 3b:
Apply the resulting classifier to the entire sample set. Take those members in the residual that are tagged as SENSE-A or SENSE-B with probability above a certain threshold, and add those examples to the growing seed sets. Using the decision-list algorithm, these additions will contain newly-learned collocations that are reliably indicative of the previously-trained seed sets. The acquisition of additional partitioning collocations from co-occurrence with previously-identified ones is illustrated in the lower portion of Figure 2.
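In terms of the earlier sketches, Step 3b (together with the return-to-residual behavior described in Section 6) might be realized as below; min_score is an illustrative stand-in for the paper's unstated probability threshold, expressed here as a log-likelihood score.

```python
def grow_seed_sets(examples, dlist, target, min_score=3.0):
    """STEP 3b: reclassify everything; keep confident tags, return
    the rest to the residual.  Returns the number of changed tags."""
    changed = 0
    for ex in examples:
        sense, score = classify(dlist, ex["context"], target)
        if sense is not None and score >= min_score:
            changed += ex["sense"] != sense
            ex["sense"], ex["score"] = sense, score
        elif ex["sense"] != "?":
            ex["sense"], ex["score"] = "?", 0.0   # back to the residual
            changed += 1
    return changed
```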

STEP 3c:
Optionally, the one-sense-per-discourse constraint is then used both to filter and augment this addition. The details of this process are discussed in Section 7. In brief, if several instances of the polysemous word in a discourse have already been assigned SENSE-A, this sense tag may be extended to all examples in the discourse, conditional on the relative numbers and the probabilities associated with the tagged examples.

Labeling previously untagged contexts using the one-sense-per-discourse property

Change    Disc.
in tag    Numb.   Training Examples (from same discourse)
?  -> A    724    ... the existence of plant and animal life ...
A  -> A    724    ... classified as either plant or animal ...
?  -> A    724    Although bacterial and plant cells are enclosed
A  -> A    348    ... the life of the plant, producing stem
A  -> A    348    ... an aspect of plant life, for example
?  -> A    348    ... tissues; because plant egg cells have
?  -> A    348    photosynthesis, and so plant growth is attuned

This augmentation of the training data can often form a bridge to new collocations that may not otherwise co-occur in the same nearby context with previously identified collocations. Such a bridge to the SENSE-A collocate "cell" is illustrated graphically in the upper half of Figure 2.

Similarly, the one-sense-per-discourse constraint may also be used to correct erroneously labeled examples. For example:

Error correction using the one-sense-per-discourse property

Change    Disc.
in tag    Numb.   Training Examples (from same discourse)
A  -> A    525    contains a varied plant and animal life
A  -> A    525    the most common plant life, the ...
A  -> A    525    slight within Arctic plant species ...
B  -> A    525    are protected by plant parts remaining from

[Figure omitted: a scatter plot of the sample intermediate state, showing grown regions for collocates such as "life", "microscopic", "cell", "equipment" and "employee".]

Figure 2: Sample Intermediate State
(following Steps 3b and 3c)

STEP 3d:
Repeat Step 3 iteratively. The training sets (e.g. SENSE-A seeds plus newly added examples) will tend to grow, while the residual will tend to shrink. Additional details aimed at correcting and avoiding misclassifications will be discussed in Section 6.

STEP 4:
Stop. When the training parameters are held constant, the algorithm will converge on a stable residual set.

Note that most training examples will exhibit multiple collocations indicative of the same sense (as illustrated in Figure 3). The decision list algorithm resolves any conflicts by using only the single most reliable piece of evidence, not a combination of all matching collocations. This circumvents many of the problems associated with non-independent evidence sources.

[Figure omitted: a scatter plot of the sample final state.]

Figure 3: Sample Final State
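Tying the sketches together, Steps 3a-3c and the Step 4 stopping condition might be driven by a loop like the following; apply_discourse_constraint is sketched under Section 7 below, and max_iter is a safety bound of my own, not the paper's.

```python
def bootstrap(examples, target, max_iter=50):
    """Iterate Steps 3a-3c (Step 3d) until the residual is stable
    (Step 4); returns the final decision list."""
    dlist = []
    for _ in range(max_iter):
        tagged = [(ex["context"], ex["sense"])
                  for ex in examples if ex["sense"] != "?"]
        dlist = train_decision_list(tagged, target)        # Step 3a
        changed = grow_seed_sets(examples, dlist, target)  # Step 3b
        apply_discourse_constraint(examples)               # Step 3c (Sec. 7)
        # Section 6 additionally suggests widening the context window and
        # randomly perturbing min_score here to escape entrenched errors.
        if changed == 0:        # Step 4: stable residual -> stop
            break
    return dlist
```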



STEP 5:
The classification procedure learned from the final supervised training step may now be applied to new data, and used to annotate the original untagged corpus with sense tags and probabilities.

An abbreviated sample of the final decision list for plant is given below. Note that the original seed words are no longer at the top of the list. They have been displaced by more broadly applicable collocations that better partition the newly learned classes. In cases where there are multiple seeds, it is even possible for an original seed for SENSE-A to become an indicator for SENSE-B if the collocate is more compatible with this second class. Thus the noise introduced by a few irrelevant or misleading seed words is not fatal. It may be corrected if the majority of the seeds forms a coherent collocation space.

Final decision list for plant (abbreviated)

LogL    Collocation                     Sense
10.12   plant growth                    => A
 9.68   car (within ±k words)           => B
 9.64   plant height                    => A
 9.61   union (within ±k words)         => B
 9.54   equipment (within ±k words)     => B
 9.51   assembly plant                  => B
 9.50   nuclear plant                   => B
 9.31   flower (within ±k words)        => A
 9.24   job (within ±k words)           => B
 9.03   fruit (within ±k words)         => A
 9.02   plant species                   => A

When this decision list is applied to a new test sentence,

    ... the loss of animal and plant species through extinction ...,

the highest ranking collocation found in the target context (species) is used to classify the example as SENSE-A (a living plant). If available, information from other occurrences of "plant" in the discourse may override this classification, as described in Section 7.

5 Options for Training Seeds

The algorithm should begin with seed words that accurately and productively distinguish the possible senses. Such seed words can be selected by any of the following strategies:

Use words in dictionary definitions
Extract seed words from a dictionary's entry for the target sense. This can be done automatically, using words that occur with significantly greater frequency in the entry relative to the entire dictionary. Words in the entry appearing in the most reliable collocational relationships with the target word are given the most weight, based on the criteria given in Yarowsky (1993).

Use a single defining collocate for each class
Remarkably good performance may be achieved by identifying a single defining collocate for each class (e.g. bird and machine for the word crane), and using for seeds only those contexts containing one of these words. WordNet (Miller, 1990) is an automatic source for such defining terms.

Label salient corpus collocates
Words that co-occur with the target word in unusually great frequency, especially in certain collocational relationships, will tend to be reliable indicators of one of the target word's senses (e.g. flock and bulldozer for "crane"). A human judge must decide which one, but this can be done very quickly (typically under 2 minutes for a full list of 30-60 such words). Co-occurrence analysis selects collocates that span the space with minimal overlap, optimizing the efforts of the human assistant. While not fully automatic, this approach yields rich and highly reliable seed sets with minimal work.
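One plausible realization of this third strategy is sketched below, ranking words by how much more frequent they are near the target than in the corpus at large; the salience ratio and the cutoff values are illustrative assumptions, not the paper's actual criteria.

```python
from collections import Counter

def salient_collocates(examples, corpus_unigram_prob, top_n=60,
                       min_ratio=5.0, min_count=5):
    """Propose 30-60 candidate seed words for quick hand-labeling.
    corpus_unigram_prob: word -> relative frequency in the whole corpus."""
    near = Counter(w for ex in examples for w in ex["context"])
    total = sum(near.values())
    scored = []
    for w, c in near.items():
        base = corpus_unigram_prob.get(w, 1e-9)
        ratio = (c / total) / base       # salience near the target word
        if ratio >= min_ratio and c >= min_count:   # frequency floor cuts noise
            scored.append((ratio, w))
    return [w for _, w in sorted(scored, reverse=True)[:top_n]]
```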

6 Escaping from Initial Misclassifications

Unlike many previous bootstrapping approaches, the present algorithm can escape from initial misclassification. Examples added to the growing seed sets remain there only as long as the probability of the classification stays above the threshold. If their classification begins to waver because new examples have discredited the crucial collocate, they are returned to the residual and may later be classified differently. Thus contexts that are added to the wrong seed set because of a misleading word in a dictionary definition may be (and typically are) correctly reclassified as iterative training proceeds. The redundancy of language with respect to collocation makes the process primarily self-correcting. However, certain strong collocates may become entrenched as indicators for the wrong class. We discourage such behavior in the training algorithm by two techniques: 1) incrementally increasing the width of the context window after intermediate convergence (which periodically adds new feature values to shake up the system) and 2) randomly perturbing the class-inclusion threshold, similar to simulated annealing.

7 Using the One-sense-per-discourse Property

The algorithm performs well using only local collocational information, treating each token of the target word independently. However, accuracy can be improved by also exploiting the fact that all occurrences of a word in the discourse are likely to exhibit the same sense. This property may be utilized in two places, either once at the end of Step 4 after the algorithm has converged, or in Step 3c after each iteration.



[Table: evaluation results, discussed in Section 8. Columns (6)-(8) are seed training options; columns (9)-(10) add the one-sense-per-discourse (OSPD) constraint to option (7); column (11) is Schütze's algorithm.]

(1) Word | (2) Senses | (3) Samp. Size | (4) % Major Sense | (5) Supvsd Algrtm | (6) Two Words | (7) Dict. Defn. | (8) Top Colls. | (9) (7)+OSPD, end only | (10) (7)+OSPD, each iter. | (11) Schütze Algrthm
plant  | living/factory    |  7538 | 53.1 | 97.7 | 97.1 | 97.3 | 97.6 | 98.3 | 98.6 | 92
space  | volume/outer      |  5745 | 50.7 | 93.9 | 89.1 | 92.3 | 93.5 | 93.3 | 93.6 | 90
tank   | vehicle/container | 11420 | 58.2 | 97.1 | 94.2 | 94.6 | 95.8 | 96.1 | 96.5 | 95
motion | legal/physical    | 11968 | 57.5 | 98.0 | 93.5 | 97.4 | 97.4 | 97.8 | 97.9 | 92
bass   | fish/music        |  1859 | 56.1 | 97.8 | 96.6 | 97.2 | 97.7 | 98.5 | 98.8 | -
palm   | tree/hand         |  1572 | 74.9 | 96.5 | 93.9 | 94.7 | 95.8 | 95.5 | 95.9 | -
poach  | steal/boil        |   585 | 84.6 | 97.1 | 96.6 | 97.2 | 97.7 | 98.4 | 98.5 | -
axes   | grid/tools        |  1344 | 71.8 | 95.5 | 94.0 | 94.3 | 94.7 | 96.8 | 97.0 | -
duty   | tax/obligation    |  1280 | 50.0 | 93.7 | 90.4 | 92.1 | 93.2 | 93.9 | 94.1 | -
drug   | medicine/narcotic |  1380 | 50.0 | 93.0 | 90.4 | 91.4 | 92.6 | 93.3 | 93.9 | -
sake   | benefit/drink     |   407 | 82.8 | 96.3 | 59.6 | 95.8 | 96.1 | 96.1 | 97.5 | -
crane  | bird/machine      |  2145 | 78.0 | 96.6 | 92.3 | 93.6 | 94.2 | 95.4 | 95.5 | -
AVG    |                   |  3936 | 63.9 | 96.1 | 90.6 | 94.8 | 95.5 | 96.1 | 96.5 | 92.2

At the end of Step 4, this property is used for error correction. When a polysemous word such as plant occurs multiple times in a discourse, tokens that were tagged by the algorithm with low confidence using local collocation information may be overridden by the dominant tag for the discourse. The probability differentials necessary for such a reclassification were determined empirically in an early pilot study. The variables in this decision are the total number of occurrences of plant in the discourse (n), the number of occurrences assigned to the majority and minority senses for the discourse, and the cumulative scores for both (a sum of log-likelihood ratios). If cumulative evidence for the majority sense exceeds that of the minority by a threshold (conditional on n), the minority cases are relabeled. The case n = 2 does not admit much reclassification because it is unclear which sense is dominant. But for n > 4, all but the most confident local classifications tend to be overridden by the dominant tag, because of the overwhelming strength of the one-sense-per-discourse tendency.

The use of this property after each iteration is similar to the final post-hoc application, but helps prevent initially mistagged collocates from gaining a foothold. The major difference is that in discourses where there is substantial disagreement concerning which is the dominant sense, all instances in the discourse are returned to the residual rather than merely leaving their current tags unchanged. This helps improve the purity of the training data.

The fundamental limitation of this property is coverage. As noted in Section 2, half of the examples occur in a discourse where there are no other instances of the same word to provide corroborating evidence for a sense or to protect against misclassification. There is additional hope for these cases, however, as such isolated tokens tend to strongly favor a particular sense (the less "bursty" one). We have yet to use this additional information.
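A sketch of this discourse-level decision in the same terms as the earlier code; the single margin constant is an illustrative stand-in for the empirically determined, n-conditional thresholds described above.

```python
from collections import defaultdict

def apply_discourse_constraint(examples, margin=5.0):
    """Relabel minority-sense tokens in a discourse when cumulative
    log-likelihood evidence for the majority sense wins by `margin`."""
    by_disc = defaultdict(list)
    for ex in examples:
        by_disc[ex["discourse"]].append(ex)
    for group in by_disc.values():
        if len(group) < 3:           # n = 2 rarely admits reclassification
            continue
        totals = defaultdict(float)
        for ex in group:
            if ex["sense"] != "?":
                totals[ex["sense"]] += ex["score"]
        if not totals:
            continue
        major = max(totals, key=totals.get)
        minority = sum(v for s, v in totals.items() if s != major)
        if totals[major] - minority >= margin:
            for ex in group:         # extend the dominant tag
                ex["sense"] = major
    return examples
```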

8 Evaluation

The words used in this evaluation were randomly selected from those previously studied in the literature. They include words where sense differences are realized as differences in French translation (drug → drogue/médicament, and duty → devoir/droit), a verb (poach) and words used in Schütze's 1992 disambiguation experiments (tank, space, motion, plant).¹⁰

The data were extracted from a 460 million word corpus containing news articles, scientific abstracts, spoken transcripts, and novels, and almost certainly constitute the largest training/testing sets used in the sense-disambiguation literature.

Columns 6-8 illustrate differences in seed training options. Using only two words as seeds does surprisingly well (90.6%). This approach is least successful for senses with a complex concept space, which cannot be adequately represented by single words. Using the salient words of a dictionary definition as seeds increases the coverage of the concept space, improving accuracy (94.8%). However, spurious words in example sentences can be a source of noise. Quick hand tagging of a list of algorithmically-identified salient collocates appears to be worth the effort, due to the increased accuracy (95.5%) and minimal cost.

Columns 9 and 10 illustrate the effect of adding the probabilistic one-sense-per-discourse constraint to collocation-based models using dictionary entries as training seeds. Column 9 shows its effectiveness

¹⁰The number of words studied has been limited here by the highly time-consuming constraint that full hand tagging is necessary for direct comparison with supervised training.



as a post-hoc constraint. Although apparently small in absolute terms, on average this represents a 27% reduction in error rate.¹¹ When applied at each iteration, this process reduces the training noise, yielding the optimal observed accuracy in column 10.

Comparative performance:
Column 5 shows the relative performance of supervised training using the decision list algorithm, applied to the same data and not using any discourse information. Unsupervised training using the additional one-sense-per-discourse constraint frequently exceeds this value. Column 11 shows the performance of Schütze's unsupervised algorithm applied to some of these words, trained on a New York Times News Service corpus. Our algorithm exceeds this accuracy on each word, with an average relative performance of 97% vs. 92%.¹²

9 Comparison with Previous Work

This algorithm exhibits a fundamental advantage over supervised learning algorithms (including Black (1988), Hearst (1991), Gale et al. (1992), Yarowsky (1993, 1994), Leacock et al. (1993), Bruce and Wiebe (1994), and Lehman (1994)), as it does not require costly hand-tagged training sets. It thrives on raw, unannotated monolingual corpora - the more the merrier. Although there is some hope from using aligned bilingual corpora as training data for supervised algorithms (Brown et al., 1991), this approach suffers from both the limited availability of such corpora, and the frequent failure of bilingual translation differences to model monolingual sense differences.

The use of dictionary definitions as an optional seed for the unsupervised algorithm stems from a long history of dictionary-based approaches, including Lesk (1986), Guthrie et al. (1991), Veronis and Ide (1990), and Slator (1991). Although these earlier approaches have used often sophisticated measures of overlap with dictionary definitions, they have not realized the potential for combining the relatively limited seed information in such definitions with the nearly unlimited co-occurrence information extractable from text corpora.

Other unsupervised methods have shown great promise. Dagan and Itai (1994) have proposed a method using co-occurrence statistics in independent monolingual corpora of two languages to guide lexical choice in machine translation. Translation of a Hebrew verb-object pair such as lahtom (sign or seal) and hoze (contract or treaty) is determined using the most probable combination of words in an English monolingual corpus. This work shows

¹¹The maximum possible error rate reduction is 50.1%, or the mean applicability discussed in Section 2.

¹²This difference is even more striking given that Schütze's data exhibit a higher baseline probability (65% vs. 55%) for these words, and hence constitute an easier task.

that leveraging bilingual lexicons and monolingual language models can overcome the need for aligned bilingual corpora.

Hearst (1991) proposed an early application of bootstrapping to augment training sets for a supervised sense tagger. She trained her fully supervised algorithm on hand-labelled sentences, applied the result to new data and added the most confidently tagged examples to the training set. Regrettably, this algorithm was only described in two sentences and was not developed further. Our current work differs by eliminating the need for hand-labelled training data entirely and by the joint use of collocation and discourse constraints to accomplish this.

Schütze (1992) has pioneered work in the hierarchical clustering of word senses. In his disambiguation experiments, Schütze used post-hoc alignment of clusters to word senses. Because the top-level cluster partitions based purely on distributional information do not necessarily align with standard sense distinctions, he generated up to 10 sense clusters and manually assigned each to a fixed sense label (based on the hand-inspection of 10-20 sentences per cluster). In contrast, our algorithm uses automatically acquired seeds to tie the sense partitions to the desired standard at the beginning, where it can be most useful as an anchor and guide.

In addition, Schütze performs his classifications by treating documents as a large unordered bag of words. By doing so he loses many important distinctions, such as collocational distance, word sequence and the existence of predicate-argument relationships between words. In contrast, our algorithm models these properties carefully, adding considerable discriminating power lost in other relatively impoverished models of language.

10 Conclusion

In essence, our algorithm works by harnessing several powerful, empirically-observed properties of language, namely the strong tendency for words to exhibit only one sense per collocation and per discourse. It attempts to derive maximal leverage from these properties by modeling a rich diversity of collocational relationships. It thus uses more discriminating information than available to algorithms treating documents as bags of words, ignoring relative position and sequence. Indeed, one of the strengths of this work is that it is sensitive to a wider range of language detail than typically captured in statistical sense-disambiguation algorithms.

Also, for an unsupervised algorithm it works surprisingly well, directly outperforming Schütze's unsupervised algorithm 96.7% to 92.2%, on a test of the same 4 words. More impressively, it achieves nearly the same performance as the supervised algorithm given identical training contexts (95.5%



vs. 96.1%), and in some cases actually achieves superior performance when using the one-sense-per-discourse constraint (96.5% vs. 96.1%). This would indicate that the cost of a large sense-tagged training corpus may not be necessary to achieve accurate word-sense disambiguation.

Acknowledgements

This work was partially supported by an NDSEG Fellowship, ARPA grant N00014-90-J-1863 and ARO grant DAAL 03-89-C0031 PRI. The author is also affiliated with the Information Principles Research Center, AT&T Bell Laboratories, and greatly appreciates the use of its resources in support of this work. He would like to thank Jason Eisner, Mitch Marcus, Mark Liberman, Alison Mackey, Dan Melamed and Lyle Ungar for their valuable comments.

References

Baum, L. E., "An Inequality and Associated Maximization Technique in Statistical Estimation of Probabilistic Functions of a Markov Process," Inequalities, v 3, pp 1-8, 1972.

Black, Ezra, "An Experiment in Computational Discrimination of English Word Senses," IBM Journal of Research and Development, v 32, pp 185-194, 1988.

Brill, Eric, "A Corpus-Based Approach to Language Learning," Ph.D. Thesis, University of Pennsylvania, 1993.

Brown, Peter, Stephen Della Pietra, Vincent Della Pietra, and Robert Mercer, "Word Sense Disambiguation using Statistical Methods," in Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp 264-270, 1991.

Bruce, Rebecca and Janyce Wiebe, "Word-Sense Disambiguation Using Decomposable Models," in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994.

Church, K. W., "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text," in Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, Glasgow, 1989.

Dagan, Ido and Alon Itai, "Word Sense Disambiguation Using a Second Language Monolingual Corpus," Computational Linguistics, v 20, pp 563-596, 1994.

Dempster, A. P., Laird, N. M., and Rubin, D. B., "Maximum Likelihood From Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, v 39, pp 1-38, 1977.

Gale, W., K. Church, and D. Yarowsky, "A Method for Disambiguating Word Senses in a Large Corpus," Computers and the Humanities, 26, pp 415-439, 1992.

Gale, W., K. Church, and D. Yarowsky, "Discrimination Decisions for 100,000-Dimensional Spaces," in A. Zampoli, N. Calzolari and M. Palmer (eds.), Current Issues in Computational Linguistics: In Honour of Don Walker, Kluwer Academic Publishers, pp 429-450, 1994.

Guthrie, J., L. Guthrie, Y. Wilks and H. Aidinejad, "Subject Dependent Co-occurrence and Word Sense Disambiguation," in Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp 146-152, 1991.

Hearst, Marti, "Noun Homograph Disambiguation Using Local Context in Large Text Corpora," in Using Corpora, University of Waterloo, Waterloo, Ontario, 1991.

Leacock, Claudia, Geoffrey Towell and Ellen Voorhees, "Corpus-Based Statistical Sense Resolution," in Proceedings, ARPA Human Language Technology Workshop, 1993.

Lehman, Jill Fain, "Toward the Essential Nature of Statistical Knowledge in Sense Resolution," in Proceedings of the Twelfth National Conference on Artificial Intelligence, pp 734-741, 1994.

Lesk, Michael, "Automatic Sense Disambiguation: How to tell a Pine Cone from an Ice Cream Cone," in Proceedings of the 1986 SIGDOC Conference, Association for Computing Machinery, New York, 1986.

Miller, George, "WordNet: An On-Line Lexical Database," International Journal of Lexicography, 3, 4, 1990.

Mosteller, Frederick, and David Wallace, Inference and Disputed Authorship: The Federalist, Addison-Wesley, Reading, Massachusetts, 1964.

Rivest, R. L., "Learning Decision Lists," in Machine Learning, 2, pp 229-246, 1987.

Schütze, Hinrich, "Dimensions of Meaning," in Proceedings of Supercomputing '92, 1992.

Slator, Brian, "Using Context for Sense Preference," in Text-Based Intelligent Systems: Current Research in Text Analysis, Information Extraction and Retrieval, P. S. Jacobs, ed., GE Research and Development Center, Schenectady, New York, 1990.

Veronis, Jean and Nancy Ide, "Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionaries," in Proceedings, COLING-90, pp 389-394, 1990.

Yarowsky, David, "Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora," in Proceedings, COLING-92, Nantes, France, 1992.

Yarowsky, David, "One Sense Per Collocation," in Proceedings, ARPA Human Language Technology Workshop, Princeton, 1993.

Yarowsky, David, "Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French," in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994.

Yarowsky, David, "Homograph Disambiguation in Speech Synthesis," in J. Hirschberg, R. Sproat and J. van Santen (eds.), Progress in Speech Synthesis, Springer-Verlag, to appear.
