6
The Technical Feasibility of Translating Languages by Machine V . H . Y N G V R ECENT developments in the design of elec- tronic digital comput- ers have suggested the possi- bility of translating languages by machine. However, when one tries to write a good trans- lation program for a computer, one finds that there is much to be learned about language. For this reason, no program has yet been written by which a computer could translate usefully, but a number of people have been challenged by the problem of filling in the gaps in our knowledge about language and translation. 1 ' 2 This increasing interest in the possibility of mechanical translation is not surprising. Translation on a large scale is becoming more and more urgent in a rapidly shrinking world, still partitioned by many language barriers. There are at least 50 languages that possess written records worth translating ; at least, 200 languages into which the scientific, technical, and cultural records of our civilization might be translated. The price that each language community pays for the luxury of expressing itself in its own language is the cost and the difficulty of communication with the rest of the world. Today, communication between language communities must be funneled through individuals who are to some extent bilingual. If special-purpose machines can be devised for translating written material, the cost of translation can be significantly reduced. Language com- munities could then retain their individuality and still gain the advantages of increased communication. This article will explore two types of mechanical trans- lation. The first, which can be called word-for-word trans- lation, is within our immediate reach, but promises results that are crude, at best. The second type, sentence-for- sentence translation, which is the type being studied at the Research Laboratory of Electronics, Massachusetts Institute of Technology (MIT), Cambridge, Mass., promises trans- lation of a higher quality, but will require a great deal of effort to work out the linguistic details. WORD-FOR-WORD TRANSLATION WORD-FOR-WORD TRANSLATION is based on the use of a dictionary. The words of the input text are looked up in order, one by one, in a dictionary, and the meanings are written down in order, one by one. In determining the meanings, only one text word at a time is considered. Word-for-word translations can be produced by machine in the following way : The material to be translated is typed on a keyboard device, which transfers the characters to some input medium such as punched tape, punched cards, or Language translations now can be implemented by general-purpose digital computers or, more economically, by using special-purpose ma- chines. Either word-by-word or more accurate sentence-by-sentence translations are consider- ably faster and cheaper than man-made ones. magnetic tape for entry into the machine. The machine then compares the words, one at a time, with the dictionary entries that are stored in the memory. When each input word is located in the mem- ory, the dictionary entry stored there provides a meaning that consists of an equivalent word or words in the other language. The equivalents found in the dictionary are then sent one at a time to an output printer that prints a word-for-word translation. If a machine that can produce rough translations on a production basis is wanted as soon as possible, the word-for- word machine is the favored type because it is simple and can be implemented now. If translations better than word- for-word translations are wanted, work on an automatic dictionary should still be undertaken. Any machine that translates will need a dictionary, and the technical problems arising in the design of an automatic dictionary are not trivial. There are good reasons for taking the word, delimited by spaces or punctuation, as the item to be looked up, in- stead of some smaller unit. It has been suggested that if the roots of words and their inflectional endings were stored in the memory separately, much space would be saved in the dictionary. If a given word can be inflected in five ways, the number of entries in the dictionary could, perhaps, be reduced by nearly four fifths, since the inflectional endings are common to many words. As another space-saving device, it has been suggested that compound words, such as blackboard, counterelectromotive, and Koordinatenanfangs- punkt, could be separated into their component parts, and each part looked up separately. Suggestions such as these for splitting up words have at least two disadvantages. One is that the separation of a word into parts before it is looked up requires a method for deciding where to make the separation. Part of the storage space saved has to be used for a more complicated program or look-up routine. An- other disadvantage is that parts of words and inflectional endings have, in general, more different meanings than fully inflected words. Strict word-for-word translation com- pletely avoids the segmentation problems and avoids some Full text of paper 56-928, "The Technical Feasibility of Translating Languages by Machine," recommended by the AIEE Committee on Communication Theory and approved by the AIEE Committee on Technical Operations for presentation at the AIEE Fall General Meeting, Chicago, 111., October 1-5, 1956. V. H. Yngve is with Massachusetts Institute of Technology, Cambridge, Mass. The work on which the paper is based was supported in part by the U. S. Army (Signal Corps), the U. S. Air Force (Office of Scientific Research, Air Research and Develop- ment Command), the U. S. Navy (Office of Naval Research), and in part by the National Science Foundation. 994 Yngve—Translating by Machine ELECTRICAL ENGINEERING

The technical feasibility of translating languages by machine

  • Upload
    v-h

  • View
    215

  • Download
    1

Embed Size (px)

Citation preview

Page 1: The technical feasibility of translating languages by machine

The Technical Feasibility of Translating Languages by Machine

V. H . Y N G V Ε

R E C E N T d e v e l o p m e n t s in t h e des ign of elec­t ron ic d ig i ta l c o m p u t ­

ers h a v e suggested t h e possi­bi l i ty of t r ans l a t i ng l a n g u a g e s b y m a c h i n e . H o w e v e r , w h e n one tries to wr i t e a good t r ans ­la t ion p r o g r a m for a c o m p u t e r , one finds t h a t t he r e is m u c h to be l e a r n e d a b o u t l a n g u a g e . F o r this reason , n o p r o g r a m has yet b e e n w r i t t e n b y w h i c h a c o m p u t e r cou ld t r ans l a t e usefully, b u t a n u m b e r of p e o p l e h a v e b e e n cha l l enged b y t h e p r o b l e m of filling in t h e gaps in o u r k n o w l e d g e a b o u t l a n g u a g e a n d t r a n s l a t i o n . 1 ' 2

T h i s inc reas ing in te res t in t h e possibi l i ty of m e c h a n i c a l t r ans l a t ion is n o t surpr i s ing . T r a n s l a t i o n on a l a rge scale is b e c o m i n g m o r e a n d m o r e u r g e n t in a r a p i d l y s h r i n k i n g wor ld , still p a r t i t i o n e d b y m a n y l a n g u a g e ba r r i e r s . T h e r e a r e a t least 50 l a n g u a g e s t h a t possess w r i t t e n r eco rds w o r t h t r ans l a t i ng ; a t least, 200 l a n g u a g e s i n to w h i c h t h e scientific, t echn ica l , a n d c u l t u r a l r eco rds of o u r c iv i l iza t ion m i g h t be t r ans l a t ed . T h e p r i ce t h a t e a c h l a n g u a g e c o m m u n i t y pays for t h e l u x u r y of express ing itself in its o w n l a n g u a g e is t h e cost a n d t h e difficulty of c o m m u n i c a t i o n w i t h t h e rest of t h e w o r l d . T o d a y , c o m m u n i c a t i o n b e t w e e n l a n g u a g e c o m m u n i t i e s m u s t b e funne led t h r o u g h ind iv idua l s w h o a r e to some ex t en t b i l ingua l . If spec ia l -purpose m a c h i n e s c a n be devised for t r a n s l a t i n g w r i t t e n m a t e r i a l , t h e cost of t r ans la t ion c a n b e s ignif icant ly r e d u c e d . L a n g u a g e c o m ­mun i t i e s cou ld t h e n r e t a i n the i r i nd iv idua l i t y a n d still ga in t he a d v a n t a g e s of inc reased c o m m u n i c a t i o n .

T h i s a r t ic le will exp lo re t w o types of m e c h a n i c a l t r a n s ­la t ion . T h e first, w h i c h c a n be cal led word - fo r -word t r a n s ­la t ion, is w i t h i n o u r i m m e d i a t e r e a c h , b u t p romises resul ts t h a t a r e c r u d e , a t best . T h e second type , sentence-for-sen tence t r ans l a t ion , w h i c h is t h e t ype b e i n g s tud ied a t t h e R e s e a r c h L a b o r a t o r y of E lec t ron ics , M a s s a c h u s e t t s I n s t i t u t e of T e c h n o l o g y ( M I T ) , C a m b r i d g e , Mass . , p romises t r a n s ­la t ion of a h i g h e r qua l i t y , b u t will r e q u i r e a g r e a t d e a l of effort to work o u t t h e l inguis t ic deta i ls .

WORD-FOR-WORD TRANSLATION

W O R D - F O R - W O R D TRANSLATION is based o n t h e use of a

d ic t i ona ry . T h e w o r d s of t h e i n p u t tex t a r e looked u p in o rde r , o n e by one , in a d i c t i ona ry , a n d t h e m e a n i n g s a r e w r i t t e n d o w n in o rde r , o n e b y one . I n d e t e r m i n i n g t h e m e a n i n g s , on ly o n e tex t w o r d a t a t i m e is cons ide red .

W o r d - f o r - w o r d t r ans l a t ions c a n be p r o d u c e d b y m a c h i n e in t he fol lowing w a y : T h e m a t e r i a l to be t r a n s l a t e d is t y p e d on a k e y b o a r d dev ice , w h i c h transfers t h e c h a r a c t e r s to s o m e i n p u t m e d i u m such as p u n c h e d t a p e , p u n c h e d ca rds , or

L a n g u a g e trans lat ions n o w can b e i m p l e m e n t e d b y g e n e r a l - p u r p o s e dig i ta l c o m p u t e r s or , m o r e economica l ly , b y u s i n g s p e c i a l - p u r p o s e m a ­c h i n e s . E i t h e r w o r d - b y - w o r d or m o r e accura te s e n t e n c e - b y - s e n t e n c e trans lat ions are c o n s i d e r ­a b l y faster a n d c h e a p e r t h a n m a n - m a d e o n e s .

m a g n e t i c t a p e for e n t r y in to t h e m a c h i n e . T h e m a c h i n e t h e n c o m p a r e s t h e words , one a t a t ime , w i t h t h e d i c t i o n a r y en t r ies t h a t a r e s tored in t h e m e m o r y . W h e n e a c h i n p u t w o r d is loca ted in t h e m e m ­ory , t he d i c t i o n a r y e n t r y s tored

t h e r e p rov ides a m e a n i n g t h a t consists of a n e q u i v a l e n t w o r d or w o r d s in t h e o t h e r l a n g u a g e . T h e equ iva l en t s found in t h e d i c t i o n a r y a r e t h e n sent o n e a t a t i m e to a n o u t p u t p r i n t e r t h a t p r i n t s a word - fo r -word t r ans la t ion .

If a m a c h i n e t h a t c a n p r o d u c e r o u g h t r ans la t ions on a p r o d u c t i o n basis is w a n t e d as soon as possible, t h e word-for -w o r d m a c h i n e is t he favored t y p e because it is s imple a n d c a n b e i m p l e m e n t e d n o w . If t r ans la t ions b e t t e r t h a n w o r d -for -word t r ans l a t ions a r e w a n t e d , w o r k on a n a u t o m a t i c d i c t i o n a r y shou ld still b e u n d e r t a k e n . A n y m a c h i n e t h a t t r ans la tes will n e e d a d i c t i ona ry , a n d t h e t echn ica l p r o b l e m s ar i s ing in t h e des ign of a n a u t o m a t i c d i c t i o n a r y a r e n o t t r iv ia l .

T h e r e a r e good reasons for t a k i n g t h e w o r d , de l im i t ed b y spaces or p u n c t u a t i o n , as t h e i t em to b e looked u p , in­s t ead of some smal le r u n i t . I t ha s b e e n suggested t h a t if t h e roo ts of w o r d s a n d the i r inf lect ional end ings w e r e s tored in t h e m e m o r y sepa ra te ly , m u c h space w o u l d be saved in t h e d i c t i o n a r y . If a g iven w o r d c a n b e inflected in five ways , t h e n u m b e r of en t r ies in t h e d i c t i o n a r y cou ld , p e r h a p s , b e r e d u c e d b y n e a r l y four fifths, s ince t h e inflect ional e n d i n g s a r e c o m m o n to m a n y w o r d s . As a n o t h e r space-sav ing dev ice , i t h a s b e e n sugges ted t h a t c o m p o u n d words , such as b l a c k b o a r d , c o u n t e r e l e c t r o m o t i v e , a n d K o o r d i n a t e n a n f a n g s -p u n k t , co u ld b e s e p a r a t e d i n t o the i r c o m p o n e n t pa r t s , a n d e a c h p a r t looked u p s epa ra t e ly . Sugges t ions such as these for sp l i t t ing u p w o r d s h a v e a t least t w o d i s a d v a n t a g e s . O n e is t h a t t h e s e p a r a t i o n of a w o r d i n t o p a r t s before it is looked u p r equ i r e s a m e t h o d for d e c i d i n g w h e r e to m a k e the s e p a r a t i o n . P a r t of t h e s to rage space saved has to b e used for a m o r e c o m p l i c a t e d p r o g r a m or l ook -up r o u t i n e . A n ­o t h e r d i s a d v a n t a g e is t h a t p a r t s of w o r d s a n d inflect ional e n d i n g s h a v e , in gene ra l , m o r e different m e a n i n g s t h a n fully inflected w o r d s . S t r ic t word - fo r -word t r ans l a t ion com­ple te ly avoids t h e s e g m e n t a t i o n p r o b l e m s a n d avoids some

Full text of paper 56-928, "The Technical Feasibility of Translating Languages by Machine," recommended by the AIEE Committee on Communication Theory and approved by the AIEE Committee on Technical Operations for presentation at the AIEE Fall General Meeting, Chicago, 111., October 1-5, 1956.

V. H. Yngve is with Massachusetts Institute of Technology, Cambridge, Mass.

The work on which the paper is based was supported in part by the U. S. Army (Signal Corps), the U. S. Air Force (Office of Scientific Research, Air Research and Develop­ment Command), the U. S. Navy (Office of Naval Research), and in part by the National Science Foundation.

994 Yngve—Translating by Machine ELECTRICAL ENGINEERING

Page 2: The technical feasibility of translating languages by machine

of t h e m u l t i p l e m e a n i n g s of p a r t s of words—inf l ec t iona l end ings , in p a r t i c u l a r .

I n t h e n e x t sect ions, w o r d - f o r - w o r d t r a n s l a t i o n will b e cons idered , i n de t a i l . T h e r e q u i r e m e n t s of t h e task will b e c o m p a r e d w i t h t h e capab i l i t i e s of p r e s e n t - d a y g e n e r a l - p u r ­pose d ig i ta l c o m p u t e r s a n d of spec i a l -pu rpose m a c h i n e s t h a t m i g h t b e c o n s t r u c t e d for t r a n s l a t i o n ; t h e n t h e usefulness of t h e p r o d u c t will b e assessed.

MACHINE REQUIREMENTS—SIZE

T H E SIZE OF THE MEMORY of a m a c h i n e is of i m p o r t a n c e

in word - fo r -word t r ans l a t ions . M e m o r y size d e p e n d s u p o n t h e n u m b e r of d i c t i o n a r y en t r ies , t h e a v e r a g e l e n g t h of e a c h en t ry , a n d , in s t o r e d - p r o g r a m m a c h i n e s , o n t h e s t o r a g e r e q u i r e m e n t s of t h e p r o g r a m .

T h e n u m b e r of w o r d s in a l a n g u a g e is h a r d to e s t ima te . If t h e ent r ies in d ic t iona r i e s a r e c o u n t e d , n u m b e r s a r o u n d a mi l l ion a r e o b t a i n e d . H o w e v e r , n o d i c t i o n a r y c o n t a i n s all t h e w o r d s of a l a n g u a g e ; n e w w o r d s a r e b e i n g a d d e d every d a y . T h e v o c a b u l a r y of a l a n g u a g e is n o t a closed sys tem; it is a lways o p e n to a d d i t i o n s . If t r a n s l a t i n g is l imi ted to o n e field of k n o w l e d g e (e lectr ical eng inee r ing , for e x a m p l e ) , mos t of t h e w o r d s b e l o n g i n g to o t h e r fields m a y never a p p e a r , a n d a glossary c o n t a i n i n g e lec t r ica l eng inee r ­ing t e r m s a n d ce r t a in g e n e r a l scientific t e r m s m i g h t b e a d e q u a t e . Bu t , l ike t h e v o c a b u l a r y of a n en t i r e l a n g u a g e , t he v o c a b u l a r y of a specific field is n o t c losed; a n d a n a u t h o r of a p a p e r in a specia l field often uses t e r m s f rom o the r fields.

A glossary shou ld c o n t a i n t h e w o r d s mos t f r equen t ly used in a specific field. S ince i t is v i r t ua l l y imposs ib le to list all t he w o r d s of a l a n g u a g e or even of a specific field, n e w w o r d s will h a v e to b e a d d e d f rom t i m e to t i m e as t h e y a r e n e e d e d . I t is feasible to s t a r t w i t h a r a t h e r smal l list of w o r d s a n d a d d n e w ones w h e n e v e r t h e y a r e first e n c o u n t e r e d o r after t h e y h a v e a p p e a r e d a g iven n u m b e r of t imes . I n this w a y , t h e glossary is a lways a n a p p r o x i m a t i o n to a list of t h e w o r d s of h ighes t f r equency in t h e specific field.

T h e m o s t i m p o r t a n t p a r a m e t e r to b e cons ide r ed in e v a l u a t i n g a glossary is n o t t h e n u m b e r of w o r d s t h a t i t con ta ins , b u t some p a r a m e t e r c o n n e c t e d w i t h t h e n u m b e r of r u n n i n g w o r d s of a tex t t h a t c a n b e found in t h e glossary. T h e coverage of a glossary w i t h respec t to a ce r t a in tex t c a n b e def ined as t h e r a t i o of t h e n u m b e r of r u n n i n g w o r d s of t h e text t h a t c a n b e found in t h e glossary to o n e m o r e t h a n t h e n u m b e r of w o r d s t h a t c a n n o t b e found . I t is t h e a v e r a g e l e n g t h of u n i n t e r r u p t e d r u n s of tex t w o r d s t h a t c a n b e found in t he glossary. T h e c o v e r a g e of a n Eng l i sh glossary w i t h respect to a R u s s i a n a r t i c le is ze ro . T h e c o v e r a g e of a glossary c o n t a i n i n g t h e 50 or 60 m o s t f r equen t ly used Engl i sh w o r d s is a b o u t one , b e c a u s e these w o r d s m a k e u p a b o u t half of t h e r u n n i n g w o r d s in a text . T h e c o v e r a g e of a glossary w i t h respec t to a n a r t i c l e is n u m e r i c a l l y e q u a l to t h e n u m b e r of w o r d s in t h e a r t i c le , if t h e glossary c o n t a i n s t h e m all . I n o r d e r to h a v e a r e a s o n a b l y h i g h c o v e r a g e w i t h respec t to a field, such as e lec t r ica l eng inee r ing , a glossary shou ld c o n t a i n a t least 5,000 careful ly chosen w o r d s .

I n assessing t h e size of m e m o r y n e e d e d for a n a u t o m a t i c d i c t iona ry , it is also i m p o r t a n t to form s o m e e s t i m a t e of t h e l eng th of e a c h en t ry . T h e d i s t r i bu t i on of l eng ths of t h e

different w o r d s in a l a n g u a g e is cha rac t e r i s t i c of t h e l a n g u a g e . I n Engl i sh , t h e a v e r a g e w o r d l e n g t h is a b o u t seven o r e igh t l e t t e r s ; in G e r m a n , it is s o m e w h a t longer . G e r m a n has m a n y w o r d s w i t h m o r e t h a n 20 le t ters . I n a s m u c h as w o r d l eng ths a r e so v a r i a b l e a n d b e c a u s e e a c h e n t r y m a y c o n t a i n o n e o r m o r e w o r d s as equ iva l en t s , t h e r e will b e cons ide rab le v a r i a t i o n in e n t r y size. P e r h a p s t h e a v e r a g e size will b e s o m e t h i n g l ike 20 l e t t e r s ; t h e m a x i m u m size m a y r u n to m o r e t h a n 100.

T h e a m o u n t of s to rage n e e d e d for a m i n i m u m d i c t i o n a r y of 5,000 en t r ies w i t h a n a v e r a g e e n t r y size of 20 c h a r a c t e r s of 5 b i n a r y d igi ts e a c h is 500,000 b i n a r y digi ts .

N o w t h a t a n e s t i m a t e of t h e m e m o r y c a p a c i t y r e q u i r e ­m e n t s ha s b e e n m a d e , t h e a d e q u a c y of p r e s e n t - d a y m a c h i n e s c a n b e e x a m i n e d . M a n y g e n e r a l - p u r p o s e d ig i ta l c o m p u t e r s h a v e t h r e e k inds of m e m o r y : m a g n e t i c core , m a g n e t i c d r u m , a n d m a g n e t i c t a p e . A s t o r a g e c a p a c i t y of 500,000 b i n a r y d igi ts is, b y p r e s e n t s t a n d a r d s , l a r g e for a m a g ­ne t i c -co re m e m o r y . E v e n m a n y m a g n e t i c - d r u m m a c h i n e s d o n o t h a v e this c a p a c i t y . O n e is forced to t h e conc lu­sion t h a t m a g n e t i c - t a p e s to rage m u s t b e used w i t h mos t p r e s e n t - d a y g e n e r a l - p u r p o s e c o m p u t e r s if a n a d e q u a t e d i c t i o n a r y is to be s to red .

MACHINE REQUIREMENTS—SPEED AND ECONOMY

A L T H O U G H THE ABILITY of a m a c h i n e to m a k e word-for -w o r d t r ans l a t i ons is d e t e r m i n e d p r i m a r i l y b y t h e size of t h e glossary t h a t it c a n s tore , its usefulness for p r o d u c i n g t r a n s ­la t ions e c o n o m i c a l l y d e p e n d s o n a n u m b e r of factors . T h e s e i n c l u d e t h e cost of o p e r a t i n g t h e m a c h i n e , t h e access t i m e of t h e m e m o r y , t h e regis ter l eng th , t h e t y p e of c o m m a n d s a v a i l a b l e a n d the i r e x e c u t i o n t imes , a n d t h e w a y in w h i c h t h e p r o g r a m is w r i t t e n .

I n p r e s e n t - d a y m a c h i n e s t h e m e m o r i e s a r e o rgan i zed b y address . T h i s m e a n s t h a t t h e m e m o r y con ta in s a n u m b e r of p igeonho les o r m e m o r y loca t ions , e a c h o n e d e s i g n a t e d b y a n u m b e r . T h e m a c h i n e is a r r a n g e d so t h a t r e a d y access is o b t a i n e d to e a c h m e m o r y loca t ion w h e n it is re fer red to b y its address . T h e a v e r a g e t i m e t h a t it takes to refer to a g iven addres s a n d e x t r a c t t h e con t en t s of t he m e m o r y from t h a t pos i t ion is ca l led t h e access t ime . F o r co re m e m o r i e s t h e access t imes a r e v e r y shor t , in t h e n e i g h b o r h o o d of 10 to 20 mic roseconds . F o r d r u m a n d t a p e m e m o r i e s , t h e c o n c e p t of access t i m e is n o t q u i t e as a p p r o p r i a t e , b e c a u s e t h e a c t u a l t i m e r e q u i r e d to find s o m e t h i n g a t a g iven add re s s var ies g rea t ly , b e i n g d e p e n d e n t o n t h e t i m e it takes t h e d r u m or t a p e to m o v e to t h a t address . A v e r a g e access t imes for d r u m s a r e in t h e n e i g h b o r h o o d of 10 mi l l i seconds .

I n a d i c t i o n a r y l o o k - u p p r o g r a m , t h e access t i m e is n o t t h e w h o l e s tory in d e t e r m i n i n g h o w qu ick ly a g iven w o r d c a n b e f o u n d ; a l t h o u g h t h e w o r d is k n o w n , its add re s s is no t . T h e b i n a r y r e p r e s e n t a t i o n of t h e w o r d itself c a n n o t be used as t h e address , b e c a u s e t h e spel l ing of w o r d s is h igh ly r e d u n d a n t . F o r a 5 ,000-word m e m o r y , t h e address w o u l d n e e d to b e on ly 13 b i n a r y digi ts l o n g ; b u t t h e spel l ing of a w o r d m i g h t r e q u i r e well over 100 b i n a r y digi ts .

V a r i o u s p r o g r a m m i n g m e t h o d s h a v e b e e n devised for s e a r c h i n g a m e m o r y w h e n t h e addres s is u n k n o w n . Clear ly , it is n o t e c o n o m i c a l of t i m e to s t a r t a t t h e b e g i n n i n g a n d

N O V E M B E R 1 9 5 6 Yngve—Translating by Machine 9 9 5

Page 3: The technical feasibility of translating languages by machine

e x a m i n e e a c h en t ry , in t u r n , t h r o u g h t h e w h o l e d i c t i o n a r y un t i l t h e r i g h t e n t r y is found . A n i m p r o v e m e n t consists of us ing t h e t h u m b i n d e x idea , w h e r e b y t h e s ea rch is d i v i d e d i n t o t w o steps. T h e in i t ia l o n e or t w o le t te rs of t h e w o r d a r e found in t h e i n d e x p a r t of t h e d i c t i o n a r y . T h e i n d e x e n t r y gives t h e pos i t ion in t h e m a i n list a t w h i c h t h e s e a r c h shou ld beg in . I t is possible to e x t e n d t h e t h u m b - i n d e x idea a n d d iv ide t h e s e a r c h i n t o t h r e e o r m o r e s teps . If this p r o c e d u r e is ca r r i ed to t h e e x t r e m e , t h e b i furca t ion , o r 20-ques t ions , t e c h n i q u e is r e a c h e d . E a c h s tep of t h e s e a r c h chooses o n e o u t of t w o a p p r o x i m a t e l y e q u a l sect ions of t h e d i c t i o n a r y . If t h e divis ion is exac t ly i n t o t w o , e a c h t ime , o n e o u t of 2n en t r ies c a n b e found in η t r ia ls . A n o t h e r possibi l i ty is to t ry to o b t a i n f rom t h e spel l ing of t h e w o r d some funct ion t h a t specifies, as closely as possible, t h e add re s s of t h e w o r d ; surpr i s ing ly good resul ts c a n b e o b t a i n e d .

I t is also possible to t a k e a d v a n t a g e of t h e v o c a b u l a r y statistics. T h e p r o b a b i l i t y pT t h a t t h e r t h w o r d f rom t h e v o c a b u l a r y will a p p e a r n e x t in a t ex t is g iven b y t h e Zipf L a w 3 a s p T = P r - 1 , w h e r e r is t h e r a n k of t h e w o r d w h e n w o r d s a r e r a n k e d in o r d e r of dec rea s ing f r equency , a n d Ρ is a c o n s t a n t w i t h a v a l u e of a p p r o x i m a t e l y o n e t e n t h . A m o r e a c c u r a t e l a w has b e e n d e r i v e d b y M a n d e l b r o t . 4 I t is pr = Ρ ( r + m ) B , w h e r e Β is a c o n s t a n t in t h e n e i g h b o r ­h o o d of one , a n d m is a n o t h e r cons t an t . T h e c o n s t a n t s P , m, a n d Β a r e cha rac t e r i s t i c of di f ferent l a n g u a g e s a n d , to s o m e ex ten t , of different a u t h o r s a n d styles of d i scourse . A c c o r d i n g to these t w o laws, as few as 50 o r 60 different w o r d s m a k e u p a b o u t hal f of t h e to ta l w o r d s in a t ex t . I f these 50 or 60 w o r d s a r e p l a c e d in a co re m e m o r y w i t h effectively i n s t a n t a n e o u s access, t h e over-a l l access t i m e c a n b e r e d u c e d b y a fac tor of 2.

F o r a m e m o r y , such as m a g n e t i c t a p e , t h e access t i m e to a g iven e n t r y d e p e n d s on its pos i t ion a l o n g t h e t a p e in r e l a t i on to t h e r e a d i n g h a n d s . T h e mos t effective w a y to use a m a g n e t i c - t a p e m e m o r y is to look u p t h e w o r d s in ba t ches , n o t o n e a t a t i m e . E a c h b a t c h of w o r d s is a l p h a ­be t i zed in t h e fast m e m o r y of t h e m a c h i n e . All of t h e w o r d s in a b a t c h c a n b e looked u p in o n e pass of t h e t a p e , if t he w o r d s on t h e t a p e a r e also a l p h a b e t i z e d . T h e a v e r a g e r a t e of access c a n b e g rea t l y inc reased b y this m e t h o d a t t h e expense of some d e l a y in t r a n s l a t i o n a n d some s to rage a n d m a n i p u l a t i o n r e q u i r e m e n t s in a fast m e m o r y . T h e c o n c e p t of r a n d o m access t i m e is a lmos t mean ing le s s w h e n w o r d s a r e looked u p in b a t c h e s . T h e a p p r o p r i a t e p a r a m e t e r s of t h e system a r e t h e a v e r a g e r a t e of i n f o r m a t i o n flow t h r o u g h t h e m a c h i n e a n d t h e a v e r a g e de l ay .

P r e s e n t - d a y d ig i ta l c o m p u t e r s h a v e b e e n des igned to h a n d l e a r i t h m e t i c a t ve ry fast r a t e s . A d i c t i o n a r y s e a r c h p r o g r a m for o n e of these m a c h i n e s usua l ly involves a lot of a r i t h m e t i c — n o t b e c a u s e i t is r e q u i r e d b y t h e p r o b l e m , b u t because t h e m a c h i n e does it so wel l . A m o d e r n fast c o m p u t e r w o u l d n o t b e as e c o n o m i c a l as a m a c h i n e specifi­cal ly des igned for t r a n s l a t i o n . S u c h a m a c h i n e shou ld b e c a p a b l e of h a n d l i n g ent r ies of v a r i a b l e l e n g t h a n d shou ld b e des igned specifically for s e a r c h i n g t h e m e m o r y , w h e n t h e addres s is n o t k n o w n . I t s hou ld b e specia l ly des igned for m a t c h i n g sequences of b i n a r y digi ts , a n d d e t e r ­m i n i n g w h e t h e r t h e n u m e r i c a l v a l u e of o n e s e q u e n c e is g r e a t e r o r less t h a n t h a t of a n o t h e r .

M e m o r y r e sea r ch is p rogress ing r a p i d l y . I n a n o t h e r few years , h i g h - c a p a c i t y magne t i c -d i sk m e m o r i e s , specia l m a g ­n e t i c - t a p e or d r u m a r r a n g e m e n t s , o r va r ious types of p h o t o g r a p h i c m e m o r i e s will p r o b a b l y b e d e v e l o p e d to t h e p o i n t w h e r e t h e y will b e a v a i l a b l e for t r a n s l a t i n g m a c h i n e s . I t m a y n o t be necessary to resor t to word - sp l i t t i ng t ech­n i q u e s in o r d e r to s to re a h i g h - c o v e r a g e glossary in a m a ­c h i n e t h a t is e c o n o m i c a l to use .

T h e s imple word - fo r -word t r ans l a t i on schemes t h a t h a v e b e e n discussed h a v e t h e a d v a n t a g e t h a t t h e y c a n b e i m ­p l e m e n t e d n o w . I n o r d e r to d e c i d e w h e t h e r o r n o t the i r i m p l e m e n t a t i o n is w o r t h whi le , t h e p o t en t i a l usefulness of t h e t r ans l a t i ons shou ld b e cons ide red .

USEFULNESS OF T H E O U T P U T

I F TRANSLATION , f rom G e r m a n to Engl ish , for e x a m p l e , is cons ide red as t h e select ion of o n e o u t of all possible sequences of Eng l i sh w o r d s for e a c h G e r m a n sen tence , a word-for -w o r d t r a n s l a t i o n is a good first a p p r o x i m a t i o n . Engl i sh sen tences a v e r a g e a b o u t 20 w o r d s . A s s u m i n g a v o c a b u l a r y of 1 0 6 w o r d s , t h e r e a r e 1 0 1 0 0 possible sequences of this l e n g t h . A w o r d - f o r - w o r d t r a n s l a t i o n m a y s u p p l y a n a v e r a g e of a b o u t t w o m e a n i n g s for e a c h G e r m a n w o r d . If t h e word - fo r -word t r a n s l a t i o n is good , t h e cor rec t Eng l i sh t r a n s l a t i o n c a n b e found b y choos ing t h e cor rec t m e a n i n g for e a c h w o r d f rom those supp l i ed a n d , t h e n , choos ing t h e co r r ec t w o r d o r d e r . T h e in i t ia l cho ice of o n e o u t of 1 0 1 0 0 possible sequences has b e e n r e d u c e d to o n e o u t of 2 2 0 X 20 ! o r a b o u t o n e o u t of 1 0 2 4 . F o r e a c h s equence left to choose f rom, a p p r o x i m a t e l y 1 0 7 6 h a v e b e e n ru l ed ou t . I t is t h u s seen t h a t t h e w o r d - f o r - w o r d t r a n s l a t i o n h a s d o n e a H e r c u l e a n j o b , b u t t h e r e is still a lot of w o r k for t h e r e a d e r to d o before h e c a n r e a d t h e t r ans l a t i on . H e m u s t still choose o n e o u t of 1 0 2 4 possibil i t ies. T h i s h e p r e s u m a b l y does o n t h e basis of his k n o w l e d g e of Eng l i sh s en t ence s t r u c t u r e a n d o n t h e basis of his k n o w l e d g e of t h e subjec t m a t t e r . T h e p r o b l e m s of m u l t i p l e m e a n i n g a n d of w o r d o r d e r a r e t h e t w o diffi­cul t ies t h a t h e m u s t face.

T o m a k e t h e r e a d e r ' s task easier , va r ious suggest ions h a v e b e e n m a d e for t h e a r r a n g e m e n t of t h e m u l t i p l e choices on t h e p a g e . T h e y cou ld b e s t r u n g o u t in l inea r o r d e r , pe r ­h a p s w i t h p a r e n t h e s e s ; o r t h e m o s t l ikely m e a n i n g for e a c h w o r d cou ld b e p u t first o r w r i t t e n in bo ld t y p e ; o r the va r ious m e a n i n g s cou ld b e listed in c o l u m n s . A n o t h e r ques t i on c o n c e r n s t h e n u m b e r of m e a n i n g s t h a t shou ld b e p r o v i d e d for e a c h w o r d . P e r h a p s o n e b r o a d m e a n i n g w o u l d b e less confusing t h a n t w o or t h r e e n a r r o w e r m e a n ­ings . A n o t h e r sugges t ion for m a k i n g t h e r e a d e r ' s task easier is ca l led p a r t i a l t r a n s l a t i o n . P a r t i a l t r a n s l a t i o n is b a s e d o n t h e fact t h a t on ly a few h u n d r e d ar t ic les , p repos i ­t ions, con junc t ions , a n d a d v e r b s m a k e u p a l a rg e ma jo r i t y of t h e r u n n i n g w o r d s in a t ex t a n d t h a t these words , t o g e t h e r w i t h a few inf lect ional end ings , h a v e especial ly l ong m u l t i p l e - m e a n i n g lists. T h e sugges t ion is t h a t these few h u n d r e d i t ems b e g iven in the i r o r ig ina l l a n g u a g e in o r d e r n o t to confuse t h e r e a d e r w i t h l o n g lists. T h e r e a d e r , after a br ief i n t r o d u c t i o n to t h e g r a m m a r a n d sen tence s t r u c t u r e of t h e foreign l a n g u a g e , w o u l d b e a b l e to u n d e r s t a n d these w o r d s a n d e n d i n g s a n d t h e cons t ruc t ions in w h i c h they a r e used . All of t h e t e c h n i c a l t e r m i n o l o g y w o u l d b e t r ans l a t ed .

9 9 6 Yngve—Translating by Machine ELECTRICAL ENGINEERING

Page 4: The technical feasibility of translating languages by machine

T h e r e a d e r of t h e t r a n s l a t i o n p lays a v i t a l ro le . I n a sense it is h e w h o m a k e s t h e t r ans l a t i on , b e c a u s e h e a l o n e u n d e r s t a n d s t he m e a n i n g . T h e m a c h i n e on ly r ecodes t h e foreign l a n g u a g e , so t h a t i t c a n b e m o r e easi ly t r a n s l a t e d b y t h e r e a d e r . E x p e r i m e n t s h a v e s h o w n t h a t word- for -w o r d t r ans la t ions c a n b e d e c i p h e r e d m o r e easily if t h e r e a d e r is fami l ia r w i t h t h e t op i c t h a t is b e i n g discussed. If h e knows w h a t t h e a u t h o r is t r y i n g t o say, h e c a n r e c r e a t e t he w o r k w i t h ve ry few clues . T h u s , w o r d - f o r - w o r d t r a n s ­la t ion cou ld be useful for p e o p l e w h o w a n t to k e e p a b r e a s t of la rge a m o u n t s of foreign t e c h n i c a l l i t e r a t u r e in the i r field, b u t i t w o u l d p r o b a b l y n o t b e a d e q u a t e for a careful r e a d i n g of a n y o n e a r t ic le .

I t has b e e n sugges ted t h a t r o u g h w o r d - f o r - w o r d t r ans l a ­t ions shou ld b e revised a n d e d i t e d b y a n e x p e r t . F o r cer­t a in pu rposes revis ion m a y b e adv i sab le , b u t p e r h a p s t h e best course w o u l d b e to let t h e final r e a d e r m a k e his o w n revision, s ince h e is a n e x p e r t in t h e field a n d c a n use his b a c k g r o u n d k n o w l e d g e in d e c i d i n g w h a t t h e a u t h o r p r o b ­ab ly m e a n t .

I t is v e r y i m p o r t a n t t h a t t h e final r e a d e r b e cons ide red a l ink in t h e t r ans l a t i on c h a i n . A n y i m p r o v e m e n t in t h e t r ans la t ion w h i c h speeds u p t h e r e a d e r ' s c o m p r e h e n s i o n is w o r t h whi le , s ince t h e r e a d e r ' s t i m e is expens ive .

T h u s , word - fo r -word t r a n s l a t i o n c a n b e i m p l e m e n t e d n o w b y us ing p r e s e n t - d a y g e n e r a l - p u r p o s e d ig i t a l c o m p u t e r s or , m o r e economica l ly , b y u s i n g spec i a l -pu rpose m a c h i n e s . T h e o u t p u t m i g h t b e useful, b u t w o u l d def ini te ly b e of p o o r qua l i ty , p r o b a b l y so p o o r t h a t t h e r e a d e r w o u l d h a v e to b e h igh ly m o t i v a t e d before h e w o u l d w a n t t o use it .

SENTENGE-FOR-SENTENGE TRANSLATION

A T T E N T I O N n o w wil l b e g iven to t h e q u e s t i o n : H o w c a n a m a c h i n e p r o d u c e s o m e t h i n g b e t t e r t h a n a word - fo r -word t rans la t ion? T h e deficiencies of word - fo r -word t r ans l a t i ons will b e e x a m i n e d , n o t f rom t h e p o i n t of v i ew of h o w a r e a d e r cou ld cope w i t h t h e m , b u t r a t h e r of h o w a m a c h i n e c o u l d cope w i t h t h e m . A d iagnos is of t h e difficulties will b e a t t e m p t e d , a n d a r e m e d y wil l b e p r o p o s e d . T h i s wil l necessi ta te some e x a m i n a t i o n of t h e w a y in w h i c h w o r d s a r e c o m b i n e d to m a k e sen tences in a l a n g u a g e .

ANALYSIS OF DIFFICULTIES

T H E MAJOR SHORTCOMINGS of w o r d - f o r - w o r d t r ans l a t ions

h a v e b e e n g r o u p e d u n d e r m u l t i p l e - m e a n i n g difficulties a n d w o r d - o r d e r difficulties. I t is n o w a d v a n t a g e o u s to r e ­classify t h e m as lexical difficulties a n d s t r u c t u r a l difficulties.

Lex ica l difficulties i n c l u d e t h e m u l t i p l e - m e a n i n g s of words l ike power , force, i n t e g r a t e , po t en t i a l , a n d so on . I t ha s b e e n suggested t h a t t h e use of a field glossary w o u l d res t r ic t these w o r d s to o n e m e a n i n g , t h e t e c h n i c a l o n e . A s ta r t l ing excep t ion to t h e obse rva t i on t h a t m o s t n o u n s , verbs , a n d adject ives h a v e few m e a n i n g s is t h e w o r d run, w h i c h , a c c o r d i n g to W e b s t e r ' s Co l l eg ia t e D i c t i o n a r y , ha s 5 4 . ( T h e a v e r a g e n u m b e r of m e a n i n g s p e r w o r d in^ W e b s t e r ' s Col leg ia te D i c t i o n a r y is a b o u t 2 . 5 . )

S t r u c t u r a l difficulties i n c l u d e w o r d o rde r , inf lect ional end ings , m u l t i p l e m e a n i n g s of m o s t ar t ic les , p repos i t ions , conjunc t ions , a n d adve rbs . M o s t of t h e m u l t i p l e - m e a n i n g

difficulties in a t ex t a r e s t r u c t u r a l b e c a u s e a m a j o r i t y of t h e r u n n i n g w o r d s in a t ex t a r e s t r u c t u r a l w o r d s , w i t h m o r e t h a n t h e a v e r a g e n u m b e r of m e a n i n g s . T h e 54 m e a n i n g s of t h e w o r d run a r e d i v i d e d i n t o four lists t h a t a r e s t ruc tu ra l l y d i f fe ren t : t r ans i t i ve v e r b , i n t r ans i t i ve v e r b , n o u n , a n d adjec t ive . I t is c l ea r t h a t a m a c h i n e m e t h o d of h a n d l i n g s t r u c t u r a l p r o b l e m s is h i g h l y des i r ab le .

D iagnos i s of t h e difficulties is easy—af te r all , a w o r d is usua l ly n o t a m b i g u o u s w h e n its c o n t e x t is cons ide red . I n w o r d - f o r - w o r d t r ans l a t i on , n o a t t e m p t is m a d e t o t r a n s l a t e a w o r d in its con tex t , b u t a m e a n i n g or m e a n i n g s a r e ass igned t o i t o n t h e basis of its fo rm a l o n e . F r o m this p o i n t of v iew, a word - fo r -word t r a n s l a t i o n m i g h t b e cal led a l i te ra l t r an s l a t i on , w h e r e a s a t r a n s l a t i o n w h i c h takes i n to a c c o u n t t h e c o n t e x t m i g h t b e ca l led a free t r ans l a t ion . W h e t h e r a t r a n s l a t i o n is l i te ra l or free rea l ly d e p e n d s u p o n t h e p o i n t of v iew. A word - fo r -word t r a n s l a t i o n t rans la tes w o r d s l i tera l ly , b u t t r ans l a t e s le t ters a n d syllables freely. W o r d s m a y b e t r a n s l a t e d freely if l a r g e r b locks a r e t r a n s l a t e d l i tera l ly . I n gene ra l , a l i te ra l t r a n s l a t i o n will b e be t t e r if i t is b a s e d o n l a r g e r b locks of tex t .

W h a t is n e e d e d is a w a y to use t h e c o n t e x t in t r a n s l a t i n g t h e w o r d s . I nves t i ga t i on h a s s h o w n t h a t u s ing t h e con t ex t of a c o m p l e t e s en t ence is a g r e a t d e a l b e t t e r t h a n us ing sma l l e r p a r t s of a sen tence , e.g., a p h r a s e , b u t is n o t signi­ficantly worse t h a n us ing t h e c o n t e x t of t w o or m o r e sen tences . A s e n t e n c e seems to b e a n a t u r a l l a n g u a g e u n i t . T o see h o w t h e i n f o r m a t i o n in a s e n t e n c e c o n t e x t c a n b e used for t r ans l a t ion , m o r e k n o w l e d g e of l a n g u a g e s t r u c t u r e is n e e d e d .

S T R U C T U R E OF LANGUAGE

T H E MOST SIGNIFICANT THING t h a t c a n b e said a b o u t a

sen t ence is t h a t i t h a s a s t r u c t u r e . S o m e h i n t of this is g iven in t h e g r a m m a r t a u g h t in g r a m m a r school , w h e r e w o r d s a n d g r o u p s of w o r d s a r e classified as n o u n s , verbs , subjects , objects , a n d so on , a n d s imple sen tences a r e p a r s e d in o r d e r to i l l u s t r a t e t h e i n t e r r e l a t i o n of the i r v a r i o u s p a r t s .

I t is r a t h e r i l l u m i n a t i n g to cons ide r s o m e of t h e q u a n t i t a ­t ive aspec ts of s e n t e n c e s t r u c t u r e . O f t h e 1 0 1 0 0 a r b i t r a r y sequences of 2 0 w o r d s , p e r h a p s on ly 1 0 3 0 to 1 0 Δ 0 a r e a c t u a l l y Eng l i sh sen tences . Al l of t h e rest , o r p r a c t i c a l l y all of t h e m , a r e m e r e i n c o h e r e n t j u m b l e s of w o r d s . S e n t e n c e s t r u c t u r e t h u s p laces v e r y s t r o n g cons t r a in t s o n t h e sequences of w o r d s . A n o t h e r w a y of a p p r e c i a t i n g t h e d e g r e e of c o n s t r a i n t i m p o s e d b y s en t ence s t r u c t u r e is to t a k e a n a c t u a l 2 0 - w o r d s e n t e n c e f rom a n e w s p a p e r a n d t r y to r e a r r a n g e t h e w o r d s . O f t h e 2 0 ! d i f ferent a r r a n g e m e n t s , on ly a few a r e Eng l i sh sen tences . T h a t t h e r e m u s t b e some s t r u c t u r e or sys tem u n d e r l y i n g these c o n s t r a i n t s is a p p a r e n t . A m a n in his l i fet ime w o u l d n o t b e phys ica l ly a b l e to h e a r or speak m o r e t h a n 1 0 9 sen tences , y e t h e c a n in s t an t ly r ecogn ize w h e t h e r a n y g iven s e q u e n c e of 2 0 w o r d s is i n c l u d e d a m o n g t h e 1 0 3 0 to 1 0 5 0 E n g l i s h sen tences o r is left i n t h e r e m a i n i n g 1 0 1 0 0 u n g r a m m a t i c a l j u m b l e s of w o r d s . H e c a n d o this b y r e c o g n i z i n g t h a t t h e s e q u e n c e conforms , or does n o t conform to o n e of t h e a l l o w a b l e s en t ence p a t t e r n s . R e c o g n i t i o n of s en t ence p a t t e r n is b a s e d p r i m a r i l y o n fo rmal features of t he

N O V E M B E R 1 9 5 6 Yngve—Translating by Machine 9 9 7

Page 5: The technical feasibility of translating languages by machine

s e q u e n c e a n d n o t o n t h e m e a n i n g ; for t h e mean ing les s l ines b y Lewis C a r r o l l

' T w a s bri l l ig , a n d t h e s l i thy toves D i d gyre a n d g i m b l e in t h e w a b e :

All m i m s y w e r e t h e borogoves , A n d t h e m o m e r a t h s o u t g r a b e .

a r e ins t an t ly r ecogn ized as c o n f o r m i n g to Eng l i sh s en t ence p a t t e r n s .

Cons t r a in t s powerfu l e n o u g h to res t r ic t t h e sequences of w o r d s to on ly those t h a t confo rm to t h e sen tence s t ruc tu re s of a l a n g u a g e c a n b e r ep re sen t ed b y t h e fol lowing s c h e m e . A set of s t r u c t u r a l choices is e n c o d e d in to a s e q u e n c e of w o r d s . T h e first s tep is to choose o n e of severa l possible sen tence types . F o r Eng l i sh t h e y m i g h t be

sen tence = " I " + p r e d i c a t e a, or sen tence = " y o u " + p r e d i c a t e b, o r sen tence = subject c + p r e d i c a t e c, or sen tence = " i t " + p r e d i c a t e / ? + " t h a t " c l a u s e . . .

I n t h e foregoing, " s e n t e n c e " s t ands for all sen tences in E n g l i s h ; " p r e d i c a t e a" s t a n d s for all p r e d i c a t e s t h a t c a n follow t h e f irs t-person s ingu la r subjec t " I " ; a n d so on . T a k e t h e e x a m p l e

sen tence = " i t " + p r e d i c a t e / ? + " t h a t " c lause

T h i s m e a n s t h a t t h e sen tence will s t a r t w i t h t h e w o r d " i t , " fol lowed b y a p a r t i c u l a r t y p e of p r e d i c a t e , p r e d i c a t e p, a n d followed in t u r n b y a c lause i n t r o d u c e d b y t h e w o r d " t h a t . " T h e p a r t i c u l a r p r e d i c a t e of t y p e p a n d t h e p a r t i c u l a r " t h a t " c lause will e m e r g e as a resul t of fu r the r choices . F o r t h e p r e d i c a t e , t h e choices m i g h t b e

p r e d i c a t e / ? = v e r b " t o b e " + adjec t ive b, or p r e d i c a t e p = v e r b " t o b e " + p repos i t i ona l p h r a s e , or p red ica te / ? = v e r b " t o b e " + pas t pa r t i c i p l e c

T h e first of these m i g h t b e chosen . F u r t h e r choices w o u l d d e c i d e t h e tense of t h e v e r b — p r e s e n t tense, for e x a m p l e . F o r t h e adjec t ive , t h e choices m i g h t b e

adject ive b = " c l e a r , " or adject ive b = " t r u e , " o r adject ive 6 = " l i k e l y " . . .

If " c l e a r " is chosen , t h e s en t ence so far i s : " I t is c l ea r + c t h a t ' c l a u s e . " N o t e t h a t ad jec t ive b w o u l d n o t i n c l u d e adject ives like ho t , r ed , smal l , a n d so on .

A n e n c o d i n g s c h e m e of this sort c a n a l low for c e r t a i n types of recurs iveness in sen tence s t r u c t u r e . F o r e x a m p l e

I t is c lear t h a t it was likely t h a t it w a s t r u e t h a t . . . T h i s t y p e of recurs iveness is i n d i c a t e d b y

" t h a t " c lause = " t h a t " + sen tence

w h i c h leads b a c k a g a i n to t h e b e g i n n i n g of t h e set of choices .

A n e n c o d i n g s c h e m e of this sort , s t a r t i ng as it does w i t h a set of choices o r d i r ec t ions for p r o d u c i n g a c e r t a i n sen tence a n d e n d i n g w i t h t h e w o r d s of t h a t sen tence , c a n b e t h o u g h t of as a g r a m m a r of a l a n g u a g e f rom t h e p o i n t of v iew of t h e speake r r a t h e r t h a n of t h e l is tener . S c h e m e s f rom t h e

speaker ' s p o i n t of v iew h a v e t h e a d v a n t a g e t h a t e a c h set of choices leads u n a m b i g u o u s l y to a u n i q u e s equence of words . O n t h e o t h e r h a n d , a g iven s equence of w o r d s m a y be a m b i g u o u s l y r e p r e s e n t e d b y t w o o r m o r e different sets of choices . As a n e x a m p l e of this , cons ider t h a t

p r e d i c a t e a = " a m h e r e n o w , " or p r e d i c a t e a = " see it n o w " . . . a n d p r e d i c a t e b = " a r e h e r e n o w , " or p r e d i c a t e d = " s e e it n o w " . . .

T h e n " see it n o w " c a n b e p r o d u c e d a m b i g u o u s l y as a first-person s ingu la r or a s econd-pe r son p r e d i c a t e . A g r a m m a r from t h e l i s tener ' s p o i n t of v iew cou ld resolve this a m b i g u i t y on ly b y cons ide ra t i on of t h e c o n t e x t " I " or " y o u . " S im­ilar ly, w o r d s c a n h a v e m u l t i p l e m e a n i n g s from a l i teral p o i n t of view, b u t b e u n a m b i g u o u s if the i r role in t h e sen tence s t r u c t u r e is k n o w n . As a n e x a m p l e , cons ider this Engl i sh sen tence

T h e (Der ) m a n showed t h e ( d e m ) b o y t h e (das) p i c tu r e .

a n d this p a r t of a G e r m a n sen tence

D e r ( the) M a n n , d e r (who) d e r (of the ) T a t fâhig ist, . . .

I n this w a y , cons ide ra t ions of sen tence s t ruc tu r e c a n resolve m a n y of t h e m u l t i p l e - m e a n i n g p r o b l e m s .

U s u a l l y e a c h possibi l i ty of choice of s t r u c t u r e in a l a n g u a g e h a s a m e a n i n g ass igned to it. A d i s t inc t ion has b e e n m a d e b e t w e e n t h e " s t r u c t u r a l " m e a n i n g a n d t h e " l e x i c a l " m e a n i n g . T h e t w o sentences " T h e c h a i r is on t h e r u g . " a n d " T h e r u g is o n t h e c h a i r . " h a v e different m e a n i n g s because t h e w o r d s a p p e a r a t different posi t ions in t h e s t r u c t u r e . O n t h e o t h e r h a n d , t h e difference in m e a n i n g b e t w e e n " T h e c h a i r is on t h e r u g . " a n d " T h e c h a i r is on t h e f loor ." is a lexical difference. Lexica l m e a n i n g tells w h a t sen tences ta lk a b o u t ; s t r uc tu r a l m e a n i n g tells w h a t t h e y say a b o u t it.

Jus t as different l a n g u a g e s a r e different in the i r lexicog­r a p h y , t h e y c a n also b e different in the i r s t r u c t u r e . L a n g u a g e s t h a t a r e h is tor ica l ly r e l a t ed a r e usua l ly s imi la r in s t r u c t u r e . E v e n if two l a n g u a g e s h a v e ce r t a in s t ruc tu res in c o m m o n , it m a y b e t h a t t h e y assign different m e a n i n g s to these s t ruc tu res . L a n g u a g e s m a y also differ in w h a t aspects of t he m e a n i n g a r e ca r r i ed b y s t r u c t u r e a n d w h a t aspects a r e ca r r i ed b y l ex i cog raphy . T h e d i s t inc t ion b e t w e e n sub­j ec t a n d objec t in a s en t ence is i n d i c a t e d s t ruc tu ra l l y by w o r d o r d e r in Engl i sh , b u t in G e r m a n it m a y b e i nd i ca t ed b y t h e lexical difference b e t w e e n t h e w o r d s " d e r " a n d " d e n . " S imi la r ly , t h e m e a n i n g t h a t is ass igned lexical ly to t h e w o r d "if" in Eng l i sh is some t imes i n d i c a t e d s t ruc tu ra l ly b y w o r d o r d e r in G e r m a n .

TRANSLATION OF STRUCTURE

I F TRANSLATION BY MACHINE is to b e d o n e on a sentence-

by- sen t ence bas i s—using sen t ence s t r u c t u r e to resolve t he w o r d o r d e r a n d t h e m u l t i p l e m e a n i n g s of t h e i nd iv idua l words , a n d co r rec t ly t r a n s l a t i n g t h e s t r u c t u r a l m e a n i n g s — g r a m m a r s a r e n e e d e d for e a c h l a n g u a g e . T h e s e g r a m m a r s shou ld consist of desc r ip t ions of e a c h l a n g u a g e , w h i c h a r e c o m p l e t e e n o u g h to m a k e t h e d is t inc t ions in m e a n i n g , t h a t a r e necessary for t r a n s l a t i o n — p r o v i d i n g a desc r ip t ion of this

9 9 8 Yngve—Translating by Machine E L E C T R I C A L ENGINEERING

Page 6: The technical feasibility of translating languages by machine

sort for even o n e l a n g u a g e is s o m e t h i n g t h a t h a s n o t ye t been d o n e , because t h e c o n c e p t s invo lved a r e j u s t n o w b e ­c o m i n g u n d e r s t o o d a n d b e c a u s e l a n g u a g e s a r e e x t r e m e l y complex in the i r s t ruc tu res . O n c e t w o of these descr ip t ions a r e ava i lab le , t h e y c a n b e set s ide b y side, a n d t h e equ iv ­alences s ta ted . I t m u s t b e d e c i d e d w h i c h sen tences in l a n g u a g e A a n d l a n g u a g e Β a r e e q u i v a l e n t . T h e u n d e r l y i n g a s s u m p t i o n is t h a t it is possible to find s u c h c o r r e s p o n d e n c e s . Of course , th is a s s u m p t i o n is n o t s t r ic t ly t r u e , for n o two l anguages a r e c o m p l e t e l y t r a n s l a t a b l e . T h e a s s u m p t i o n is p r o b a b l y ve ry n e a r l y va l id for r e l a t e d l a n g u a g e s such as G e r m a n , Engl i sh , F r e n c h , R u s s i a n , a n d for scientific dis­course , or o t h e r types of careful expos i t ion .

If a desc r ip t ion of t h e s t r u c t u r e of two l a n g u a g e s a n d a s t a t e m e n t of t h e equ iva l ences in m e a n i n g b e t w e e n t h e m a r e given, on ly o n e m o r e t h i n g is n e e d e d for a m e c h a n i z e d t r ans la t ion p r o c e d u r e : a m e c h a n i c a l m e t h o d of r ecogn iz ing the s t ruc tu r e of a s en t ence f rom t h e s e q u e n c e of w o r d s compos ing t h e sen tence . T h e precise fo rm of this r ecogn i ­t ion r o u t i n e m u s t a w a i t t h e desc r ip t ion of t h e s t ruc tu re s to be recognized . H o w e v e r , t h e process m u s t o p e r a t e f rom t h e p o i n t of v iew of t h e p e r s o n w h o r e a d s or h e a r s a s en tence . I t m u s t t en ta t ive ly r ecogn ize smal l g r o u p s of w o r d s , p l a c e t h e m in a t e n t a t i v e s t r u c t u r e , a n d so p r o c e e d , e a c h t i m e e l im ina t i ng c e r t a i n possibil i t ies. O n c e t h e s t r u c t u r e of t h e sen tence to b e t r a n s l a t e d h a s b e e n r ecogn ized , it c a n b e t r ans l a t ed in to a n e q u i v a l e n t s t r u c t u r e in t h e o t h e r l a n g u a g e , a n d the co r rec t m e a n i n g s c a n b e chosen f rom t h e m u l t i p l e m e a n i n g s of t he w o r d s .

RESULTS T O BE EXPECTED

T H E TRANSLATIONS p r o d u c e d b y a r o u t i n e t h a t t r ans l a t e s on a sen tence- for -sen tence basis will b e vas t ly b e t t e r t h a n t h e o u t p u t of a w o r d - f o r - w o r d t r a n s l a t i n g m a c h i n e . M o s t o u t p u t sen tences wil l b e g r a m m a t i c a l l y co r rec t . T h e r e will b e ve ry few w o r d - o r d e r p r o b l e m s r e m a i n i n g . M a n y of t h e m u l t i p l e - m e a n i n g difficulties will b e solved, pa r t i cu l a r ly those invo lv ing t h e m o s t f r equen t w o r d s of t he l a n g u a g e a n d t h e ve ry f r equen t inf lect ional e n d i n g s .

T h e r e will be a p a r t i a l so lu t ion to m u l t i p l e - m e a n i n g p r o b ­lems such as t h e di f ferent m e a n i n g s of t h e w o r d " c a n , " w h i c h a r e d i s t ingu i shed on t h e basis t h a t o n e m e a n i n g is c o n n e c t e d w i t h a v e r b , o n e w i t h a n aux i l i a ry v e r b , a n d one w i t h a n o u n . T h e r e will be s o m e m u l t i p l e - m e a n i n g p r o b ­lems r e m a i n i n g . O f these , s o m e c a n b e resolved b y j u d i c i o u s use of a field glossary or b y r e l a t ed t echn iques . T h e rest m u s t be resolved b y t h e good sense a n d in te l l igence of t h e r e a d e r of t h e t r ans l a t i on .

CONCLUSION

T h e s ta te of t h e a r t of m e c h a n i c a l t r ans l a t i on c a n be s u m m e d u p as fol lows:

1. W o r d - f o r - w o r d t r ans l a t ions c a n b e m a d e n o w on h igh­speed g e n e r a l - p u r p o s e d ig i ta l c o m p u t e r s .

2. W o r d - f o r - w o r d t r ans l a t ions cou ld b e m a d e m o r e e c o n o m i c a l l y b y m e a n s of spec i a l -pu rpose m a c h i n e s bu i l t w i t h exis t ing t echno logy .

3 . W o r d - f o r - w o r d t r ans l a t i ons p r o m i s e to b e cons ider­a b l y c h e a p e r t h a n m a n - m a d e t r ans l a t ions .

4. W o r d - f o r - w o r d t r ans l a t i ons a r e ve ry c r u d e , b u t m a y b e useful w h e n m o r e a c c u r a t e t r ans l a t ions a r e n o t w o r t h t h e a d d i t i o n a l cost.

5. If s o m e t h i n g b e t t e r t h a n a word - fo r -word t r ans l a t ion is des i red , o u r best h o p e is to t a k e i n to cons ide ra t i on t h e sen tence s t r u c t u r e in des ign ing a t r ans l a t i on r o u t i n e .

6. P r o v i d i n g t r a n s l a t i o n rou t i ne s on a sentence-for-sen tence o r s t r u c t u r a l basis r equ i r e s a cons ide rab l e a m o u n t of de t a i l ed l inguis t ic work .

REFERENCES 1. Machine Translation of Languages (book), edited by W. N. Locke, A. D. Booth. Technology Press of Massachusetts Institute of Technology, Cambridge, Mass., John Wiley and Sons, Inc., New York, Ν. Y.; Chapman and Hall, Ltd., London, England, 1955.

2. Mechanical Translation, Massachusetts Institute of Technology, Cambridge, Mass., vols. 1, 2, 1954, 1955.

3. Human Behavior and the Principle of Least Effort (book), G. K. Zipf. Addison-Wesley Press, Cambridge, Mass., 1949.

4. Simple Games of Strategy Occurring in Communication through Natural Lang­uages, Benoit Mandelbrot. Transactions, Institute of Radio Engineers, Professional Group on Information Theory, PGIT-3, March 1954, pp. 124-37.

Secondary Cooler for Supersonic Wind Tunnel T h e s e c o n d a r y cooler s h o w n a t r i g h t d u r i n g ins ta l l a t ion

a t t he Lewis F l igh t P r o p u l s i o n L a b o r a t o r y a t t h e C l e v e l a n d (Oh io ) A i r p o r t is a c o m p o n e n t p a r t of a n e w $ 3 3 mi l l ion supersonic w i n d t u n n e l , des igned for r e sea r ch on a i r p l a n e engines of t h e fu tu re . T h e 10- X 10-foot t u n n e l will d u p l i ­ca te a i r speeds f rom t w o to t h r e e a n d o n e half t imes t h e speed of s o u n d a n d wil l b e a b l e to h a n d l e eng ines a n d e n g i n e c o m p o n e n t s u p to 5 feet in d i a m e t e r . I t s 2 5 0 , 0 0 0 - h p e lec t r i c -motor d r ive is t h e m o s t powerfu l of its k i n d .

T o m e e t exac t ing cond i t i ons for t h e s e c o n d a r y cooler , engineers a n d officials selected iC-Fin coils, m a n u f a c t u r e d b y Griscom-Russel l C o m p a n y , Mass i l lon , O h i o , a subs id i a ry of G e n e r a l Precis ion E q u i p m e n t .

T h e over-al l coil consists of 14 i t - F i n b u n d l e s , e a c h h a v i n g 6 3 6 c o p p e r tubes . T h e to t a l surface is 480 ,000 s q u a r e feet.

N O V E M B E R 1 9 5 6 Yngve—Translating by Machine 9 9 9