
I 591113 FULCRUM TECHNIQUES TO LANGUABE ANALYSIS · TRW COMPUTER DIVISION C169-3U5 AF30(602) -2643 Project No. 4599 Task No. 459901 I [[r [IIPrepared for the Information Processing


[Front cover: title block and an illustration offering four overlapping translations of a single word: (A) SIGNIFICANT (B) CONSIDERABLE (C) LARGE (D) GREAT.]


PATENT NOTICE: When Government drawings, specifications, or other data are used for any purpose other than in connection with a definitely related Government procurement operation, the United States Government thereby incurs no responsibility nor any obligation whatsoever and the fact that the Government may have formulated, furnished, or in any way supplied the said drawings, specifications or other data is not to be regarded by implication or otherwise as in any manner licensing the holder or any other person or corporation, or conveying any rights or permission to manufacture, use, or sell any patented invention that may in any way be related thereto.

Qualified requestors may obtain copies of this report from the ASTIA Document Service Center, Dayton 2, Ohio. ASTIA Services for the Department of Defense contractors are available through the "Field of Interest Register" on a "need-to-know" certified by the cognizant military agency of their project or contract.


RADC-TDR-63-168

5 March 1963

FINAL REPORT

FULCRUM TECHNIQUES TO SEMANTIC ANALYSIS

THOMPSON RAMO WOOLDRIDGE INC.
TRW COMPUTER DIVISION

C169-3U5

AF30(602)-2643
Project No. 4599
Task No. 459901

Prepared for the Information Processing Laboratory, Rome Air Development Center, Research and Technology Division, Air Force Systems Command, United States Air Force, Griffiss Air Force Base, New York


FOREWORD

The Machine Translation group at Thompson Ramo Wooldridge Inc. has been working under the sponsorship of the Intelligence Laboratory of the Rome Air Development Center, Griffiss Air Force Base, since 1959. This research continues work done under previous contracts with RADC.

During the course of the present research a concurrent contract has been in effect with the National Science Foundation. Some of the tools used in the present research were created under NSF support. In general, studies in the area of Semantics have been done under contract with RADC, while work to improve the techniques for research in MT has been done under contract to NSF.

The support of the Intelligence Laboratory and its members at the Rome Air Development Center is hereby gratefully acknowledged.


This report represents the work of

Loretta Ertel
Herbert L. Holley
Paul L. Garvin
Jules Mersel (Project Manager)
Christine A. Montgomery
George Onischenko
Gerhard Reitz
Steven B. Smith (Associate Project Manager)


ABSTRACT

A new semantic research technique has been developed and employed under this contract. This technique makes possible the construction of semantic classes of words which share the characteristic of specifying a particular translation for a given polysemantic Russian content word.

Improvements have been made in the RW program for the areas of subject recognition and clause boundary determination.

Conclusions were reached about an improved English synthesis. The latter involves reformulating Russian structures into appropriate but non-corresponding English structures.

The major hardware implication of the research suggests that associative memories may provide a simplification in the area of machine translation.

Flow charts and a sample translation are included.


PUBLICATION REVIEW

This report has been reviewed and is approved.

Approved: FRANK J. [surname illegible], Chief, Information Processing Laboratory, Directorate of Intelligence & Electronic Warfare

Approved: [name illegible], USAF, Director of Intelligence & Electronic Warfare

FOR THE COMMANDER: IRVING GABELMAN, Director of Advanced Studies


CONTENTS

Foreword
List of Personnel
Abstract
Publication Review Page
Introduction
Semantic Research
Clause Boundary Determination
Subject Recognition
Transformation Programs for Word Order Arrangement in the English Translation
Synthesis
Machine Translation of Stylistic Differences
Semantic Research Using the Ushakov Dictionary
Test Sentences for Syntax Flow Charts
Hardware Implications of the Research
Dictionary Updating
Sample Translation
Flow Charts of Determiner-Determinee Method
Conclusions and Recommendations


INTRODUCTION

The first section of the report describes the semantic research technique which was developed under this contract. This technique is dependent upon our capability for automatic sentence parsing by the Fulcrum approach.

There were two goals in this research: first, to define rules for a particular type of multiple meaning problem that had heretofore been either ignored or considered impossible to solve; second, to determine what light such rules might shed on the problem of semantics in general.

Both goals were achieved; the second in particular displayed certain consistencies in the Russian semantic system which are remarkable.

In our research, semantics means the ability to choose the correct translation from among those provided in the dictionary. Our studies have shown that the other text words which determine this selection often fall into groups which themselves seem conceptually related. There has always been a hope among researchers that such semantic classes could be defined. We feel that our technique allows us to uncover the identity of these classes and, even possibly, may provide a metric for the semantic distance function.


SEMANTIC RESEARCH

The first step in our research was to select Russian text and acquire corresponding English translations of that text. The Russian text was selected from the field of biology and was taken from the Doklady Akademii Nauk. All the text was chosen from the same field because we were interested in problems of multiple meaning within a single field, rather than differentiations in translations which were field-defined.

On the opposite page we show p. 564 of Vol. 123 of the Doklady for biology.


[Facsimile of p. 564 of Vol. 123 of the Doklady Akademii Nauk SSSR: the opening page of a zoology article on the lunar periodicity of spawning in littoral and upper sublittoral invertebrates of the White Sea and other seas. The Russian text is not legible in this copy.]


The next problem was that of deciding which words to deal with. In the past, a good deal of semantic research has been devoted to problems of multiple meaning in connection with function words (such as prepositions, particles, conjunctions), while, except for a few instances of idioms, the problem of the multiple meaning of content words (i.e., nouns, verbs, etc.) was ignored. We therefore determined to fix our primary attention on content words.

On the opposite page, we give a sample of the content words investigated, along with their multiple translations.


[headword illegible]
    A. in the process of
    B. connection(s)
    C. on account of
    D. in view of
    E. relation
    F. in response to
    G. accordingly
    H. because of
    I. relating to
    J. owing to
    K. due to
    L. 0
    M. relationship
    N. related
    O. thus

ДАЛЬНЕЙШЕМ
    A. then
    B. now
    C. subsequently
    D. subsequent
    E. prolonged
    F. ultimately
    G. further
    H. later
    I. henceforth

РАЗНЫХ
    A. various
    B. different
    C. variation in the
    D. differing
    E. varying
    F. differences in the

[headword illegible]
    A. for
    B. course
    C. over the course
    D. 0

СТОРОНА
    A. side
    B. toward
    C. direction
    D. past
    E. on the other hand
    F. on one hand
    G. laterally

СЛУЧАЯХ
    A. case
    B. occasions
    C. events
    D. incidents


Next it was necessary to find out which content words had multiple meaning problems within a single field, and occurred with sufficient frequency to allow us to make some generalizations about their behavior. For this purpose a program was written to give us a frequency count of all words occurring within a large body of text, and print out the individual Russian forms in the order of their frequency of occurrence. We thereupon decided to investigate all those content words with multiple equivalents that had one form which occurred 20 or more times, as well as certain other words of particular interest to us.

The following is a sample of the output of the word frequency program. The words are listed in order of frequency and the number to the far left indicates the position the word occupies in the list. The first group of numbers to the right of a word indicates the number of times that word occurred. The next group of numbers indicates what percentage of the total number of occurrences of all words is represented by this word and all words prior to it in the list. For example, this provides us with the information that the 77 most frequent words (out of a total of 37859 running words) represent one third of the occurrences in this text. The next group indicates how much of the total text is represented by occurrences of this word. The next to last group is the position of the word in the list multiplied by its frequency of occurrence. (This number, according to Zipf's law, should approximate 0.1. Our "Zipf" number, however, does not take into account that many numbers, in reality, occupy the same position in the list.) The last group of numbers (here zeros) has as yet no significance.
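
The columns just described can be reproduced with a short modern sketch. This is an illustration only, not the original program: the function and field names are hypothetical, ties are broken alphabetically, and the "Zipf" column is assumed to be rank times relative frequency (count divided by total running words), since that is the quantity that would approximate 0.1.

```python
from collections import Counter


def frequency_table(words):
    """Rank word forms by frequency and compute the columns described in the text.

    `words` is the running text as a list of word forms. The field names
    below are illustrative assumptions, not the report's own labels.
    """
    total = len(words)
    counts = Counter(words)
    # Most frequent first; alphabetical order breaks ties (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    running = 0
    for rank, (word, count) in enumerate(ranked, start=1):
        running += count
        rows.append({
            "rank": rank,                                # position in the list
            "word": word,
            "count": count,                              # number of occurrences
            "cumulative_pct": 100.0 * running / total,   # this word and all before it
            "pct": 100.0 * count / total,                # this word alone
            "zipf": rank * count / total,                # rank x relative frequency
        })
    return rows


def frequent_forms(rows, minimum=20):
    """Select forms that occur at least `minimum` times (20 in the report)."""
    return [row["word"] for row in rows if row["count"] >= minimum]


if __name__ == "__main__":
    for row in frequency_table("a b a c a b".split()):
        print(row)
```

Because each distinct form gets its own rank here, forms with equal counts occupy different positions, which is the same limitation the text notes for its own "Zipf" number.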

[Sample output of the word frequency program. The printout is not legible in this copy.]

The next step was to list the contexts in which these forms occurred. In order to find them in text, another program was devised to print an alphabetical list of all the words that occurred in text along with their text locations.

The following is a sample of the output of the program to provide a concordance. The first number to the right of a word indicates the number of times that word occurred. Following this, for each occurrence of the word five things are provided. The first letter (B here in every case) indicates the corpus in which the word occurred, the second letter the article, and the third letter the page of that article. The first number indicates the line on which the word occurred and the second the position on that line (first word, second word, etc.). For example, AKTE occurred once in corpus B, article R, page A, line 12, word 2.
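
A concordance of this shape is easy to sketch in a few lines. The code below is an illustration under stated assumptions, not the original program: the `Location` record and `build_concordance` function are hypothetical names, the input is assumed to be already split into lines tagged with corpus, article, and page letters, and the sample words other than AKTE are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    """The five items given per occurrence (field names are assumptions)."""
    corpus: str
    article: str
    page: str
    line: int
    position: int


def build_concordance(lines):
    """Map each word form to the list of places where it occurs.

    `lines` is an iterable of (corpus, article, page, line_number, text)
    tuples. The result is ordered alphabetically by word form.
    """
    index = defaultdict(list)
    for corpus, article, page, line_no, text in lines:
        for position, word in enumerate(text.split(), start=1):
            index[word].append(Location(corpus, article, page, line_no, position))
    return dict(sorted(index.items()))


if __name__ == "__main__":
    sample = [("B", "R", "A", 12, "V AKTE NERESTA")]
    for word, places in build_concordance(sample).items():
        # Word, number of occurrences, then the location of each occurrence.
        print(word, len(places), places)
```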

[Sample output of the concordance program. The printout is not legible in this copy.]

For each word we thereupon listed twenty or more occurrences in context, along with the professional English translation of the word and its context.

As an example of our procedure we will use the Russian word ОТНОШЕНИЕ, which translates into English as relation, regard, etc.

The following page shows a work sheet of one of our researchers, listing text locations, Russian contexts and English translations. The translations for the word under consideration are underlined.

[Handwritten work sheet listing text locations, Russian contexts and English translations. The work sheet is not legible in this copy.]

Since translators use a proliferation of translations, many of which overlap or are synonymous (see illustration on front cover), it was necessary to determine the minimal number of translations needed to render the word in question into good English.

At the top of the following page we give all the translations of ОТНОШЕНИЕ used by the professional translators, and at the bottom the smaller number of translations which we found necessary for the occurrences under consideration. Instances in which the professional translator recast the entire sentence have not been included in this table.


Translations Used by Professional Translators

    ratio, relationship, behavior
    [illegible], effects, relation
    respect, proportion, point of view
    as regards, connection
    response

Necessary Translations

    respect, proportion
    as regards, relation
    0 ("-ly" added to translation of preceding modifier)
    ratio, response


Once the preceding two steps had been completed, these contexts were "mapped" in such a way that any consistencies might be quickly brought to the attention of the researcher.

On the opposite page we give the syntactic "mapping" of the word ОТНОШЕНИЕ. The numbers down the left hand margin indicate the sentence in which the word occurred. The letters down the right hand margin refer to the translation which the professional translator used in each case. The first column on the left gives the Russian words of which this word was an object. The second column lists the prepositions of which this word was an object. The third column gives the modifiers which preceded the word in text. The fourth column lists genitive nominals which followed the word, and the next to last column shows the prepositions which were governed by the word. The final column gives the objects of the prepositions which were governed by ОТНОШЕНИЕ.
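
One way to picture the mapping is as a table of records with one field per column, which can then be grouped by translation so that recurring determiners stand out. The sketch below is hypothetical throughout: the record layout and function name are assumptions, the sentence numbers are invented, and the three sample rows reuse only the English glosses of determiners that the report lists for this word.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextRow:
    """One occurrence of the word under study, one field per mapped column."""
    sentence: int                      # left margin: sentence number
    governing_word: Optional[str]      # word of which this word was an object
    governing_prep: Optional[str]      # preposition of which it was an object
    modifier: Optional[str]            # modifier preceding the word
    genitive_nominal: Optional[str]    # genitive nominal following the word
    governed_prep: Optional[str]       # preposition governed by the word
    governed_object: Optional[str]     # object of that preposition
    translation: str                   # right margin: translator's choice


def group_by_translation(rows, column):
    """Collect, per translation, the distinct values found in one column."""
    groups = defaultdict(set)
    for row in rows:
        value = getattr(row, column)
        if value is not None:
            groups[row.translation].add(value)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative rows only; glosses stand in for the Russian forms.
    rows = [
        ContextRow(1, None, None, None, "quantity", "к", "quantity", "ratio"),
        ContextRow(2, None, None, None, "number", "к", "number", "ratio"),
        ContextRow(3, None, None, None, "frogs", "к", "temperature", "response"),
    ]
    print(group_by_translation(rows, "genitive_nominal"))
```

Grouping on a single column at a time mirrors how a researcher would scan down one column of the map looking for determiners that pattern with a given translation.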

[Syntactic "mapping" table for ОТНОШЕНИЕ. The table is not legible in this copy.]

Next it was necessary to discover which words in the environment of the word under consideration could be considered determiners for the various translations of that word.

A particular set of determiners is generally applicable within a particular syntactic context. We list here the determiners which we found in text for ОТНОШЕНИЕ, along with their translations and relevant syntactic contexts.


(1) В ОТНОШЕНИИ + genitive nominal
    [illegible] (apparatus), СОСУДОВ (vessels), СПОСОБНОСТИ (capacity), СИНТЕЗА (synthesis), БОГАТСТВА (wealth), ПРОНИКНОВЕНИЯ (penetration): "as regards"
    1 Г. (1 gram): "in the proportion of"

(2) ОТНОШЕНИЕ + genitive nominal + К + dative nominal
    [illegible] (magnitude) К [illegible] (first); КОЛИЧЕСТВА (quantity) К РАЦИОНУ (ration); ЧИСЛА (number) К ЧИСЛУ (number); КОЛИЧЕСТВА (quantity) К КОЛИЧЕСТВУ (quantity): "ratio"
    [illegible] К [illegible] (induction): "relation"
    [illegible] (frogs) К [illegible] (temperature): "response"

(3) ПО ОТНОШЕНИЮ К: "with respect to"

(4) В ЭТОМ ОТНОШЕНИИ: "in this respect"

(5) В МОРФОЛОГИЧЕСКОМ ОТНОШЕНИИ: "morphologically"


Based on the lists of determiners, we attempted to formulate general rules for the correct translation of the words which we had studied.

The group which has as its translation "ratio" displayed a certain semantic homogeneity. The one-member group 1 Г. (1 gram) suggested a group represented conceptually by "amount." Other groups displayed no conceptual homogeneity and were suspected as being open-ended or residue classes.

On the opposite page we show the initial rules which were formulated for the translation of ОТНОШЕНИЕ on the basis of our original corpus.


(1) В ОТНОШЕНИИ (genitive nominal block)

    A. When genitive nominal is a specified amount (e.g., 1 Г.), translate as "in the proportion of"

    B. Otherwise, translate as "as regards"

(2) ОТНОШЕНИЕ genitive nominal block К dative nominal block

    A. When genitive nominal is a unit of measurement: "ratio"

    B. Otherwise, either "relation" or "ratio"; except when rule 3A applies

(3) ОТНОШЕНИЕ genitive nominal block

    A. When genitive nominal is animate, "response"

    B. Otherwise, "relation"

(4) ПО ОТНОШЕНИЮ К "with respect to" (idiom)

(5) В ЭТОМ ОТНОШЕНИИ "in this respect" (idiom)
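
The five rules above amount to a small decision procedure, sketched here as a hedged illustration rather than as the report's program. The `Context` record, its field names, and the class labels "amount", "unit", "animate", and "other" are assumptions introduced for the example; in the actual system this information would come from the Fulcrum parse and the dictionary codes. Because rule 2B leaves the choice between "relation" and "ratio" open, the function returns a list of candidates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Context:
    """Syntactic context of one occurrence (all field names are assumptions)."""
    governing_prep: Optional[str] = None   # preposition before the word: "в", "по", or None
    modifier: Optional[str] = None         # modifier preceding the word, e.g. "этом"
    genitive_class: Optional[str] = None   # "amount", "unit", "animate", "other", or None
    has_k_dative: bool = False             # followed by К + dative nominal block


def translate_otnoshenie(ctx: Context) -> List[str]:
    """Return the candidate translations allowed by the initial rules (1)-(5)."""
    # Rules 4 and 5: idioms are checked first.
    if ctx.governing_prep == "по" and ctx.has_k_dative:
        return ["with respect to"]
    if ctx.governing_prep == "в" and ctx.modifier == "этом":
        return ["in this respect"]
    # Rule 1: В ОТНОШЕНИИ + genitive nominal block.
    if ctx.governing_prep == "в" and ctx.genitive_class is not None:
        if ctx.genitive_class == "amount":
            return ["in the proportion of"]
        return ["as regards"]
    if ctx.governing_prep is None and ctx.genitive_class is not None:
        # Rule 3A takes precedence over rule 2B for animate nominals.
        if ctx.genitive_class == "animate":
            return ["response"]
        if ctx.has_k_dative:
            # Rule 2: genitive block + К + dative block.
            if ctx.genitive_class == "unit":
                return ["ratio"]
            return ["relation", "ratio"]   # rule 2B leaves both open
        return ["relation"]                # rule 3B
    return []                              # no rule applies


if __name__ == "__main__":
    print(translate_otnoshenie(Context(genitive_class="unit", has_k_dative=True)))
    print(translate_otnoshenie(Context(governing_prep="в", genitive_class="amount")))
```

An empty result marks a context the initial rules do not cover, which is where additions from fresh text would be recorded.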


It was now necessary to apply the rules which we have developed to fresh text and amend, augment or discard our previous rules.

In the case of ОТНОШЕНИЕ, for example, an examination of additional text indicated that our rules were in the main correct. More words were added to our list of determiners for the translation "ratio." They were ДЛИТЕЛЬНОСТИ (length), ВЫСОТЫ (height), and ДЛИНЫ (also length). Their semantic relationship to the old set turned out to be as predicted. The expression В ЭТОМ ОТНОШЕНИИ (in this respect), which we had originally considered an idiom, proved to be but one of a set, which in the context В (modifier) ОТНОШЕНИИ (-ЯХ) determines the translation as "respect". The determining modifiers found were ДРУГИХ (other) and ВСЕХ (all). Here again, we note a certain semantic homogeneity, and we could extend this list on a speculative basis and, with the help of a dictionary, actually find new examples (e.g., МНОГИХ [many]).

The research in this area was a success. It has revealed some of the most interesting facets of the Russian semantic system to date. We feel that we have brought MT to the edge of even more important discoveries in this area.


CLAUSE BOUNDARY DETERMINATION

During the contract period a number of routines have been added to the basic syntax program to improve the quality of the translation. The most significant of these is a syntax pass which examines commas and conjunctions whose function is ambiguous (that is, conjunctions which may connect words, phrases, or clauses: for example, "and"), in order to determine whether they function as clause separators. If they do, they are marked as boundaries for subsequent syntactic and semantic searches. Two types of clause boundaries are recognized: (1) those separating independent clauses from each other (can be commas, or conjunctions, or both), and (2) those which separate dependent clauses from the clause on which they depend (commas only):

Ex.: Рибонуклеопротеиды могут иногда находиться только в клетках глубокой зоны, а элементы средней и наружной зоны полностью лишены их. = ribonucleoproteins may sometimes be found only in the cells of the deep zone, while elements of the middle and outer zones are completely devoid of them.

Ex.: Рибонуклеопротеиды, которые отсутствуют от элементов средней и наружной зон, могут находиться в клетках глубокой зоны. = ribonucleoproteins, which are not present in elements of the middle and upper zones, may be found in cells of the deep zone.

The reason for this distinction is that in the first case, syntactic and semantic searches should be discontinued at the clause boundary, whereas in the second, these searches should be interrupted at the left boundary of the dependent clause (which is not searched for syntactic and semantic codes relating to the main clause), and should be resumed at the right boundary and continued to the end of the sentence. This makes it possible to identify syntactic and semantic elements of a single clause even when they are separated by long stretches of dependent or parenthetical material.

The clause boundary algorithm first searches the sentence for a comma or a conjunction. If the comma or conjunction is between two predicates, the immediate environment is searched to determine whether a definite decision can be made that the item is a clause boundary. (One example of this would be a comma preceded by an unambiguous conjunction, such as ЧТО.) If no syntactic elements which definitely establish the item as a clause boundary are present, the segment between the two predicates is searched for additional commas or ambiguous conjunctions. If none are present, the current item is marked as the clause boundary. In cases where an additional potential clause boundary is present, the environments of both items are searched to determine whether a definite decision can be made that one or both items are not clause boundaries (for example, commas or ambiguous conjunctions connecting two modifiers). If no definite decision is possible at this point, a further search for an additional comma or conjunction is made, and the same inspection of the environment is carried out on the new item. If no decisions can yet be reached, a fourth potential clause boundary is searched for and checked out. Where no definite conclusions can be made at this point, attempts to resolve the boundary of that particular clause are abandoned and processing of the next clause is begun.

After the initial search of the clause boundary algorithm, if the current comma or conjunction is not between two predicates but between a predicate and the beginning of the sentence, the segment between the comma or conjunction and the predicate is inspected to determine whether it contains a relative pronoun. If a relative pronoun is found, the current item is marked as the preceding clause boundary and a search for the following clause boundary is then initiated.
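
The search between two predicates can be sketched as follows. This is a simplified illustration under stated assumptions, not the report's flow chart: the `Token` record and its `kind` labels are hypothetical, "definitely a boundary" is reduced to adjacency with an unambiguous conjunction, "definitely not a boundary" is reduced to standing between two modifiers, and the relative-pronoun case at the start of a sentence is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_CANDIDATES = 4  # the text gives up after a fourth potential boundary


@dataclass
class Token:
    text: str
    kind: str  # assumed labels: "pred", "comma", "conj", "unamb_conj", "mod", "other"


def is_candidate(tok: Token) -> bool:
    """Commas and ambiguous conjunctions are potential clause boundaries."""
    return tok.kind in ("comma", "conj")


def definitely_boundary(tokens: List[Token], i: int) -> bool:
    """Assumed test: the item stands next to an unambiguous conjunction."""
    neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
    return any(t.kind == "unamb_conj" for t in neighbours)


def definitely_not_boundary(tokens: List[Token], i: int) -> bool:
    """Assumed test: the item connects two modifiers."""
    return (0 < i < len(tokens) - 1
            and tokens[i - 1].kind == "mod"
            and tokens[i + 1].kind == "mod")


def find_boundary(tokens: List[Token], left_pred: int, right_pred: int) -> Optional[int]:
    """Return the index of the clause boundary between two predicates, or None."""
    candidates = [i for i in range(left_pred + 1, right_pred)
                  if is_candidate(tokens[i])]
    undecided = []
    for n, i in enumerate(candidates[:MAX_CANDIDATES], start=1):
        if definitely_boundary(tokens, i):
            return i
        if not definitely_not_boundary(tokens, i):
            undecided.append(i)
        # Once every candidate in the segment has been inspected and only
        # one survives, it is marked as the boundary.
        if n == len(candidates) and len(undecided) == 1:
            return undecided[0]
    return None  # unresolved: abandon this clause and go on to the next


if __name__ == "__main__":
    kinds = ["pred", "mod", "comma", "mod", "other", "comma", "other", "pred"]
    sentence = [Token(k, k) for k in kinds]
    print(find_boundary(sentence, 0, len(sentence) - 1))
```

Returning `None` corresponds to abandoning the clause, so a caller would simply move on to the next pair of predicates.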


SUBJECT RECOGNITION

As clause boundaries can now be defined accurately in all but a few cases, it has been possible to make several improvements in the quality of the translation by expanding the existing subject recognition routine to handle objects and by including a routine to identify a series of subjects. The subject-object recognition routine operates within the limits of a particular clause as defined by the clause boundary pass, searching for both subject and object in order of most frequent occurrence. The search for a subject is begun in the segment preceding the predicate and, if unsuccessful, continued in the segment extending from the predicate to the next clause boundary or end of sentence; in searching for an object, the order of segments searched is reversed. When a subject or object is encountered, it is marked as to syntactic function and transferred to an extended nominal blocking pass which sets the limits of the particular subject or object block.
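The segment order used by the subject-object search can be illustrated with a short sketch. The function and parameter names are hypothetical, the candidate test is supplied by the caller, and scanning each segment from left to right is an assumption; the text specifies only the order of the two segments.

```python
from typing import Callable, Optional, Sequence


def find_in_clause(
    before_predicate: Sequence[str],
    after_predicate: Sequence[str],
    is_candidate: Callable[[str], bool],
    role: str,
) -> Optional[str]:
    """Search the two clause segments in the order given in the text."""
    if role == "subject":
        # Subjects most often precede the predicate, so look there first.
        segments = (before_predicate, after_predicate)
    elif role == "object":
        # For objects the order of segments is reversed.
        segments = (after_predicate, before_predicate)
    else:
        raise ValueError(f"unknown role: {role!r}")
    for segment in segments:
        for word in segment:  # left-to-right scan is an assumption
            if is_candidate(word):
                return word
    return None
```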

If the predicate is plural, the subject search does not conclude when one subject is found, but transfers control to a subroutine which searches for possible additional subjects. The immediate context of the current subject is first investigated to determine whether commas or ambiguous conjunctions are present. If such coordinators are found, a search for an additional subject is made; if no commas or conjunctions are found, the search is abandoned. In the case of a singular subject and a plural predicate a "trouble" signal is stored if no additional subjects are encountered. When additional subjects are found, they are marked as subjects and processed by the subject blocking routine.
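A minimal sketch of the additional-subject subroutine follows, under simplifying assumptions: each subject is a single word, only the context to the right of the current subject is inspected, and the two predicates `is_coordinator` and `is_subject_candidate` are hypothetical stand-ins for the grammar-code checks of the real program.

```python
from typing import Callable, List, Sequence, Tuple


def collect_subjects(
    words: Sequence[str],
    first_subject: int,
    is_coordinator: Callable[[str], bool],
    is_subject_candidate: Callable[[str], bool],
    predicate_is_plural: bool,
    first_subject_is_singular: bool,
) -> Tuple[List[int], bool]:
    """Return (positions of all subjects found, trouble signal)."""
    subjects = [first_subject]
    if not predicate_is_plural:
        return subjects, False  # a single subject is all that is expected
    current = first_subject
    # Continue only while a comma/conjunction directly follows the current
    # subject and is itself followed by another subject candidate.
    while (
        current + 2 < len(words)
        and is_coordinator(words[current + 1])
        and is_subject_candidate(words[current + 2])
    ):
        current += 2
        subjects.append(current)
    # Singular subject + plural predicate + no further subjects = "trouble".
    trouble = first_subject_is_singular and len(subjects) == 1
    return subjects, trouble
```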


TRANSFORMATION PROGRAMS FOR WORD ORDER ARRANGEMENT IN THE ENGLISH TRANSLATION

Since the functional load of word order in inflected languages (such as Russian) is much lower than in non-inflected languages (such as English), it is clear that an adequate machine translation program must provide for the resolution of word order discrepancies between the source and target languages. In our 1961 report (Machine Translation Studies of Semantic Techniques AF 30(602)-2036), we discussed in detail existing programs for three types of word order rearrangement in the English translation. These three types were described as subject-object rearrangement, rearrangement of governing modifier packages, and rearrangement of auxiliaries and modals within predicate blocks. The first type of rearrangement is the most complex, since it involves the identification of more syntactic elements and reshuffling of large blocks of text. Furthermore, in order to operate successfully within multiclause sentences, it requires a clean definition of boundaries between clauses. At the time of that report, large-scale rearrangements could be effected only on single-clause sentences, as there was no systematic way of identifying separate clauses. It was possible to rearrange the English translation of:

В среде движется пучок заряженных частиц

(In the medium moves the beam of charged particles)

to read:

The beam of charged particles moves in the medium;

or to rearrange the English translation of:

Пучок заряженных частиц объяснил профессор

(The beam of charged particles explained the professor)

to read:

The professor explained the beam of charged particles.


If an additional clause were added, the sentence would read:

Пучок заряженных частиц объяснил профессор, а студент не обращал внимание

The beam of charged particles explained the professor, but the student wasn't paying attention.

Then the first clause could not be handled by the word order transformation program because there was no program for determining whether the ambiguous conjunction "a" connected words, phrases, or clauses.

Since the decisions of the clause boundary routine discussed above are now one of the information inputs to the rearrangement program, two separate searches for possible rearrangement requirements are made on the sample sentence: one on the first clause (the segment between the sentence beginning and the conjunction); the next on the second clause (from the conjunction to the period). If a requirement for rearrangement is revealed by search of the first segment, the English translation of that segment is rearranged as specified, independently of the second clause, which is not checked for a rearrangement requirement until the rearrangement requirement of the first clause has been satisfied. Thus, the resulting English translation of the sample sentence is:

The professor explained the beam of charged particles,

but the student wasn't paying attention.
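The clause-by-clause procedure can be illustrated with a small sketch. The role labels, the block representation, and the single rule shown (object-predicate-subject becomes subject-predicate-object) are assumptions made for the example; the actual program handles further rearrangement types.

```python
from typing import List, Tuple

Block = Tuple[str, str]  # (hypothetical role label, English text of the block)


def rearrange_clause(blocks: List[Block]) -> List[Block]:
    """Rearrange one clause if it shows the object-predicate-subject order."""
    roles = [role for role, _ in blocks]
    if roles == ["object", "predicate", "subject"]:
        return [blocks[2], blocks[1], blocks[0]]
    return list(blocks)  # no rearrangement requirement found


def rearrange_sentence(clauses: List[List[Block]]) -> List[List[Block]]:
    """Process each clause in turn, independently of the others."""
    return [rearrange_clause(clause) for clause in clauses]
```

Applied to the sample sentence, only the first clause is reordered; the second clause, beginning at the conjunction, is checked separately and left unchanged.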


SYNTHESIS

Several steps have been taken in this area to achieve something significantly beyond a modified word-by-word translation by a concentration on the translation algorithm (and translating transformation). In the past, with very few exceptions (i.e., idioms), Russian constructions have been translated into corresponding English constructions. Because English and Russian display certain syntactic similarities, this policy, though often awkward, is usually comprehensible. In conjunction with the semantic research previously described, exploratory work was undertaken in this area. As a result, rules for the handling of Russian constructions containing можно and следует were devised and applied to new text. The rules proved very successful and serve to render what were previously tricky and awkward constructions into idiomatic English. We felt this was ample justification for programming these rules. This activity is so recent that it is not reflected in the sample output in the appendix of this report.


MACHINE TRANSLATION OF STYLISTIC DIFFERENCES

In connection with our preparations for the machine translation of fiction we have given considerable thought to the problem of the automatic detection and correct rendition of stylistic differences. At the present stage of our research we are able to specify two areas in which we shall ultimately be able to do this. One is the stylistically correct use of vocabulary, which is part of the overall problem of multiple meaning to which the research under the present contract is addressed. This aspect of the stylistic problem is not essentially different from other problems of multiple meaning resolution. The second aspect of style is related to the stylistic function of syntactic elements. We have given consideration to the well-known fact that the position of subjects and objects with respect to the predicate has a definite function in the Russian sentence, in that there is a tendency for the initial position in the sentence, whether it is filled by subject or object, to indicate old information, while there is a tendency for the final position, whether it is filled by subject or object, to indicate new information. This stylistic function of position within the sentence can be preserved in the English translation and can contribute to producing machine translation which is not only intelligible but also stylistically more closely comparable to the original. At this stage we are exploring two possibilities for the rendition of this stylistic value of position into English.

One possibility is the use of passive translations for Russian active sentences whenever in the Russian original the subject is in the sentence-final position, indicating that it constitutes new information. In this case, the Russian subject would be rendered by the English agent object, which likewise would be in the final position, and hence equally indicative of new information. Our assumption here would be that the passive English sentence is the stylistic equivalent of the Russian active sentence with inverted order.


As an example, let us take the following Russian sentence: "Какой огромный путь в своем развитии прошла наша страна за сорок два года после свержения власти капиталистов и помещиков и установления Советской рабоче-крестьянской власти!" It stems from the lead article "Беспримерный научный подвиг" (Unparalleled Scientific Advance) in Pravda of 27 October 1959, dealing with the Soviet moon shot and the pictures taken of the far side of the moon. In the light of our interpretation of style, we may assume that the object of this sentence, "Какой огромный путь..." (What enormous path...), is old information, in the light of the preceding discussion of the great achievements reported on, while the subject "наша страна" (our country) and particularly the final predicative complement "за сорок два года..." (after forty-two years...) are presented as new information. Clearly, then, the most idiomatic translation, taking the stylistic function of the sentence portions in the Russian original into account and rendering them equivalently in the English, would involve a change from the active to the passive voice, and a retention of the order in which the semantic components of the sentence appear in Russian. Thus, instead of the rearrangement which our present program aims at, namely, "Our country in forty-two years after the overthrow of the rule of the capitalists and landowners and the establishment of the workers' and peasants' rule passed what enormous path in its development," we shall aim at the following translation which preserves the original order by changing the voice of the English verb from active to passive: "What enormous path of development was passed by our country in forty-two years after the overthrow of the rule of capitalists and landowners and the establishment of the workers' and peasants' rule!"

The second possibility is to consider the function of the English indefinite article or absence of the article in the plural as an indication of new information. In this case we would give consideration to the possibility of using the indefinite article or no article when the corresponding Russian sentence element occupies the final position. An example would be the sentence "Каждое из этих достижений -- беспримерный научный подвиг!" taken from the same document. If we translate the final element of this sentence, the predicative complement "беспримерный научный подвиг," using the indefinite article in English, we shall come closest to an idiomatic translation: "Each of these achievements is an unparalleled scientific advance."

This research is as yet in the exploratory stage.
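The two possibilities under exploration can be stated compactly as decision rules. The sketch below is purely illustrative: the role labels and return values are hypothetical, and the requirement that an object be present before the passive is chosen is an assumption (the passive needs an element to promote to subject).

```python
from typing import Optional, Sequence


def choose_voice(roles: Sequence[str]) -> str:
    """First possibility: passive when the Russian subject is sentence-final."""
    if roles and roles[-1] == "subject" and "object" in roles:
        return "passive"  # the subject becomes the sentence-final agent
    return "active"


def choose_article(is_final: bool, is_plural: bool) -> Optional[str]:
    """Second possibility: mark sentence-final (new) information by article.

    Returns "indefinite", "zero" (no article, plural), or None when position
    gives no stylistic reason to prefer either.
    """
    if not is_final:
        return None
    return "zero" if is_plural else "indefinite"
```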


SEMANTIC RESEARCH USING THE USHAKOV DICTIONARY

The utilization of examples of usage given in the Ushakov Dictionary, which was proposed for the present research, has been attempted. Unfortunately, in the tests which we conducted, using about 100 entries, the majority of the examples given in the Ushakov do not allow multiple meaning due to different subject matter field. These differences are not amenable to the determiner-determinee treatment which we are at present in a position to implement.


TEST SENTENCES FOR SYNTAX FLOW CHARTS

In order to facilitate manual and machine checkouts of the

syntax routines, a large number of test sentences (189) were

created to serve as tools in this process.

The aim of these checkouts is different from that of the processing of live text for testing purposes. The processing of live text tests whether or not our syntax rules cover all significant conditions of Russian sentence structure. The purpose of our test sentences, on the other hand, is to test, not the adequacy of our rules, but whether or not the program actually carries out these rules as intended.

A test sentence was created to check out each significant branch of every flow chart. This means that for every question asked on a flow chart, at least two sentences were created, one corresponding to the yes answer, one to the no answer. We could thus assume that whenever a failure occurred with a particular test sentence, there was a great likelihood that this failure would be due to the branch on the flow chart which this sentence was intended to test.
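The bookkeeping implied by this scheme (two branches per question, one sentence per branch, failures traced back to branches) can be sketched as below. The data layout and all names are hypothetical; the report does not describe how the correspondence between sentences and branches was recorded.

```python
from typing import Dict, Iterable, List, Tuple

Branch = Tuple[str, str]  # (flow-chart question, "yes" or "no")


def required_branches(questions: Iterable[str]) -> List[Branch]:
    """Every question contributes a yes branch and a no branch."""
    return [(q, answer) for q in questions for answer in ("yes", "no")]


def uncovered(questions: Iterable[str], coverage: Dict[str, Branch]) -> List[Branch]:
    """Branches for which no test sentence has been created yet."""
    covered = set(coverage.values())
    return [b for b in required_branches(questions) if b not in covered]


def suspect_branches(passed: Dict[str, bool], coverage: Dict[str, Branch]) -> List[Branch]:
    """Branches most likely at fault: those tested by the failing sentences."""
    return [coverage[s] for s in sorted(passed) if not passed[s]]
```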

Since the test sentences were designed for purposes of the syntax, they were as much as possible deliberately restricted to the syntactic conditions which they are intended to test. In particular we wanted to avoid complicating the tests by introducing more than the minimum necessary number of dictionary entries. Consequently, to display the syntactic difference to maximum advantage, the same dictionary words were used over and over again in the test sentences, bringing about a somewhat unnatural impression. Since some of the test sentences were introduced to test flow chart exits marked "notice of error", these particular sentences were designed to be grammatically incorrect. An example is sentence No. 2 for testing the relative pass.


We wish to illustrate the creation of the test sentences by giving some examples of sentences pertaining to Homograph Resolution Pass HR 1. (Homographs of the type "трудно, согласно," called predicative-adverb-preposition homographs.) Test Sentence No. 1 illustrates the yes answer to the question "Is the predicative-adverb-preposition homograph immediately preceded by a preposition and immediately followed by a modifier?" Note that this sentence contains a homograph (прямо) in the required position between a preposition (от) and a modifier (рекуррентных). This test sentence is correctly translated if the homograph is rendered by an English adverb.

Test Sentence No. 14 on the other hand illustrates a yes answer to the question "Is predicative-adverb-preposition homograph immediately followed by a governed block?" and a no answer to the question "Is predicative-adverb-preposition homograph immediately preceded by a comma?" This test sentence is correctly translated if the homograph is rendered by an English predicate.

Test Sentence No. 15 differs from Test Sentence No. 14 by having a yes answer to the question "Is predicative-adverb-preposition homograph immediately preceded by a comma?" which is in turn followed by a yes answer to the question "Is governed block immediately followed by a comma?" This sentence is correctly translated if the homograph is rendered by a preposition.
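The three outcomes just described can be collected into one decision function. This covers only the branches illustrated by Test Sentences 1, 14 and 15; the order in which the questions are asked, the field names, and the `None` result for all other combinations are assumptions, since the full HR 1 flow chart is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HomographContext:
    """Answers to the flow-chart questions (hypothetical field names)."""
    preceded_by_preposition: bool = False
    followed_by_modifier: bool = False
    followed_by_governed_block: bool = False
    preceded_by_comma: bool = False
    block_followed_by_comma: bool = False


def resolve_hr1(ctx: HomographContext) -> Optional[str]:
    """Return the English word class for the homograph, or None if undecided."""
    if ctx.preceded_by_preposition and ctx.followed_by_modifier:
        return "adverb"  # the condition illustrated by Test Sentence No. 1
    if ctx.followed_by_governed_block:
        if not ctx.preceded_by_comma:
            return "predicate"  # Test Sentence No. 14
        if ctx.block_followed_by_comma:
            return "preposition"  # Test Sentence No. 15
    return None  # branches not described in the text
```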

Each time a change is made in the program, the test sentences are run in order to ascertain the effects of that change. The resulting vertical listing is then available for analysis. This is, however, not enough. It does not take into account the possibility that a change which has produced an improvement in one place in the program may inadvertently result in a change for the worse elsewhere.

Consequently, each time a change is made in the program, the test sentences are run, and in addition to printing them out directly, the information tape of the current run is automatically compared to the information tape of the latest previous run of the same sentences. This comparison is conducted bit by bit. If any discrepancies are detected, a vertical listing of the full sentence is selected for output and each word which displayed a discrepancy is flagged for the benefit of the analyst.
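In present-day terms the comparer is a regression check. A minimal sketch follows, assuming each run is held as a mapping from a sentence identifier to a list of (word, information bytes) records; this layout and the names are hypothetical stand-ins for the information tape. Comparing the byte strings for equality is equivalent to a bit-by-bit comparison.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

WordRecord = Tuple[str, bytes]      # (word, its information record)
Run = Dict[str, List[WordRecord]]   # sentence identifier -> word records


def compare_runs(previous: Run, current: Run) -> Dict[str, List[int]]:
    """Return, per changed sentence, the indices of the words to be flagged."""
    flagged: Dict[str, List[int]] = {}
    for sentence_id, new_words in current.items():
        old_words = previous.get(sentence_id)
        if old_words is None:
            continue  # newly added test sentence: nothing to compare (assumption)
        differing = [
            index
            for index, (old, new) in enumerate(zip_longest(old_words, new_words))
            if old is None or new is None or old[1] != new[1]
        ]
        if differing:
            flagged[sentence_id] = differing
    return flagged
```

Sentences absent from the previous run are skipped, which is one way of letting the inventory of test sentences grow without producing spurious discrepancies.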

The inventory of test sentences can be expanded when necessary. The flow chart of the test sentence comparer on the following page indicates the provisions that have been made for them.

[Flow chart of the test sentence comparer]

Test Sentences for Symbol Pass

1 &-maCTRiIZb upoHHK8aT 'ope: RAPO.

2. BazMHbe~ 7 pONKURT Rep03 RAPO.

I ~~3. Bazbieu 'ISCTIDA Up01K8.T RiOPeS3

4. Heabox sHaTB, UOTOMY 'ITO 5UpOHHKMOT 'isp.: u.po.

5. 'Ia~m~a 2: po~xxaO epe'sp. n.po.

6. EIOCToAHHax 3 [IponazaT mlepe AAPO.

7. 1noHxxaeT 'iepe3 A~po.

8. rnaeHme iaxuaz a~ 2: noHHXxDT 'spe3 AApo.

I 9. C03az~iaxa. %MmTHa XzpoucaeT 'iepes itApo.

10. CoSAaHxoeX npo~axaeT gepeg R~po.

11. 'qC~ nPOHSDOAHT T

I12. flpoR3BOARXY, 'laCTHua npOHXX~aOT 'lepe3 AAPO.

13. 'RacTma mozeT UpOH3BOANTbXT.

I14. IaCTmRUb H- UE~nOKHK8T 'lepes tl o

Test Sentences for Homograph Resolution Pass: type "трудно," "согласно"

1. Hamm PS3yAbTaTH ZIPONCXOAJIT OT JIPAMO pexyppOKTHbM MOTOAOD.

2. HBJxeHme flpox3ozzo corsacHO HaUmx pesyatbTaTax.

3. fluzeiiie UpOR3OEZO meZAy npo'mum corsacuo HaREM pe3yJhT8.TaM.

I 4. IHpOX3BOAR pe~yJ~bTaTM corzacHo HaeOmy UpeAUoJLoZOHvm, onbiTb

flpON30MRJ 6e3 TPYAHOCTel.

5. Hq.ue~me npoxa~ouAo RCHo.

6. HIDAO~uI npOU3OR.KO meZAy uponulM AcHo.

7. IRP023BOAR pesylbTaTbi coriaCox, OblTbi ZUpOESOEAN 663 TpyAHOCTSet.

8. HI3aeHxe OMAO coriacuo c H&EHM flpeDUAOSnoao zm.

9. SIDJZHHS corzacHo 6dzzo c HaREM upSAnxIoAZOHoM.

110. SAnze~xe OY.~6T corzacHo c HEmm uentoem.

ii. TPYARo lipeAflozaraTb TaxeS pesaybTaTH.

I12. TpyjAxo Ham apOAUosaraTb Tane6 pegyzbTaTbi.13. IIpeAfozaraTb HMW TPYAHO Taxe. peuyubTaTMl.

14. Taxoe AzxSexx coriacxo HaREM p03yabTaT&M.


15. Tamse onur., cor~Aono .aeuy nPeAMOMORSUM9 uPOSOWm 0em

16. PaOOTa, nPOM0x0AaXaa AO2O~bUO TYYAU0, 0ODOPM0OTOR oSVOAER.17. B@XKMe XONO R01035900?3M p68yabTI$M DpUCKONXA? 005 TpYA3OOYeI.1s. HCos, 'iTO ueAunf.

19- RaNtio AXR MiXaM 12013 NIAbUE

20. Tame. oDwmT 00, noo POXOZOAAY 003 TpyANOTOA*M

21. Taicoe s 85511 8030, a =a3 0DM? npoUUOMA@Y 063 7pyAROIOt'@.22. Taxos ass.... Gespasannuo ?PYARo.23. RSOHo 3UBOCTHMN pe3yibTa? UPOROZOAN? 0em TpyAROOTGR. [24. CoBePmeNOTBsasaUG Taxxx pesyaT&ToD 0'iO~b TPyAUO. 1

Test Sentences for Homograph Resolution Pass: его, ее, их

1. P&OOT& npommouaa noose ero o'ie.b samuz yousul.

2. flPOAUosozeiiA uX 0'WNb Baww.

3. CoTpyANuiX 00Xp&HRA9 00.

Test Sentences for Homograph Resolution Pass: type *UoToiaauz"

1. flOTo,'oaXwa TPYARS8 paOoa 3553MCTPNPyOT ZEURb.

2. 1K35b E55DOTpapy6T DOOTOXREMZ DR?b paGoT. I3. CReuyuuee 05050 DUNKMWSZK? DpaO00o0py.

4. COaTPYAuX 035835 oAVA"19ee.5. ROAePuK06e TYazy. WaO~y DOEXR? 36Upuiiuei 08

6. flaorOANuua UpKKEM£ST P53350 ON&OuA'it.

7. liUE AOK5SAM 8D5JIDTOR ytieRU~M.8. UaNK AOxaaAm WDoSmma y'1enw3.9.. HMOx pe"aTI UPOBOASES y4*XMI.3

10. Hama pa"aTa UpXDOAM? X y'ieNMh.


12. Hawn pa~OTa UKDXOAUT OT ZOCYOENKOI.

I ~~13. Mama paO~a XuzneCA xzOowOAzaOI.

Test Sentences for Nominal- and Prepositional-Blocking Pass

1. Bonpoo 0Txpuoro oxua pemaeroa osrojtxa.

2. Boupoc OTipHliIO pSeOTxz pOeaOoJIx corOANE.

3. Aua OTKpWTMX OK~a KftXOAKTOR SASOh.I4. ADS 0TKpMIThI POXOTXN NaXO~ATOa *AO0b,

5. OTXpMITOS X 3aXPUT0S OZKU UXOAX?0R 3AG~b.I6. Hezb3X padOTaTb 603 OTxpbiT0or xa7. Hezabax padOOaTb 008 oTxpbMMX oxxa. a peaevxu.

Test Sentences for Inserted-Structure Pass

1. KOiie'wO, 'laONXMI apOHRU3T Re~po* BApO.

I2. CerOAHA, 'RaCTxza upo..ia Roepes AO3. qIa0TiMw, KONO'(HO, apoHEKED? mopes LAPO.

4. 'qaOTM1ga, CrOFAH, npouaa mopes xApo.I5. Bodg r'QQpx, tMacTIMN UPOHUK8D? 'lePeSJKP

6. 'IaCThNubz, noodS rosops, flPOHKN~5,T Rlopes XAPOOI 7. '4aoTugbi, xax ace Uva3?, uIpouaXar 'SPGOS XAPO.

8: BaxaM:,g Yaxxw onooOodox, 'Ia0TNIMb UpOHNK&T ReOpo&el

9. BUH~t n au&SOY UPSAUOXOXOXND, 'lao?3U! DPOUKKKS? Mopes XAPO.

I13. iCOH.'wo, 'aOTIaMN 1Dp0EaU oerOARA, '(Opol AO14. KoH9RHo, 'laCTRU~ £IpO~KXa5UT, 5000140 roaopa, Repea RAPOO

I15 * ~IUOTK1~M, KOMOMNO, fPOUNMKS? 'SPOSp 120 3UESOM7 SW&OXAPO.116. Reopeal UO 3&E6Y flPSAUoOAZOXMD, SAWO EYXJIOX5M, XON


Test Sentences for Governing-Modifier Pass

1. Bpawmawxeq zcnAu zAyT suOpsA. fl2. KacICHAM, sp&Mwxocx soEpyr xAps, HAYT UopSoA.

3. KaczaAb HymeozoD, ,paswwqeOC soxpyr JAP&, WAyT BUepSA. H4. Bpauawluecx noxpyr xjpa xacza•u z? 3ne0PeA.

5. BpazAvaxecx soxpyr Rpa xacaAw xyzyeoxo0 Kay? nepOA. H6. Bpauammecas noxpyr Rpa WcTpue zaczOmM MAYT UnepeA.

7. BpawanxecA Boxpyr Rpa OMCTp.e EaczaAM HyKaeozoH Ksy? 3T BUSOA.

"XOTOIDm palss-test sentences

1. 1acxmau, XOTOPeO KAYT snepeA, OReHb [email protected].

2. EaSCXbI XOTopbte HAY xnlpeA op4eHb sexxK.

3. XaCXAjl, XoTOpme nnepeA, omemb sOaeRz.

Test Sentences for Main Syntax Pass - Key to Number Code

X.0.0.0. Predicate

0.X.0.0. Subject

0.0.X.0. Object

0.0.0.X. Eliminated candidates for subject or object

1.0.0.0. Predicative plural non-past

2.0.0.0. Predicative followed by бы

3.0.0.0. Predicative plural past

4.0.0.0. Predicative singular non-past

5.0.0.0. Predicative singular past feminine

6.0.0.0. Predicative singular past masculine

7.0.0.0. Predicative singular past neuter

8.0.0.0. Predicative plural followed by быть

9.0.0.0. Predicative plural followed by быть followed by second predicative


10.0.0.0. Predicative is были and is followed by second predicative

11.0.0.0. Predicative is будут and is followed by second predicative

12.0.0.0. Predicative is a comparative

13.0.0.0. There is no predicative, but если followed by infinitive

14.0.0.0. There is no predicative, but если бы followed by infinitive

15.0.0.0. Predicative is followed by infinitive

16.0.0.0. No predicative, but gerund

17.0.0.0. No predicative, but gerund followed by infinitive

0.1.0.0. Unambiguous nominative plural

0.2.0.0. Genitive singular/nominative-accusative plural ambiguity reduced by unambiguous modifier

0.3.0.0. Genitive singular/nominative-accusative plural ambiguity reduced by nominative numeral

0.4.0.0. Potential nominative singular, any gender

0.5.0.0. Potential nominative singular, feminine

0.6.0.0. Potential nominative singular, masculine

0.7.0.0. Potential nominative singular, neuter

0.8.0.0. No subject, but predicative can be impersonal

0.9.0.0. No subject, predicative cannot be impersonal, rearrangement required

0.0.1.0. Predicate governs accusative, nominative/accusative potential object

0.0.2.0. Predicate governs genitive, genitive potential object

0.0.3.0. Predicate governs accusative plural potential object

0.0.4.0. Predicate governs accusative, genitive singular/nominative-accusative plural potential object, ambiguity resolved by numeral

0.0.5.0. Predicate governing accusative contains negative particle, nominative/accusative potential object


0.0.6.0. Predicate governing accusative contains negative particle, genitive potential object

0.0.7.0. Predicate governs instrumental, potential instrumental object

0.0.8.0. Predicate governs dative, potential dative object

0.0.9.0. Predicate governs preposition, prepositional block present as potential object

0.0.0.1. Nominative/accusative plural candidate for subject eliminated because of preceding preposition

0.0.0.2. Genitive singular/nominative-accusative plural candidate for subject eliminated because of preceding preposition

0.0.0.3. Genitive singular/nominative-accusative plural candidate for subject eliminated because of preceding modifier

0.0.0.4. Singular candidate for subject of unspecified gender eliminated because of preceding preposition

0.0.0.5. Feminine singular candidate for subject eliminated because of preceding preposition

0.0.0.6. Masculine singular candidate for subject eliminated because of preceding preposition

0.0.0.7. Neuter singular candidate for subject eliminated because of preceding preposition

0.0.0.8. Nominative/accusative candidate for object eliminated because of preceding preposition

0.0.0.9. Genitive singular/nominative-accusative plural candidate for object eliminated because of preceding modifier

0.0.0.10. Genitive candidate for object eliminated because of preceding preposition

0.0.0.11. Genitive candidate for object eliminated because of preceding other nominal

0.0.0.12. Instrumental candidate for object eliminated because of preceding preposition


0.0.0.13. Instrumental candidate for object eliminated because of preceding other nominal

0.0.0.14. Dative candidate for object eliminated because of preceding preposition

0.0.0.15. Dative candidate for object eliminated because of preceding other nominal

Number followed by "m": nominal block includes modifiers
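For readers working through the coded test sentences that follow, the key can be applied mechanically. The sketch below parses a code such as 1.1.1.2m into its four positions. Two points are assumptions inferred from the sentence labels rather than stated in the key: positions omitted at the end of a code (as in 1.9.) are read as 0, and an "m" is taken to apply to the position it is attached to.

```python
import re
from typing import NamedTuple, Tuple

FIELDS = ("predicate", "subject", "obj", "eliminated")


class Code(NamedTuple):
    predicate: int
    subject: int
    obj: int
    eliminated: int
    with_modifiers: Tuple[str, ...]  # positions carrying the "m" suffix


def parse_code(code: str) -> Code:
    """Parse a test-sentence label such as '1.1.1.2m' or '1.9.'."""
    parts = [p.strip().lower() for p in code.split(".") if p.strip()]
    if not 1 <= len(parts) <= len(FIELDS):
        raise ValueError(f"expected 1 to 4 positions in {code!r}")
    values = [0, 0, 0, 0]  # omitted trailing positions default to 0 (assumption)
    modified = []
    for index, part in enumerate(parts):
        match = re.fullmatch(r"(\d+)(m?)", part)
        if match is None:
            raise ValueError(f"bad position {part!r} in {code!r}")
        values[index] = int(match.group(1))
        if match.group(2):
            modified.append(FIELDS[index])
    return Code(values[0], values[1], values[2], values[3], tuple(modified))
```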

Test Sentences for Main Syntax Pass

1 ~ToaapxxN yneauaNsaw cns

1.1M.1M.Hanx Toxapxxx yn5exUan 3Yan 0ozzu3aA~nCYK zCK 00ws.

1.1..1 lqepos paAOuuH YOB&pKAR YB.ZU~m3aW? 0033.

1.1.1.lu.qepeS 60.UbEX paloHm Tosapxuu yzeamna3saT 00m3.1 1.1.1.2. 'lepes, coupoYKDzoNaxi ioapNmm ys053113383 0033.

1.1.1.2m. 'lPope dosazuf 00Up0TUNI0338 TOB&PNEN yBOAU'VWMa3[ 00133.1 .1.1 3 *B cuMCOe Taxoro COUPOTHRAORflA ToBapuWu Y6ZEA~n3U3T

1.1*1.8,To~saax~ YBAezuIuaT ROpOS paRou C033.

1 .1 *To~apuxx y3.A~nZa39T 3 cmuCAO Taxoro conpo'Ine5ASIL

1.1.2. Tonaapulu AOCTMVS3? ycNxAU.

11.1.2.10. Toaapuasi AOo~uram 603 TYpAKOCTOfl YOCE~.u1.1.2.11. Toxapawu AooCYNra UOCTaNOBSOR TPYAuoc~ell ycuaufl.

11-3 Tonapanu ysexXRR'iK59.3Y PZ

101 .3.8. TonapEaw ys.3eauusmI nopes paXOX UOTepx.

1 .1 .3.9. Tosapmw. y3CZ3u33s3Y a cmuOO Yrazorc o upomKaxeuE

DOTepa.

1.1.4.Tozapawu YBOCKZKN&V AS* UOTOPX.


1.*1.5. Tonapumu as 7 3053q3585? c0ma.

1 *1*.6. Tonapmax 30 yUea3uEza.3 coma&.

1 .1-.7. Tosapxuw CXZyZM TeXHUH&iU.1 .1.7.12. Tosapuuiu CxyU? c noupaixol T6XuX~aMN.

1.1.7.13. Tonapau uzyzaT Oei yupasismax 00OaTORUSK TOZeXRKaME i1.1.8. Tonapaum oT3sena COTpyAH3KaM.

1.1.8.14. ToDapauw OT3SOUDT UO COCTaDy CO~pyARNzaM. I1. 8.5 Tosapmau oTs0~awY 603 UOATBOPZAOHNE COPYAHEKXA

COCTaly.1.1.9.Tonapuuw Qo3*eft3T Ha Taxot Doupoc.

1 *2.1 * Taxne COUPOTHBXOHNA ysexm~mzaMT C03B.

1 .2.1 .1. Mqepe3 pafloau TaKN. c0upoYND.IexNE yuexxuzuaaT 0038.1.2.1.2. Moepe3 COnPOTUDXSHKJI Taxie6 flapTR y36eHxqH363T Cos.,

1.2.1.3. B CMMOZO Taxoro COUpOTK3ZOHKXA Tamme napTUE

VBSJLKRNBSJT 0033.

1.3.1. Ana COUpOTXDXSHKA yBsexxINBS3T 0038. I1 .3.1.1. Mpes palMoRb Ana COUPOTEIZOHNA yneimnusawT 0033,

1.3.1.2. MUOpS COUPOTH32SHNZ ADS naPTEN yDexxR35S3T COMB.I

1 .3.1-*3. B CUNCIO Taxoro C0UP0TRDKOZH3A Abe naPTRU ynD0zK~asaT

1.*9. 00MB YBS.ZRRK33T Tonapusim.

2.1.1.Tosapanx cpasuxuazz ON M.ToA-

2.lx~lu.Hann T03apanM cpasumxaax ON may~sul USTOA.I

2.1.1.1. epes nPNM6PH Toaapzuu cpaiumn~aa ON MOTOA.

2.1.1 .2. Mope.s pauaozexzA T05apxx3 cpassusaim ON MOTOA.I2.1.*1.3. B cmaose Taxoro pasaozewux Tonapump cpasmusefzu Gu

KeTOA0

2.2.1. Taxne 00flPOT335O3N UP030EN GM UOTOA.2.2.1* 1 * M~epes usa UPEMOP Y&K O 0p0Y1350335 UPONSROIN GUSTOA.

2.2.1.2. MSopS, COflPOTMBZNHUR Talneus xe npoMOPM UO OEN Gu [email protected] -3. B CEMOZO Taxoro coflpoTDasexxx Teals upsMepu upoOeK I

OM MOTOA.


[2.3. 1. AN& OOSIRO?35ZO33 UPOBOZN OH MOYOA.

2.3.1 .1. MOPeS UPNMOPH Asa OORpOYKMaOMz upossix amM OTOA.[ 2.3.1 .2. Rlp.OPOS pOYUPaOTUBS ADO flOPY3N UPOSDOZN QM OYOA,.2.*3.1 .3.* B CumozO TaoNoF conpo~uzzxeww Ass USpYRN lipoDOex[ Ohl MOTOA.2.9. MeTOA UpODOZZ ON TODOPKUK.

13:01.1 ORR OPaSRbxSAa M:T:A.

4.4.1.4 pe nIpxKeP KOPaHSOT MOKIT 3T OY

14.4.1.4m. RopeS TaZXo IpNMoSp-KOpa RoxaUS&nae KOTOAO

4.8. lIMy Y~aoTCX.

5.5.1.PO Uoe oxaaa.1a MOTOA.3 5.5.1.5.OS Mee OAb MOPe. UOXe&S&Xz MOTOA.5.5.1.5K. Repoin UPOCTYM MOAb MOP&. UoMxau MOTOA.

5.9. floxbuy noxabaz. map&..

6.6.1. COWS noxasax MOTOA.

6.6.1.6. %lope: Upxxep COWS UOKOS&A MOTOR.6.6.1.6m. 'lop.. TAXON UpKKzCW S op o a ux~az MOTOA.6.9. flozbay UOxe~aSz MOTOA.

17.7.1. CpazezDHN noUasaoO MOTOA.7.7.1.7. Mqope. UORBTKO CPSDHOHNO nOUs~oO MSTOA*[ 7.7.1.7m. Mopes T&OXO UOU.ETKO CPU.DNONNO ROUUSaXO MOTOR.7.8. Emy yASAOCh.

[7.9. iloabsy noxagazo opoanzox..

8.1. Toxapaux moran Ouwh.8.9. Soria Ovrb TOue&puIX.

9.1. TosapEm= morim GMb NOMS&e.axu9.9. Soria Oli~h 11UOSILZ TOD&PEU3.[ 10.1. Toxeapuzz Gui. noueaa.5.u10.9 Buzz Roxafaiw -rouepxun.11.1 Tonapumw GyAY noKOsOM.


11.9. BYAYr UozaSMau ToaapR. 112.1.2. Tonapsus 6Om-p@@ comaa.

12.4.2. SOP& ONOTPe. MOSYOA&..1

12.9. COVS& OUoTpee ?OBap3ZN.

13.0.1. BOAR YBOARRMuaab 00ca.

14.0.1OA RON GuYDOAeau~~ab C033.

15.1.1. Touapxza moryT y3*Zx~n&3OTh 003. [15.8.1. y YA&CTCA YBOZNRU3&?h 00I38.

15.9. Coca moryT yueau~zsa~b Y0D8.pMuN

16.0.1. CoSAaBO.5 COM3. I16.o.l. SaAaoaacb oosAa.DOTh COIDS.

Test Sentences for Cleanup Pass CP-1

1. BeazxMe T0o.apx~ KMyT.

2. 0'Wxh 5sZNmm T0osapuEH HlT.I

3. He TOUapHN NUYT.4. Bazxue Y0O~apNXK MARXH COTPYARNNON MAYT.

5. ONeub DaSmme Y03apmax uO.3H COTpyANHXKO M"y?.

6. He TODO.PX=M MXNO5H OOTpyAHKIOB HAY?.

7. He 9eZNM T0OMPMAN HAyT.

8. He 38.ZKM TBoaapHx NO.5H OOTPyAKNKOD HXy?.

9. Tooapuix 3O.zNb flpo4meCOpos NO.3HZ COYPYAHEXOD

HAyT.

Test Sentences for Cleanup Pass CP-2

1. Hao axASTb HOAh3K.

2. flpoc'yW MGAb BhAG~b R0Ab3X.I

3. IIpoo'iyM xe~b Tuzoro po1A& SXe~ Resb~a.

4, Taxoro POAIL UpoO~yM MOAb ZHAS~b 36A035.

5. 0ROua UPOOT7D NOAh BRA*-fb N*Ab*X.6. 11lobay BXA0Th Ne~OXU.I


HARDWARE IMPLICATIONS OF THE RESEARCH

Machine translation research at the RW Division has been implemented on general purpose computers. We recognize such hardware requirements as print readers, multi-font printers, and composing machines.

We have recognized the need for larger memories. This recommendation, however, reflects only the general observation that the larger the memory, the easier the solution.

In the area of dictionary search, we have been able to program our way around the fact that the high-speed memories of the general purpose computers we have been using are too small to contain our lexicon.

Both programming foresight and cleverness were required for

the general solution of our ordinary dictionary lookup. However, if

one wishes that a far larger memory had been available, it becomes

possible to ask what new attributes other than larger size are de-

sired of this memory.

What strikes one as the flow charts are examined is that the need exists for a totally different type of computer. Such a computer would have an associative memory.

The reasons which first appear for an associative memory are those based upon the fact that the data is non-numeric. Such instructions as:

find the word in the dictionary
or
find the prefix of the word
or
compare the end of the word with a possible list of suffixes

immediately present themselves. Their uses are real; our experience in machine translation, however, has shown that those portions of the program which require the ability to recognize alphanumerical configurations take only a small portion of the total running time.


Attention is directed to the figures on the following pages. These charts are taken from the 1960 RW machine translation report. They illustrate a small portion of the syntactic program. Furthermore, these charts represent the thinking of a linguist and had not as yet been seriously modified by a programmer's restatement based on efficient usage of a traditional general purpose computer.

Though there is an occasional reference to actual word forms

in these figures, most references are to grammatic attributes which

would have been found only after a successful dictionary lookup.

In these charts, a large number of the boxes begin with such words as "search" or "hunt," or with phrases such as "is nominal immediately preceded by preposition skipping over modifiers and adverbs." Here we have evidence of the need for a search type memory past the stage where the problem has consumed alphanumeric data.

Among the important aspects revealed by the flow charts is the

need for a search to be made within a region whose boundaries are

defined syntactically. This searching within a non-graphically de-

fined area is one of the dominant features of language data processing;

this is made evident as we examine different portions of our bilingual

programs.
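
As a rough illustration of a search whose region is bounded syntactically rather than graphically, the sketch below hunts leftward from a nominal for a preposition, skipping modifiers and adverbs. The tag names (NOM, PREP, MOD, ADV, BOUNDARY) and the function name are assumptions made for this example only; they are not the grammar codes of the RW program.

```python
# Hypothetical word-class tags; the report's actual grammar codes are not shown here.
SKIPPABLE = {"MOD", "ADV"}

def preposition_before(tags, i):
    """Return the index of a preposition preceding the nominal at index i, or None.

    The hunt moves left from i, skips modifiers and adverbs, and gives up at
    a clause boundary or at any other word class.
    """
    if tags[i] != "NOM":
        return None
    j = i - 1
    while j >= 0 and tags[j] in SKIPPABLE:
        j -= 1
    if j >= 0 and tags[j] == "PREP":
        return j
    return None

sentence = ["NOM", "BOUNDARY", "PREP", "ADV", "MOD", "NOM", "VERB"]
print(preposition_before(sentence, 5))  # index of the preposition, here 2
print(preposition_before(sentence, 0))  # None: nothing precedes the nominal
```

On a conventional machine this is a sequential scan; the point of the passage is that the limits of the scan depend on the word classes met along the way, not on a fixed number of positions.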

Syntactic Analysis

It is generally recognized by all research centers that automatic translation cannot be done without performing detailed syntactic analyses of the text. In syntactic analyses, searches are made on other than lexical units. These searches, made upon grammar codes and resultant syntactic codes, have as part of their nature a conjunction of conditions within variably defined syntactic boundaries. Up to now, the prominent methods of analysis have necessarily reverted to standard computer techniques. Yet, as indicated in the following figures, this portion of the analysis, at least in the Fulcrum Technique, is heavily loaded with "search" and "hunt" procedures. In this area of the translation procedure, one would expect to find heavily used associative memory type algorithms, and a word size and memory module size different from those indicated for the earlier portions of the program.

[Figures: flow charts of a portion of the syntactic program, taken from the 1960 RW machine translation report. The box text is not legible in this copy.]

Semantics

The portion of all machine translation programs which has been most impervious to solution is that represented by correctly erasing all but one of the possible English translations for a given foreign word. Some groups have even been dismayed enough to indicate that human intervention will always be required at this point. Our own research has led us to the use of techniques which, with hindsight, would appear to have called for a cascaded search. Among the problem types are homograph resolution, idiom identification, word combination recognition, and determiner-determinee pairing.

Idiom recognition requires the knowledge that two or more words, when closely connected syntactically, possibly even contiguous, have a quite different meaning and possibly a different grammatic aspect than would be expected from the sum of the parts of the idiom. Here techniques that have been used include the dictionary labelling of each member of the idiom as being a potential idiom and then, when a conjunction of possible members occurs, searching in a special idiom table. Even the necessary identification of the idiom potentiality requires search procedures. The interaction between idiom identification and associative memory computer configurations should be studied.
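
The two-stage technique described above (flag potential idiom members in the dictionary, then consult a special idiom table when flagged members occur together) can be sketched as follows. This is a minimal sketch that assumes contiguous two-word idioms; the table contents, the English toy data, and the names `IDIOM_TABLE`, `POTENTIAL_MEMBER`, and `find_idioms` are invented for illustration and do not come from the RW dictionary.

```python
# Toy data, invented for illustration only.
IDIOM_TABLE = {("by", "heart"): "from memory"}
# Dictionary labelling: every member of any idiom is flagged as a potential member.
POTENTIAL_MEMBER = {word for idiom in IDIOM_TABLE for word in idiom}

def find_idioms(words):
    """Return (start_index, translation) for each contiguous two-word idiom."""
    hits = []
    for i in range(len(words) - 1):
        pair = (words[i], words[i + 1])
        # Consult the idiom table only when both neighbours carry the flag.
        if all(w in POTENTIAL_MEMBER for w in pair) and pair in IDIOM_TABLE:
            hits.append((i, IDIOM_TABLE[pair]))
    return hits

print(find_idioms(["learn", "it", "by", "heart"]))  # [(2, 'from memory')]
```

The flag test is cheap and filters most word pairs before the table is searched, which is presumably why the labelling step was worth its dictionary space.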

Homograph Resolution

Many words share the same spelling. A dramatic example in English is the three different meanings of the spelling "fast." As a verb, it means to not eat; as an adjective, it has the two opposite meanings of tied tight and rapid. Resolution of this type of ambiguity is often made by searching in the neighborhood of the word for indications of the word class of the homograph. Such resolution would be more easily effected with a search memory.


Word Combinations

Word combinations represent an idiom type whose configuration is both lexical and syntactic; that is, the correct translation of a primary word or of associated prepositions is determined by the existence of the primary word and a particular type of syntactic environment. This environment can be represented, for example, by a word in the same clause with governed genitive and dative dependencies, but with an explicitly missing accusative object. Since indications of the limits of this environment are only vaguely fixed, search is again called for.

Determiner-Determinee

More than any other type of semantic resolution described thus far, determiner-determinee resolution requires associative memory techniques, as can be seen in the flow charts for the method. Here the resolution problem is represented by the existence within wide syntactic boundaries of two or more semantically connected words. The "hunt" initially consists of ascertaining whether a determinee exists. Since determinees are capable of having different determiners, the next aspect of the search directs one to "hunting": first the identification of the possible determiners, next the establishment of their coexistence and then, finally, verification of whether they exist in proper syntactic relationship to the determinee. Determiner-determinee resolution has been awkward to program on a standard computer. Its resolution should be far more powerful when associative memories are used.

Output

A consumer of automatic translation thinks of a printed page as being the output. This is only partially true. In truth, the printed translation is the fruit of the process. The necessary evolutionary improvements receive their sustenance from the research output. Since the nature of handling the research output almost invariably calls for the identification of particular translation configurations in a large body of text, this most necessary area of automatic translation would seem to benefit greatly from any hardware development which eases the burdens of search. Here one would study the proper way of emitting the research output so that it would be best married to the coming hardware realities.

Dictionary Lookup

The problem of dictionary lookup in RW machine translation has two major aspects. Most Russian words found in the text that we process actually exist in our dictionary. A sizeable percentage of the words, however, are represented in the dictionary with a different ending. By stripping off the possible suffixes of text words which are not found directly by lookup and by then searching on stems, the semantic and morphological aspects of these words can be derived. The use of a search type memory in this portion of the problem is obvious.
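
The two-step lookup described above (direct lookup first, then suffix stripping and a search on the stem) might look like the following sketch. The transliterated entries, the suffix list, and the names `DICTIONARY`, `SUFFIXES`, and `lookup` are assumptions made for illustration; the report does not give the layout of the RW dictionary or its suffix inventory.

```python
# Toy dictionary and suffix list, invented for illustration only.
DICTIONARY = {"tovarishch": "comrade", "metod": "method"}
SUFFIXES = ["ami", "om", "a", "u", "i"]

def lookup(word):
    """Return (entry, stripped_suffix), or None if neither the word nor a stem is found."""
    if word in DICTIONARY:                  # direct lookup succeeds for most words
        return DICTIONARY[word], ""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):  # longest suffix first
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in DICTIONARY:          # search on the stem
                return DICTIONARY[stem], suffix
    return None

print(lookup("metod"))        # ('method', '')
print(lookup("metodom"))      # ('method', 'om')
print(lookup("tovarishchi"))  # ('comrade', 'i')
```

Returning the stripped suffix alongside the entry keeps the morphological information that the text says can be derived from the ending.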


DICTIONARY UPDATING

Our initial machine glossary has been updated in eight separate runs. This added 6,909 new forms to the dictionary, which now contains 23,713 forms. Since our stem-affixing procedures allow us to grammar-code new forms from different forms of the same stem found in the dictionary, this will give us a coverage of about 50,000 Russian forms.

SAMPLE TRANSLATION

[Sample translation, pages 54-63 of the report. The text of these pages is not legible in this copy.]

FLOW CHARTS OF DETERMINER-DETERMINEE METHOD

AIC:
    Store zeros in AWNC and [second name illegible].
    Save indexes; store zeros in a block of 100 addresses.
    Bump address (to next word).
    Test: out of sentence?
        yes -> [target illegible]
        no  -> continue
    Is the current word a determinee (is "B" the second character of the semantic code)?
        no  -> PER
        yes -> Store the address containing the word number of the determinee in the next empty address of the 100 word block.
    Is block filled?
        yes -> [target illegible]
        no  -> continue
    Is current word a determiner (is "C" the third character of the semantic code word)?
        no  -> [target illegible]
        yes -> continue
    Is "AWNC" filled?
        yes -> [target illegible]
        no  -> Store the address containing the word number of the determiner in AWNC.

* Not yet written
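
The scan in this chart can be paraphrased in a few lines. The sketch below is a simplification under stated assumptions: it collects every determiner in a list instead of filling the fixed cells of the chart, and the function name `collect`, the tuple layout, and the sample codes are hypothetical. Only the two tests ("B" as second character, "C" as third character of the semantic code) and the 100-entry block come from the chart.

```python
def collect(sentence, block_size=100):
    """sentence: list of (word_number, semantic_code) pairs.

    Returns (determinees, determiners) as lists of word numbers.
    """
    determinees, determiners = [], []
    for word_number, code in sentence:
        # Determinee: "B" is the second character of the semantic code.
        if len(code) > 1 and code[1] == "B" and len(determinees) < block_size:
            determinees.append(word_number)
        # Determiner: "C" is the third character of the semantic code.
        if len(code) > 2 and code[2] == "C":
            determiners.append(word_number)
    return determinees, determiners

print(collect([(1, "ABX"), (2, "XXC"), (3, "ABC"), (4, "")]))  # ([1, 3], [2, 3])
```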

[label illegible]:
    Store the address containing the word number of the determiner in AWNC1.

[label illegible]:
    Is the first address of the 100 word block (containing addresses of word numbers of determinees) filled?
        no  -> [target illegible]
        yes -> continue
    Is AWNC filled?
        no  -> ENT*
        yes -> continue
    Are the contents of the first address of the 100 word block and AWNC equal?
        no  -> [target illegible]
        yes -> continue
    Is the second address of the 100 word block filled?
        yes -> [target illegible]
        no  -> continue
    Is AWNC1 filled?
        yes -> PBS
        no  -> ENT*

PEM:
    Pick up next entry in the 100 word block and pick up the word number referred to there.
    Is the word number less than the word number of the first entry in the determinee table?
        yes -> [target illegible]
        no  -> continue
    Is the word number the same as the first entry in the determinee table?
        yes -> [target illegible]
        no  -> continue
    Is the word number more than the last entry in the determinee table?
        yes -> [target illegible]
        no  -> continue
    Is the word number the same as the last entry in the determinee table?
        yes -> [target illegible]
        no  -> PCLN

* Not yet written

PCLN:
    Add together the addresses of the top and bottom parameters of the table and divide by 2 (gives mid-point of table).
    Is the contents of this mid-point of the table more than the word number of the current determinee?
        yes -> PLRT
        no  -> continue
    Is the contents of the midpoint equal to the word number of the current determinee?
        yes -> [target illegible] (save address of match)
        no  -> continue
    Make the midpoint of the table the new top parameter.
    Do the top parameter and the bottom parameter of the table differ by 1?
        yes -> [target illegible]
        no  -> PCLN

PLRT:
    Make the midpoint of the table the new bottom parameter.
        -> PCLN

[label illegible]:
    Save address of match.

PRPM+2:
    Save address of match.

* Not yet written
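
The PCLN loop is a bisection search of the determinee table, preceded by the first-entry and last-entry checks made at PEM. A minimal sketch follows, assuming the table is a list of word numbers in ascending order; the function name `find_in_table` and the sample table are hypothetical.

```python
def find_in_table(table, target):
    """Return the index of target in the ascending list `table`, or None."""
    # Checks made before the loop: outside the table, or equal to an end entry.
    if not table or target < table[0] or target > table[-1]:
        return None
    if target == table[0]:
        return 0
    if target == table[-1]:
        return len(table) - 1
    top, bottom = 0, len(table) - 1       # table[top] < target < table[bottom]
    while bottom - top > 1:               # stop when the parameters differ by 1
        mid = (top + bottom) // 2         # add the parameters and divide by 2
        if table[mid] > target:
            bottom = mid                  # midpoint becomes the new bottom parameter
        elif table[mid] == target:
            return mid                    # save address of match
        else:
            top = mid                     # midpoint becomes the new top parameter
    return None

table = [3, 8, 15, 21, 40]
print(find_in_table(table, 15))  # 2
print(find_in_table(table, 9))   # None
```

Because the end entries are tested first, the loop can keep both parameters strictly on either side of the target, and the "differ by 1" test is a sufficient stopping condition.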

[label illegible]:
    Save address of match.

[label illegible]:
    [Go] to the determiner table** at the address indicated in the determinee table.
    Pick up tests. Ascertain what test applies.
    Does computer word examined have an indication that it contains tests?
        no  -> [target illegible] (error in table)
        yes -> continue
    Does code indicate a modifier determined by a following nominal?
        yes -> PFN
        no  -> continue
    Does code indicate a nominal determined by a following nominal?
        yes -> [target illegible]
        no  -> continue
    Does code indicate a preposition determined by a following nominal?
        yes -> [target illegible]
        no  -> continue
    Does code indicate a nominal determined by a preceding modifier?
        yes -> PPH
        no  -> continue
    Does code indicate a genitive nominal determined by a preceding nominal?
        yes -> [target illegible]
        no  -> ENT*

* Not yet written

** See page 5 for the format of the determiner table

[label illegible]:
    Is word following determinee out of sentence?
        yes -> [target illegible]
        no  -> continue
    Is determinee a modifier?
        [branches illegible]
    Is following word an adverb or modifier?
        yes -> Bump address to next word.
        no  -> continue
    Is following word a nominal?
        no  -> [target illegible]
        yes -> Save word number of following nominal as potential determiner.
    Exit: finished.

[label illegible]:
    Is the word following the current determinee out of sentence (skip adverbs and modifiers)?
        yes -> [target illegible]
        no  -> continue
    Is the word following the current determinee a nominal (skip adverbs and modifiers)?
        no  -> [target illegible]
        yes -> continue
    Does it have any genitive agreement bit?
        no  -> [target illegible]
        yes -> Save word number of word following determinee as potential determiner.
    Exits: finished; found.
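
The test in this chart (a following nominal with a genitive agreement bit, reached by skipping adverbs and modifiers) can be sketched as below. The dictionary keys `cls` and `genitive`, the class names, and the function name are assumptions for this example; the chart does not show how the agreement bits are stored.

```python
def following_genitive_nominal(words, i):
    """Index of the potential determiner after the determinee at i, else None.

    words: list of dicts with hypothetical keys "cls" and "genitive".
    """
    j = i + 1
    while j < len(words) and words[j]["cls"] in ("ADV", "MOD"):
        j += 1                            # skip adverbs and modifiers
    if j >= len(words):
        return None                       # out of sentence: finished
    if words[j]["cls"] == "NOM" and words[j]["genitive"]:
        return j                          # found: potential determiner
    return None                           # finished

words = [
    {"cls": "NOM", "genitive": False},    # the determinee
    {"cls": "MOD", "genitive": True},
    {"cls": "NOM", "genitive": True},
]
print(following_genitive_nominal(words, 0))  # 2
```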

[label partly illegible]:
    Pick up next determiner word number in determiner table.
    Have we passed the last determiner in the determiner table for the determinee on which we are working?
        yes -> Finished exit
        no  -> continue
    Is the word number of the potential determiner the same as the word number for this entry in the determiner table?
        no  -> [target illegible]
        yes -> continue
    Is the test fulfilled by this determiner equal to the one which initiated the search?
        no  -> Finished exit
        yes -> Place flags in one or more of 5 consecutive locations indicating which translation or translations apply.
    Found exit
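
A compact paraphrase of this chart follows, as a sketch under stated assumptions. It treats the determiner table for one determinee as a list of `(word_number, test_code, translation_indices)` tuples, which is a hypothetical layout (the report refers to its page 5 for the real format), and it assumes that a non-matching entry leads to the next entry, since that branch target is illegible in the chart.

```python
def flag_translations(entries, candidate_word_number, test_code):
    """Return five flags marking the applicable translations, or None.

    entries: hypothetical (word_number, test_code, translation_indices) tuples
    for the determinee being worked on.
    """
    for word_number, entry_test, translations in entries:
        if word_number != candidate_word_number:
            continue                      # pick up the next determiner in the table
        if entry_test != test_code:
            return None                   # wrong test: finished exit
        flags = [False] * 5               # 5 consecutive flag locations
        for k in translations:
            flags[k] = True
        return flags                      # found exit
    return None                           # passed the last determiner: finished exit

entries = [(101, "PFN", [0]), (202, "PPH", [1, 3])]
print(flag_translations(entries, 202, "PPH"))  # [False, True, False, True, False]
```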

[label partly illegible; probably ME15]:
    Search for first non-blank character in English field.
    Is this a character of a translation we want (first translation, etc.; check flags set in [routine name illegible])?
        no  -> [target illegible]
        yes -> continue

ME15+2:
    Move to next character.
    Has the 6th computer word of English field been reached?
        yes -> [target illegible] (processes last computer word of English field)
        no  -> continue
    Is next character a plus?
        no  -> [target illegible]
        yes -> another translation has been reached

[label illegible]:
    Mask out character.
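
The masking pass can be sketched as follows, assuming (as the chart suggests) that alternative translations in the English field are separated by a plus sign and that unwanted characters are replaced by blanks. Blanking the plus signs themselves, the sample field, and the name `mask_translations` are assumptions of this example.

```python
def mask_translations(field, flags):
    """Blank out every translation in `field` whose flag is not set."""
    out, index = [], 0
    for ch in field:
        if ch == "+":
            index += 1                    # another translation has been reached
            out.append(" ")
        elif index < len(flags) and flags[index]:
            out.append(ch)                # character of a translation we want
        else:
            out.append(" ")               # mask out character
    return "".join(out)

masked = mask_translations("LARGE+GREAT+BIG", [False, True, False])
print(masked.split())  # ['GREAT']
```

Masking in place keeps the field at its original length, which fits a fixed-width English field of six computer words.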

CONCLUSIONS AND RECOMMENDATIONS

Semantics

The determiner-determinee method investigated under this contract yields results for resolving some classes of English equivalent ambiguity. The research procedure used in pursuing this study will be fruitful in providing rules for choosing semantically appropriate English equivalents for a large list of Russian words. It is recommended that this technique be further refined and applied to a significant body of fresh text.

Syntax

The ability to rearrange translations from Russian word order into idiomatic English word order is dependent upon the correct determination of clause boundaries. Research done under the present contract in the identification of subject and object packages in conjunction with other clause boundary indicators allowed an improved rearrangement procedure. It is recommended that continued study be applied to the interrelated problems of the resolution of clause boundaries, the identification of the functional attributes of major syntactic units, and relations between clauses within the sentence.

Diagnostic Procedure

The growing complexity of the machine translation programs and the recognition of the fact that they are constantly being changed for experimental purposes have highlighted the need for diagnostic procedures. Systematic diagnostic procedures similar to those developed for computer checkout can be applied to machine translation programs. Such a procedure can help in isolating the causes of deterioration of previously trusted areas when a change is applied to a supposedly unrelated portion of the program. A computer program was developed under this contract to perform this diagnostic function.
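
The report does not describe how its diagnostic program works. One plausible shape for such a check, offered only as an assumption, is to rerun a fixed set of test sentences after each program change and list those whose output no longer matches a trusted earlier result. The names `regressions`, `baseline`, and `translate` below are hypothetical, and `str.upper` merely stands in for a translation routine.

```python
def regressions(baseline, translate):
    """Return the test sentences whose output differs from the trusted baseline.

    baseline: dict mapping each test sentence to its previously trusted output.
    translate: the current version of the translation routine.
    """
    return [s for s, expected in baseline.items() if translate(s) != expected]

baseline = {"a": "A", "b": "B"}
print(regressions(baseline, str.upper))      # []
print(regressions(baseline, lambda s: "A"))  # ['b']
```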

Computer Configuration

Work in the semantic area of this contract highlighted the

desirability of using a search type memory throughout the

machine translation process.
