AD-A009 939
SYNTAX, SEMANTICS, AND SPEECH
William A. Woods
Bolt Beranek and Newman, Incorporated
Prepared for:
Advanced Research Projects Agency
April 1975
DISTRIBUTED BY:
National Technical Information Service
U. S. DEPARTMENT OF COMMERCE
Unclassified
Security Classification
DOCUMENT CONTROL DATA - R&D
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)
1. ORIGINATING ACTIVITY (Corporate author): Bolt Beranek and Newman Inc., 50 Moulton Street, Cambridge, MA 02138
2a. REPORT SECURITY CLASSIFICATION: none
2b. GROUP:
3. REPORT TITLE: SYNTAX, SEMANTICS, AND SPEECH
4. DESCRIPTIVE NOTES (Type of report and inclusive dates): Technical Report
5. AUTHOR(S) (First name, middle initial, last name): William A. Woods
6. REPORT DATE: April 1975
7a. TOTAL NO. OF PAGES: 57
7b. NO. OF REFS: 42
8a. CONTRACT OR GRANT NO.: N00014-75-C-0533
b. PROJECT NO.:
c. Order No. 2904
d. Program Code No. 5D30
9a. ORIGINATOR'S REPORT NUMBER(S): BBN Report No. 3067, A.I. Number 27
9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report):
10. DISTRIBUTION STATEMENT
Distribution of this document is unlimited. It may be released to the Clearinghouse, Department of Commerce for sale to the general public.
12. SPONSORING MILITARY ACTIVITY: ONR, Department of the Navy, Arlington, Virginia 22217
11. SUPPLEMENTARY NOTES: Reproduced by NATIONAL TECHNICAL INFORMATION SERVICE, US Department of Commerce, Springfield, VA. 22151
13. ABSTRACT
Recently, speech understanding research has taken a direction which recognizes the importance of syntactic and semantic constraints as an essential part of the process which deciphers speech signals into sequences of sounds (see Newell et al. 1973). Consequently, it has become important for speech researchers to be acquainted with the work that has been done in the area of computational linguistics, attempting to construct computer programs to model the process of natural language understanding. This paper attempts to provide an introduction to the techniques and results which have come out of work in computational linguistics which have special relevance to the design of speech understanding systems. The paper was written for an audience with some understanding of the nature of speech signals and the difficulties of performing an acoustic and phonetic analysis of such signals but with little familiarity with the techniques for parsing and semantic interpretation of natural language or the ways in which such techniques could be used in a total speech understanding system. However, readers with interests in computational linguistics, linguistics, and artificial intelligence may also find the paper of interest.
This paper is not intended to be a survey. Rather, in it I will try to trace the development of what I think are several important ideas and trends in parsing and syntax and in semantic interpretation. I will attempt to convey a feeling for what I think the state of the art is, how it developed conceptually, and some of the new perspectives that the problems of speech understanding place on the processes of parsing and semantic interpretation.
KEY WORDS
Grammars
Parsing
Parsing Algorithms
Semantic Interpretation
Semantic Networks
Semantics
Speech
Speech Recognition
Speech Understanding
Syntax
BBN Report No. 3067 A.I. Report No. 27
SYNTAX, SEMANTICS, AND SPEECH*
William A. Woods
April 1975
Sponsored by Advanced Research Projects Agency
ARPA Order No. 2904
This research was supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by ONR under Contract No. N00014-75-C-0533.
The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Advanced Research Projects Agency or the U.S. Government.
*To appear in: Speech Recognition; invited papers presented at the IEEE Symposium, D.R. Reddy (ed.), Academic Press (1975).
BBN Report No. 3067        Bolt Beranek and Newman Inc.
INTRODUCTION
Recently, speech understanding research has taken a direction which recognizes the importance of syntactic and semantic constraints as an essential part of the process which deciphers speech signals into sequences of sounds (see Newell et al. 1973). Consequently, it has become important for speech researchers to be acquainted with the work that has been done in the area of computational linguistics, attempting to construct computer programs to model the process of natural language understanding. This paper will attempt to provide an introduction to the techniques and results which have come out of work in computational linguistics which I think have special relevance to the design of speech understanding systems. The paper was written for an audience with some understanding of the nature of speech signals and the difficulties of performing an acoustic and phonetic analysis of such signals but with little familiarity with the techniques for parsing and semantic interpretation of natural language or the ways in which such techniques could be used in a total speech understanding system. However, readers with interests in computational linguistics, linguistics, and artificial intelligence may also find things of interest herein. For the reader with little or no background in the nature of speech production and the characteristics of speech signals, I suggest the papers by Denes and Pinson (1963) and by Jakobson, Fant and Halle (1967) as appropriate introductions. This paper should be readable, however, without prior knowledge of speech characteristics.
This paper is not intended to be a survey. Rather, in it I will try to trace the development of what I think are several important ideas and trends in parsing and syntax and in semantic interpretation. I will attempt to convey a feeling for what I think the state of the art is, how it developed conceptually, and some of the new perspectives that the problems of speech understanding place on the processes of parsing and semantic interpretation.
Part 1. Syntactic Analysis
There are two parts to the problem of syntactic analysis -- one is a component of judgment or decision (whether a given string of words is a sentence or not) and the other is a component of representation or interpretation (deciding what the pieces of the sentence are and how they relate to each other). In speech understanding we will see that both of these are important.
Let me start with a mini-history describing what I think the current state of the art is, how it developed conceptually, and some of the new perspectives that the problems of speech understanding place on the evaluation of parsing techniques.
Phrase Structure Grammars
The field of linguistics was given a great stimulus when the two aspects of syntax (judgmental and structural) were combined in the formalism of phrase structure grammar. Prior to this development, largely due to Chomsky (e.g. Chomsky, 1965), the mechanism whereby a computer program could decide whether a given sequence of words was a grammatical sentence or not would have been difficult to imagine.
The principal component of a phrase structure grammar is a collection of "rewrite rules" such as the following:
S -> NP VP
NP -> DET N
VP -> V NP
Intuitively, the first rule indicates that a sentence can consist of a noun phrase followed by a verb phrase. Formally, it indicates that in the course of deriving a sentence, one can replace an occurrence of the symbol S in the string derived so far with the sequence of two symbols NP VP. Similarly, one can replace the NP with the sequence DET N and the VP with the sequence V NP, ultimately deriving the sequence DET N V DET N, which is the sequence of syntactic word categories underlying a sentence such as
The man bit the dog
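The rewriting process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `RULES`, `rewrite_leftmost`, and `derive` are hypothetical, each nonterminal is given exactly one rule, and the leftmost rewritable symbol is always chosen.

```python
# The three rewrite rules from the text; each nonterminal has one right-hand side.
RULES = {
    "S": ["NP", "VP"],
    "NP": ["DET", "N"],
    "VP": ["V", "NP"],
}

def rewrite_leftmost(string):
    """Replace the leftmost rewritable symbol; return None if none is left."""
    for i, symbol in enumerate(string):
        if symbol in RULES:
            return string[:i] + RULES[symbol] + string[i + 1:]
    return None

def derive(start="S"):
    """Return every working string from the start symbol to the final one."""
    steps = [[start]]
    while True:
        nxt = rewrite_leftmost(steps[-1])
        if nxt is None:
            return steps
        steps.append(nxt)

for step in derive():
    print(" ".join(step))
```

The last string printed is the category sequence DET N V DET N that underlies "The man bit the dog".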
Parsers and Recognizers
The rewrite rules of a phrase structure grammar can be used to characterize the set of possible sequences of words which can be considered grammatical sentences, thereby formally representing the judgmental part of syntax. A formal algorithm for taking a grammar and deciding whether a sequence of words is a sentence with respect to that grammar is called an acceptor or a recognizer.
If in the course of deriving a sentence according to the rules we keep track of which symbols were rewritten into which sequences, we can construct a tree structure such as that represented in Figure 1, which gives a very nice representation of what the parts of the sentence are and how they are put together, thus achieving a structural representation of the sentence. An algorithm for constructing such a representation while accepting or recognizing a sentence is called a parser.
                S
             /     \
           NP       VP
          /  \     /  \
        DET   N   V    NP
         |    |   |   /  \
        THE  MAN BIT DET   N
                      |    |
                     THE  DOG
Figure 1: A Sample Phrase Structure Tree
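One way to hold such a tree in a program is as nested tuples. The sketch below is an assumed representation (the names `TREE`, `words`, `categories`, and `bracketed` are illustrative and not from the report); it shows how the word string and the category string can both be read off the same structure.

```python
# A phrase structure tree as nested tuples: (label, child, child, ...).
# A node whose only child is a string is a lexical category over a word.
TREE = ("S",
        ("NP", ("DET", "the"), ("N", "man")),
        ("VP", ("V", "bit"),
               ("NP", ("DET", "the"), ("N", "dog"))))

def is_lexical(children):
    return len(children) == 1 and isinstance(children[0], str)

def words(tree):
    """Read the words off the leaves, left to right."""
    label, *children = tree
    if is_lexical(children):
        return [children[0]]
    return [w for child in children for w in words(child)]

def categories(tree):
    """Read off the lexical category that dominates each word."""
    label, *children = tree
    if is_lexical(children):
        return [label]
    return [c for child in children for c in categories(child)]

def bracketed(tree):
    """Render the tree in labelled bracket notation."""
    label, *children = tree
    if is_lexical(children):
        return f"[{label} {children[0]}]"
    return f"[{label} " + " ".join(bracketed(c) for c in children) + "]"

print(bracketed(TREE))
```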
Lexical Categories and Dictionaries
Notice that in Figure 1 and in the grammar rules there are two different kinds of names of nodes; there are "nonterminal" symbols like S, NP, and VP, which name whole phrase types, and there are other symbols which are essentially lexical word class names, like determiner, noun, and verb. This distinction between terminal and nonterminal symbols is formalized by dividing the vocabulary of special symbols of a phrase structure grammar into a terminal and nonterminal vocabulary. The initial symbol S, and all of the symbols which later get rewritten by phrase structure rules are in the nonterminal vocabulary. The derivation of a sentence stops when the string consists entirely of terminal symbols. In a simple view of phrase structure grammars, the terminal symbols would be the English words themselves, but this would result in a huge set of "singleton" rules such as:
DET -> the
(On the average there would be several such rules for each word in English). Instead, the syntactic word classes usually serve as the terminal vocabulary and the correspondence between syntactic word classes and the words themselves is taken care of by a dictionary.
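A dictionary of this kind can be sketched as a simple table from words to their possible word classes. The entries below are a hypothetical toy dictionary, not one taken from the report; the extra verb and noun readings are included only to show that one word string may correspond to several category strings.

```python
from itertools import product

# Hypothetical toy dictionary: each word lists its possible lexical categories.
DICTIONARY = {
    "the": ["DET"],
    "man": ["N", "V"],   # "man" can also be a verb ("man the boats")
    "bit": ["V", "N"],
    "dog": ["N", "V"],
}

def category_sequences(sentence):
    """All sequences of word classes the dictionary allows for a sentence."""
    options = [DICTIONARY[word] for word in sentence.lower().split()]
    return [list(seq) for seq in product(*options)]

for seq in category_sequences("The man bit the dog"):
    print(" ".join(seq))
```

Only one of the printed sequences, DET N V DET N, is derivable from the three rewrite rules given earlier; the grammar then does the work of rejecting the others.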
Other Grammar Models
All of the above presentation has been a description of what is called context free phrase structure grammars. There are in fact many different types of phrase structure grammars depending on the types of rules permitted and the way that they are applied. For each different type of grammar, there is a corresponding class of languages which can be characterized by grammars of that type. The grammar formalism is said to generate this class of languages. Whenever two formalisms, either grammars or automata, generate the same class of languages, they are said to be generatively equivalent or equivalent in generative power, and if one formalism generates a superset of the class generated by another formalism, then that model is said to be stronger in generative power. There is a well-known hierarchy of successively more powerful phrase structure grammar models, known among formal language theorists as the Chomsky hierarchy. I would like to introduce these here because I want to come back occasionally and refer to the various things which the different models can do.
The grammar models in the Chomsky hierarchy are known as type 0, type 1, type 2, and type 3 grammars. The context free grammar which we have just described above is the type 2 grammar and is characterized by the fact that the left-hand sides of its rewrite rules consist of a single nonterminal symbol and the right-hand side may be any nonempty string of terminal and nonterminal symbols. The type 3 grammars, also known as finite state grammars, are more restricted than the context free grammars and correspond in generative power to finite state machines. They are characterized by rewrite rules whose left-hand sides are single nonterminals and whose right-hand sides are either a single terminal symbol or a terminal symbol followed by a single nonterminal.
At the other end of the spectrum are the type 0 grammars, also known as general rewriting systems, which correspond in generative power to Turing machines. General rewriting systems are characterized by rewrite rules whose left-hand and right-hand sides can be arbitrary strings of terminal and nonterminal symbols subject only to the constraint that terminal symbols cannot be rewritten as some different terminal or nonterminal symbol. Type 1 grammars, also known as context sensitive grammars, are strictly less powerful than general rewriting systems and strictly more powerful than context free grammars. They are characterized by rewrite rules in which the left-hand sides specify not only a nonterminal symbol to be rewritten, but also a context of terminal and nonterminal symbols which must be present in order for the rule to be applied.
Figure 2 gives a summary of the types of rules for each class of grammars.
In the figure, the notation V is used to represent the union of the terminal and nonterminal vocabularies of the grammar (Vt and Vn), and the * operator is used to indicate the set of all possible strings which can be made from a given vocabulary (i.e. Vt* indicates the set of all possible terminal strings). The symbol e represents the empty string (i.e. the string with no symbols).
TYPE 0: GENERAL REWRITING SYSTEM
    α -> β               α, β ∈ V*
TYPE 1: CONTEXT SENSITIVE
    X -> γ / α __ β      X ∈ Vn
TYPE 2: CONTEXT FREE
    X -> γ               X ∈ Vn, γ ∈ V* - {e}
TYPE 3: FINITE STATE
    X -> a Y             X, Y ∈ Vn
    X -> a               a ∈ Vt
Figure 2: Summary of the Chomsky Hierarchy of Phrase Structure Grammars
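The rule formats summarized above can be checked mechanically. The following is a minimal sketch, assuming a small hypothetical nonterminal vocabulary and treating every other symbol as a terminal; `rule_type` is an illustrative name, and the additional type 0 constraint on rewriting terminals is not checked here.

```python
NONTERMINALS = {"S", "X", "Y", "A", "B"}   # hypothetical vocabulary for the demo

def rule_type(lhs, rhs):
    """Return the highest-numbered (most restricted) type whose rule format fits."""
    lhs, rhs = tuple(lhs), tuple(rhs)
    if len(lhs) == 1 and lhs[0] in NONTERMINALS:
        finite_state = (
            (len(rhs) == 1 and rhs[0] not in NONTERMINALS)
            or (len(rhs) == 2 and rhs[0] not in NONTERMINALS
                and rhs[1] in NONTERMINALS)
        )
        if finite_state:
            return 3
        if len(rhs) >= 1:
            return 2
    # Context sensitive: alpha X beta -> alpha gamma beta, with gamma nonempty.
    for i, symbol in enumerate(lhs):
        if symbol not in NONTERMINALS:
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        gamma_len = len(rhs) - len(alpha) - len(beta)
        if (gamma_len >= 1 and rhs[:len(alpha)] == alpha
                and rhs[len(rhs) - len(beta):] == beta):
            return 1
    return 0

print(rule_type(["X"], ["a", "Y"]))                  # finite state format
print(rule_type(["S"], ["X", "Y"]))                  # context free format
print(rule_type(["a", "X", "b"], ["a", "Y", "Y", "b"]))  # context sensitive format
print(rule_type(["X", "Y"], ["Y"]))                  # general rewriting only
```

A whole grammar would then belong to the class given by the smallest value returned over all of its rules.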
Each of the grammars in the Chomsky hierarchy represents a restriction in generative power, with an attendant ease in parsing (recognition), over the power of grammars with a lower number. Each class with a higher number represents a special case of the classes with lower numbers. The principal difference between the context sensitive grammar and the general rewriting system is that the former is prohibited by the nature of its rules from erasing anything from the working string as it proceeds (i.e. the right-hand sides of rules are always at least as long as the left-hand sides). For the general rewriting systems, this is not the case, and arbitrary amounts of intermediate "scratch work" can be erased from the derivation without leaving a trace in the resulting string that is generated. This is what gives the general rewriting system its power, and also has the undesirable consequence that a recognition or parsing algorithm cannot be guaranteed to exist for general rewriting systems. For all of the other classes of grammars, it is possible to construct a recognizer which for an arbitrary string will say yes-or-no whether that string is generated by a given grammar. General rewriting systems are therefore not very desirable as machine models of language due to this inability to guarantee a recognition algorithm.
Derivations
For each of the type 1, 2, and 3 grammars, formal parsing algorithms can be devised which, given a grammar and a string, can answer the question whether the string is a sentence with respect to the grammar. This is done by attempting to discover a derivation of the string from the initial symbol of the grammar by means of the rewrite rules. A derivation is essentially a sequence of working strings starting with the initial symbol, each of which results from the preceding one by one application of a rewrite rule. A string is said to be generated by the grammar if there is a derivation in the grammar leading to it.
Figure 3 gives a sample derivation of the sentence in Figure 1.
SUMMARY OF DERIVATION
S -> DET N V DET N
INTERMEDIATE STRINGS
S
NP VP
DET N VP
DET N V NP
DET N V DET N
Figure 3: A Sample Derivation
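The definition of a derivation can be checked directly: every working string must follow from the previous one by a single rule application. The sketch below assumes the three rules introduced earlier; `one_step`, `is_derivation`, and `FIGURE_3` are hypothetical names used only for illustration.

```python
RULES = [("S", ["NP", "VP"]), ("NP", ["DET", "N"]), ("VP", ["V", "NP"])]

def one_step(before, after):
    """True if `after` results from `before` by one application of one rule."""
    for lhs, rhs in RULES:
        for i, symbol in enumerate(before):
            if symbol == lhs and before[:i] + rhs + before[i + 1:] == after:
                return True
    return False

def is_derivation(strings, start="S"):
    """True if the list starts at the initial symbol and every step is legal."""
    return strings[0] == [start] and all(
        one_step(a, b) for a, b in zip(strings, strings[1:]))

FIGURE_3 = [s.split() for s in
            ["S", "NP VP", "DET N VP", "DET N V NP", "DET N V DET N"]]

print(is_derivation(FIGURE_3))
```

A sequence that skips a step, or drops a symbol along the way, fails the check because no single rule accounts for the change.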
Notice however that there can be several distinct derivations for a single phrase structure tree corresponding to different orders of applying the rewrite rules. For example, if instead of expanding the subject noun phrase before the verb phrase one were to expand the verb phrase first, one of the derivations of Figure 4 would result. (Figure 4 compactly represents all of the possible derivations of this particular surface string, with the common initial parts of different derivations combined. Alternative choices for expanding a given string are indicated by the arrows, and individual derivations are terminated by underlining.)
S
  -> NP VP
       -> DET N VP -> DET N V NP -> _DET N V DET N_
       -> NP V NP
            -> DET N V NP -> _DET N V DET N_
            -> NP V DET N -> _DET N V DET N_
Figure 4: Alternative Derivations of the Sentence from Figure 3
Essentially all of the expansion that appears in the phrase structure tree could be done in any order and each different ordering would give a different derivation which corresponds to effectively the same parse. If we don't want to be swamped with alternative derivations of the same parse, then we need to include in our parsing algorithm some control strategy that will keep it from getting all of them. The typical control strategy that is used in text-based parsers (as opposed to speech) is to decide arbitrarily that the only derivations which will be considered will be those which expand at each step the leftmost nonterminal in the string. This effectively selects one canonical derivation for each possible parse tree. This makes the derivation shown in Figure 3 the canonical one, and the other two that are shown in Figure 4 are not found.
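The effect of the leftmost control strategy can be seen by enumerating derivations with and without it. This is a sketch under the assumption that the grammar is the three-rule example used above; the function name `derivations` and its `leftmost_only` flag are illustrative.

```python
RULES = {"S": ["NP", "VP"], "NP": ["DET", "N"], "VP": ["V", "NP"]}

def derivations(string, leftmost_only=False):
    """Yield every derivation (list of working strings) starting from `string`."""
    positions = [i for i, sym in enumerate(string) if sym in RULES]
    if not positions:
        yield [string]          # only terminal symbols remain
        return
    if leftmost_only:
        positions = positions[:1]
    for i in positions:
        expanded = string[:i] + RULES[string[i]] + string[i + 1:]
        for rest in derivations(expanded, leftmost_only):
            yield [string] + rest

all_orders = list(derivations(["S"]))
canonical = list(derivations(["S"], leftmost_only=True))
print(len(all_orders), len(canonical))
```

Without the restriction the single parse tree is found three times, once per ordering; with it, only the canonical derivation remains.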
The Roots of Nondeterminism
The control strategy which we have just described is very simple to state in terms of a generative rule, but if one wants to use it for an analysis algorithm, it seems to suggest the following analysis strategy: as you start scanning the string, as soon as you find a piece that matches the right-hand side of some rule, then you can collapse it into a single constituent. However, this strategy will not work in general, as we can illustrate with the grammar of Figure 5. This figure illustrates a very simple grammar for arithmetical expressions. In it, an expression (E) can be a term (T) plus a term, or can be just a single term. Likewise a term can be a factor (F) times a factor or just a single factor, and factors can be any of the symbols a, b, or c. Figure 6 shows the structure that we would like to get for a parsing of the string "a + b * c".
E -> T + T
E -> T
T -> F * F
T -> F
F -> A, B, C
Figure 5: A Simple Grammar for Arithmetic Expressions
            E
         /  |  \
        T   |   T
        |   |  /|\
        F   | F | F
        |   | | | |
        A   + B * C
Figure 6: A Parse Tree for "a + b * c"
The way that we have written the rules of the grammar forces on us the priority that the product comes first and then the sum. (A slightly more expanded grammar would include parentheses to enable one to express the other interpretation if that was what was intended.) Now suppose we took this string of characters and the context free rules of Figure 5 and started doing reductions on the string wherever we could. We could reduce the a to an F and then to a T, then we could go on to the + which can't reduce by itself. We'd have to reduce the b to an F and then to a T and then we could reduce the T + T to a single E. After that we would reduce the c to an F and then to a T and after that we would be stuck because there is no rule which will reduce E * T to anything. The structure that we have built when we come to the impasse is shown in Figure 7.
        E
     /  |  \
    T   |   T       T
    |   |   |       |
    F   |   F       F
    |   |   |       |
    A   +   B   *   C
Figure 7: A Blocked Reverse Derivation from "a+b*c"
Essentially, in order to obtain the parse tree in Figure 6, it is necessary not to go ahead and reduce the second F to a T. Instead we must postpone that until reducing the c to an F and reducing the F * F to a single T, after which the T + T can be reduced to a single E.
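The impasse can be reproduced with a small sketch of the "reduce as soon as you can" strategy. The code below is an illustration, not the report's algorithm: `REDUCTIONS` and `greedy_reduce` are hypothetical names, reductions are tried in a fixed order on the right end of the scanned string, and the unit rule E -> T is held back until a lone T remains (an assumption made so that the trace matches the one described in the text).

```python
# Figure 5 grammar, written as (right-hand side, left-hand side) reductions.
REDUCTIONS = [
    (("T", "+", "T"), "E"),
    (("F", "*", "F"), "T"),
    (("a",), "F"), (("b",), "F"), (("c",), "F"),
    (("F",), "T"),
]

def greedy_reduce(tokens):
    """Shift tokens one at a time, reducing the top of the stack whenever possible."""
    stack, trace = [], []
    for token in tokens:
        stack.append(token)
        reduced = True
        while reduced:
            reduced = False
            for rhs, lhs in REDUCTIONS:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]      # collapse into one constituent
                    trace.append(" ".join(stack))
                    reduced = True
                    break
    if stack == ["T"]:          # E -> T, applied only to a lone final T
        stack = ["E"]
    return stack, trace

stack, trace = greedy_reduce("a + b * c".split())
print(stack)
```

The run ends with E * T on the stack, the blocked configuration of Figure 7, whereas "a + b" alone reduces cleanly to E.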
Nondeterministic Algorithms
There are many applications in computer science, especially in artificial intelligence and language processing, where systematic search in a space of alternative possible choices is required. A conceptual device for devising algorithms for such tasks is the notion of a nondeterministic algorithm or nondeterministic machine. By this, we do not refer to an algorithm whose behavior is unpredictable, but rather to an abstract algorithm in which there is a primitive choice operation which can make one of several choices. This algorithm is then simulated on a real machine by systematically considering all possible sequences of alternative choices of the abstract nondeterministic algorithm. The nondeterministic machine is a conceptual device to enable the writer of a grammar or other such search algorithm to think of the machine as if it were magically making the right choices, freeing him from explicitly keeping track of the alternative choices and cycling through them. One says that a string is accepted by a nondeterministic algorithm if any of the alternative computation paths lead to a successful analysis.
The first fundamental idea that I would like you to remember is this notion of a nondeterministic algorithm as a device for coping with this type of search in a space of alternative possibilities.
Backtracking vs Parallel Search
There are two principal ways of writing simulators for nondeterministic programs -- one is called backtracking and its effect is that whenever the program is about to make a choice, it saves somewhere (usually on a pushdown stack) all of the information that is about to be destroyed by the choice so that the simulator can come back later to undo it, and try another choice. The program then parses like a deterministic parser until it encounters a blocked configuration such as the one in Figure 7, at which point it undoes the last choice made and tries the next possible alternative. If there is no other alternate choice, then it undoes the next to last choice, and so on until all possible choice sequences have been considered. Floyd (1967) gives an efficient general technique for implementing backtracking simulators for nondeterministic algorithms. In the case of Figure 7, the result of backtracking would be to undo the last reduction of F to T. Finding nothing else to do, the parser then would undo the reduction of c to F, then undo the reduction of T + T to E, and eventually would back up to the point where the b had been reduced to F, but that F had not been reduced to T. The parser could then go on to reduce the c to an F (a second time -- this was done before on the blocked path) and then reduce the F * F to T, which puts us on the right path for the correct analysis.
A backtracking algorithm does its search by systematically working on one path of the nondeterministic algorithm, saving enough to undo it later. The systematic way in which it walks through the space of possible choices is called "depth first". That is, after making one choice, it proceeds to make a choice that depends on that one, and another that depends on that, and so on, building up a stack of other untried alternatives at different "depths". Only when it encounters a blocked configuration does it undo the most recently made choice, and it tries all possible choices at that "depth" before backing up to the next previous level on the stack of alternatives. If the space of alternative choice sequences were laid out as a tree, the backtracking search would correspond to a left-first tree walk.
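A depth-first backtracking simulator of this kind can be sketched as follows. This is a hedged illustration rather than Floyd's technique: the names `RULES`, `successors`, and `accepts_depth_first` are hypothetical, whole working strings are saved instead of undo information, and the pushdown stack holds one iterator of untried alternatives per "depth".

```python
RULES = [
    ("E", ("T", "+", "T")), ("E", ("T",)),
    ("T", ("F", "*", "F")), ("T", ("F",)),
    ("F", ("a",)), ("F", ("b",)), ("F", ("c",)),
]

def successors(string):
    """Every working string reachable by one reduction, in left-to-right order."""
    for i in range(len(string)):
        for lhs, rhs in RULES:
            if string[i:i + len(rhs)] == rhs:
                yield string[:i] + (lhs,) + string[i + len(rhs):]

def accepts_depth_first(tokens, goal=("E",)):
    """Backtracking simulation: pursue one path, fall back to saved alternatives."""
    saved = [iter([tuple(tokens)])]      # pushdown stack of untried alternatives
    blocked = 0
    while saved:
        try:
            string = next(saved[-1])     # try the next alternative at this depth
        except StopIteration:
            saved.pop()                  # nothing left here: back up one level
            continue
        if string == goal:
            return True, blocked
        options = list(successors(string))
        if not options:
            blocked += 1                 # a blocked configuration
        saved.append(iter(options))
    return False, blocked

print(accepts_depth_first("a + b * c".split()))
```

The second value returned counts how many blocked configurations were visited before the successful path was found, which gives a rough feel for the work that backtracking repeats.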
Another way of handling nondeterminism is by what I'll call independent alternatives. In such a program, every time that you are about to make a choice, you create an object for each of the possible choices. This object corresponds to a state or configuration of the hypothetical nondeterministic machine which you are simulating. In a real machine, a configuration is basically the contents of the program counter and the register contents; in the simulation of a nondeterministic machine there are many such configurations instead of just one. (This is similar to what goes on in a time sharing system.) For a nondeterministic finite state machine, the configuration is basically the state that you are in and the point in the input string that you have gotten to. In programming a system for handling independent alternatives, every time that you come to a choice point, you make up as many configurations as there are alternative choices, and you are now free to work on those configurations all in parallel (or "breadth first") or you can jump around from one to another (working on the ones that seem most likely to succeed before working on others, for example). With multiple independent alternatives, you can pick up a configuration, determine its state, look at where it is in the input, compute the next configurations which it could get to, and then go to another configuration (which may or may not be one of the ones you just created) just like a time-sharing system. You can run a lot of the configurations in pseudo parallel with varying priorities for service.
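For a nondeterministic finite state machine, the independent-alternatives scheme can be sketched with an agenda of (state, input position) configurations. The transition table below is a hypothetical toy machine invented for the example, and `accepts_parallel` is an illustrative name.

```python
from collections import deque

# Hypothetical nondeterministic finite state machine over word classes.
# TRANSITIONS[state][symbol] lists every state the machine may move to.
TRANSITIONS = {
    "start": {"DET": ["np"], "N": ["np_done"]},
    "np": {"ADJ": ["np"], "N": ["np_done", "np"]},   # N may end the NP or modify
    "np_done": {"V": ["end"]},
    "end": {},
}
ACCEPTING = {"end"}

def accepts_parallel(symbols):
    """Keep every live configuration (state, position) on an agenda."""
    agenda = deque([("start", 0)])
    seen = set(agenda)
    while agenda:
        state, pos = agenda.popleft()        # popleft = breadth first
        if pos == len(symbols):
            if state in ACCEPTING:
                return True
            continue
        for nxt in TRANSITIONS[state].get(symbols[pos], []):
            config = (nxt, pos + 1)
            if config not in seen:           # one object per distinct configuration
                seen.add(config)
                agenda.append(config)
    return False

print(accepts_parallel(["DET", "N", "N", "V"]))
```

Because each configuration is a self-contained object, the agenda discipline is a free design choice: replacing the queue with a priority queue would give the "varying priorities for service" mentioned above without changing the rest of the sketch.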
There is an [...] advantage for speech understanding [...] text parsing [...] implementing nondeterministic programs in terms of multiple independent alternatives rather than backtracking. With independent alternatives, if you are in a position where it is difficult to decide which of the alternatives it is best to follow, it is possible for you to follow several things in parallel, or to jump from one to another according to which looks better at any given point. In the backtracking approach, one has to systematically walk down a long path into barren territory [...] and walk back to the place where the next [...]
to oe e to
I will make a pitch then for a second fundamental idea which you should know about, namely this difference between systematic backtracking and the following of multiple independent alternatives.
Bottom Up, Top Down, Predictive, and Nonpredictive Parsing
The algorithm that we described above for finding a derivation of a given string by reversing the generative rules of the grammar is an algorithm that is referred to as "bottom-up". That is, we look into the input string or the current working string until we find something that matches the right-hand side of some grammar rule, and then reduce that matching portion by replacing it with the left-hand side of the rule. (I'm assuming a context free grammar here for simplicity.) We apply this process over and over again until we finally reduce the entire string to a single symbol. (At least the goal we are trying to achieve is such a reduction of the string into a single symbol.) Notice that in the statement we have just made, we have not specifically mentioned the systematic consideration of each of the possible rules that could have applied at each step and the different positions in the working string where rules could have been applied. It is exactly this freedom from consideration of detail that is achieved by thinking of the process as a nondeterministic algorithm. Of course the details need to be considered eventually in order to make the algorithm function on a real machine, but these considerations can be made separately and they can be done once and for all for a parsing system and not have to be redone separately for each grammar or version of a grammar which is written.
There is another kind of parsing algorithm at the other extreme which is called "top-down". It gets this name because it starts by expanding the grammar rules "from the top" and only looks for comparison at the input string when a terminal symbol appears in the expansion. A simple version of a top-down parser makes use of a pushdown store into which the initial symbol of the grammar is placed before parsing begins. Subsequently the algorithm proceeds as follows: If the topmost symbol on the stack is a nonterminal, then some rule of the grammar with that nonterminal as its left-hand side is selected (another nondeterministic choice) and the topmost symbol of the pushdown stack is replaced with symbols from the right-hand side of the rule (so the leftmost symbol of the right-hand side is now the topmost symbol of the stack). If the topmost symbol of the stack is a terminal symbol, then it is compared with the next unused symbol of the input string. If they are the same then the topmost symbol of the stack is removed and the string is advanced. If they do not match then this configuration is blocked -- i.e. this path of the nondeterministic search is terminated. The string is accepted if the pushdown stack becomes empty at the same time that the last symbol of the input string is used. (Note again our use of the nondeterministic algorithm to simplify the explanation. In an actual parsing algorithm, all possible choices of expanding the topmost nonterminal of the stack are pursued and the string is accepted if any of the alternative computation paths leads to the accepting criterion.) An example of a top-down analysis using a pushdown store is shown in Figure 8. (Here the rectangular enclosure represents the pushdown store, the arrows the steps in the analysis, and the plus sign indicates the consumption of a symbol from the input string by a given stack configuration.)
[Figure 8: A Sample Top-down Predictive Analysis using a Pushdown Store]
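The pushdown store procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy grammar and the names GRAMMAR and accepts are hypothetical and are not taken from the report, and the nondeterministic choice of rule is simulated by keeping every alternative configuration on an agenda.

```python
# Hypothetical toy grammar (an assumption of this sketch, not taken from the report).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
    "DET": [["the"]],
    "N": [["man"], ["dog"]],
    "V": [["bit"]],
}


def accepts(words, start="S"):
    """Top-down recognition with an explicit pushdown store.

    A configuration is (stack, position); the top of the stack is the last
    list element. Every alternative rule choice is placed on the agenda, so
    all computation paths are eventually pursued.
    """
    agenda = [([start], 0)]
    while agenda:
        stack, pos = agenda.pop()
        if not stack:
            if pos == len(words):
                return True  # stack empty exactly when the input is used up
            continue  # blocked: input left over
        top, rest = stack[-1], stack[:-1]
        if top in GRAMMAR:
            # Nonterminal: replace it by each right-hand side in turn,
            # reversed so that the leftmost symbol ends up on top.
            for rhs in GRAMMAR[top]:
                agenda.append((rest + rhs[::-1], pos))
        elif pos < len(words) and words[pos] == top:
            agenda.append((rest, pos + 1))  # terminal matches: pop and advance
        # Otherwise the configuration is blocked and is simply dropped.
    return False


print(accepts("the man bit the dog".split()))
```

A left-recursive grammar would make this sketch loop forever, since a nonterminal could keep re-expanding without consuming input.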
The Harvard Predictive Analyzer
The original Harvard Predictive Analyzer (Kuno and Oettinger, 1963) does a slightly more optimized version of the top-down technique just described. It works with a grammar which has been transformed so that all of its rules have a terminal symbol as the first symbol of their right-hand sides. Thus at every step of the pushdown store analysis the algorithm consumes a symbol from the input string, and the number of steps in a given computation path of the nondeterministic machine is at most n, where n is the length of the input string. (Of course the number of steps for the real computer in simulating the nondeterministic algorithm is much greater since it has to follow out all possible alternative computation paths.) An additional advantage of the special form of the context free rules used by the predictive analyzer (known as Greibach normal form, or standard form) is that it eliminates the possibility of infinite loops due to the symbol on top of the pushdown stack expanding into a string which eventually results in a new instance of the same symbol on top of the stack without advancing the input string. An algorithm due to Greibach (Greibach, 1967) which converts an arbitrary context free grammar into a standard form grammar finds and eliminates the possibility of such "left-recursion".
Predictive vs Nonpredictive Parsing
There has been a great deal of discussion in the parsing literature about the differences between top-down and bottom-up algorithms. An example is a paper by Griffiths and Petrick (1965) which characterizes several varieties of each type. However, in recent years there have been a number of parsing algorithms developed which don't fit into either of these broad categories, and I think that the classical distinction between top-down and bottom-up is becoming very fuzzy. The distinction which I think is more important -- a distinction which is correlated with the top-down bottom-up distinction for the two simple algorithms presented here -- is the distinction between predictive and nonpredictive parsing. A predictive parser is one that will only look at a given point in the input string for things of a sort that it expects to see there, whereas a nonpredictive parser will find any given construction only as a function of the constituents which make it up, irrespective of whether such a constituent is compatible with an analysis of the symbols on either side of it in the input string. For example, an inherent feature of the top-down pushdown store algorithm which I presented above is that at each point in the analysis there exists on the stack a prediction of the types of phrases which are expected to occur to the right of the current point in the input string. As the algorithm operates, only those constituents will be looked for. Contrast this with the situation in the simple bottom-up algorithm. There, if the terminal symbols could be grouped together to form a constituent, then that alternative would be tried regardless of whether there is an analysis of the symbols to the left with which this constituent could combine.
The predictive parsing technique has an advantage for most parsing applications since it considerably reduces the number of applications of rules that have to be considered
ana t h
( i.e. in some
compiet preaict
grammar t n e be
s e n t e n (
1 o u n d
noun p n
tnere
that po attempt
does tn also r correct
e nu s u q u e otlie
e ana ive ol' r'
ginni es ca tne s
rase
is n int. eu ev
is rc e s J 11
pars
m b e r
ncea
r co 1 ys i anai
igur
ng o n De
ub je at t
0 g in
ery w suit s i
ings
ol
ol
n tex s or
ysis
e 1 , i tn gin
et n
no p ramm
tn
nere in
n m
" a c c i
w c r u s
t but
tne ol
tne e sen
witn oun p
lace
ar ru e Do
sine
more ore
d e n t a
tna t
whic
curre "ti.e
parse
tence noun
hrase
tnat
ie wn 11 o m -
e tne rules spur i
i" e
cou
h ar nt s
ma
r lo bec
pnra , it
st
icn up
re i tna
ous
onst i
io ma e not
t r i n g
n bi
OKS f
ause
ses . uoes
arts would appro
s no t tiav
m a t c n
tuen
ke u a c
) . t tn
or a
the now
n't wit
use acu ,
preu
e to es t
ts en
P a ons t i
r or e 0 uog
noun g r a m in
ever , try t
n "D
a no ai 1
i c t i o be t
na t d
'it are
const t u e n t
x a m p 1 e
" , u s i
pbra ar say
once
ü lOOK
it" D u n p n r
rule n. ho riea ,
on ' t 1
t ound
ituent ol any
, in a ng tne
se at
a tnat it tias
lor a ecause
ase at s are t only
out it e a a to
t n e r e
Decau
t n e r e
reduc
tnere
wort'
espec
aue t
utter
all o
i I it cons i tne
down chanc stand
ol' a
i n f o r .Ti i s s i
given
t or i s
se i
i J es i
is
at a
ia i i
0 pli
ance
1 yc inn
s ten
righ tne e o
s a
wr mat!
ng reg
parsing
a great t iollo
a pröD ts au"a
a 1 ^ i
n y give
y true ono L ogi s . 1 l u r p r e d
ucea yo t witn
t parse string t reco better
otig or
on as a
word m i ion ,
tex
ad v
ws 1 lern
ntag r 1 y
n po
or
cai you
let i
u to tnat
"i doin
ver i
c n a ti
i.-i i
SO
z nt
tint
a n t a g e
ewer o in con e . in
big n p i ri t in
tne r
e rleet r gues ons la
oniy wrong
n e n o n g ever
ng ir ce ol ssing urce
be or
ue loria
to us i 1 i n u a 1
t inuous
c o n t i n rooa D i i
tne st
i r s t a r»
s at tn s ol It, ter wi1
lOOK I'O wora ,
preulet y t n i n g o m sue
i i n d i n g
word . lor pr
what KI
ol ng trie
leys . s p e e c
uous s
11 y t n ring rn
last oeg
1 Irs
oe i t nos
t n e n y
ive pa it c
n e r r o
m o s t it
e a i c t i
no oi
ii
pe-
at
ay
w i n ♦
nl e ÜÜ
r s
an
r3 ol ca on n 0
que
rea
n t und ecn
yo De
ord nin wer
1 ue
tni ma
er s
in n t
a
ra
nee
ict ue
ers u
jr
wr
in
g Ü 1
nee
ngs
y n tna tan
ope
e p
nen 3 is
s o
ive otn
tana naer
gues ong .
tne ana
s «r a Dy
tna ever
t go u 3
ci 1 i
arse
pro to
requ
r w algo er
ing stan
s 1 o
i r.
sen
ene eng, it,
t wi
re es u
a o call
in
vide wnat
ired
eras,
r i t ha nand ,
w n i c n
ding ,
r tne
is is tence
s ol tnen
ana
11 De cover
p ana
etter
y, it spite
tnis
tne
in a
Another point then that I would like to make is this tradeoff between predictive and nonpredictive parsing algorithms for speech understanding. I don't want to make a strong case that one or the other is better; I want to give a feeling for what the tradeoffs are between the two algorithms. The predictive one will do a more selective search, and if one is confident that the things on which it is basing its predictions are right, then it is preferable. On the other hand, if there's a high chance that they're wrong, then the disadvantage is that the prediction may keep you from finding enough of the correct parse to be a useful source of information for error correction.
Well-formed Substring Tables
One thing that was found very early in the development of parsing algorithms, especially with the enumerative, top-down, predictive algorithms, is that when alternative computation paths are done separately a lot of duplicate work is done on the separate paths. For example, if two possible ways of analyzing the beginning of a sentence cause the analysis to split up into two different computations, the entire remaining analysis will be done twice, even though it may be the same in both cases. A "well-formed substring table" is a mechanism for saving the results of the analysis of a constituent on one path of a nondeterministic computation so that they can be used on other paths without redoing the computation. Whenever in the course of an analysis a complete constituent is found, it is recorded in a table indexed by the type of constituent and the position where it begins. Whenever the algorithm is about to predict a constituent of a given type at a given position, it consults the well-formed substring table to see if such a constituent has already been found, and if so, then the results are used without recomputation.
Table Oriented Parsing Algorithms
The use of the well-formed substring table is sufficiently useful that some parsing algorithms have been designed exclusively around that notion. Their central purpose is to fill in this table with entries saying there is a constituent of type X from position Y to position Z in the input. Their acceptance criterion for a string is finding in the table an entry indicating a constituent [...]. For example, an algorithm due to Younger (1966) fills in the entries in order of length of the resulting constituent (and [...] grammar rules whose right-hand sides consist of a [...]). [...] of length 2 and 1 will already have been made and any questions about the existence of such constituents can be answered by merely consulting the table. The constituents of length 1 are found by matching singleton terminal rules against the input string. When such an algorithm terminates, if there is an entry for the initial symbol from the beginning to the end of the input string, then the string is accepted by the parser, otherwise it is rejected.
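A table-filling recognizer of this general kind can be sketched as follows. The sketch assumes a grammar in Chomsky normal form (every rule either rewrites to two nonterminals or to a single word); that restriction, the toy grammar, and the names BINARY, LEXICAL, fill_table and accepts are assumptions made for illustration and are not taken from the report.

```python
from itertools import product

# Hypothetical grammar in Chomsky normal form (an assumption of this sketch).
BINARY = {("NP", "VP"): {"S"}, ("DET", "N"): {"NP"}, ("V", "NP"): {"VP"}}
LEXICAL = {"the": {"DET"}, "man": {"N"}, "dog": {"N"}, "bit": {"V"}}


def fill_table(words):
    """Return table[(i, j)] = set of constituent types spanning positions i..j."""
    n = len(words)
    table = {}
    # Constituents of length 1: match singleton terminal rules against the input.
    for i, word in enumerate(words):
        table[(i, i + 1)] = set(LEXICAL.get(word, ()))
    # Longer constituents, filled in order of increasing length, so that all
    # shorter entries are already in the table when they are consulted.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            found = set()
            for k in range(i + 1, j):  # split point between the two parts
                for left, right in product(table[(i, k)], table[(k, j)]):
                    found |= BINARY.get((left, right), set())
            table[(i, j)] = found
    return table


def accepts(words, start="S"):
    """Accept if the initial symbol spans the whole input string."""
    n = len(words)
    return n > 0 and start in fill_table(words)[(0, n)]


print(accepts("the man bit the dog".split()))
```

Because entries are only ever looked up, never recomputed, each span of the input is analyzed once regardless of how many larger constituents use it.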
Eliminating Redundancy
not ord can at
ell it
sin
rnis ana ide
to
m e n req a p
tna anu
tne
thi
una
res tna
req
sam
in to
er of
rel an
i c i e n
nas a ce on
neara lysis
ntify
the t i o n e
uires art ic t na
11 a
n it IliC t
erstd tr ic t t iao
u ire e par
tne
do 1 i
y o ear cy
di
e 0 or
f
th 1 e
a e t
ula
s
11
wi 1 ner
mil ion st
a a se
au
a 1 i i i
n a 1 ie au
sad
I t
S row
e g ft-
e,rl ne r c
to or
1 b
el o
n K
te ii i
ove
o ve
ot
ng ny r
van van
he ar b
b ar b fir
i er inc
ano
tne
e v re-, to
Th xt
ere r a
typ
ol e in t
an s w po in tage
tage er it
led eing
led
s t c , an
i via nica fou
sub ery
tn
tr is i par
nt s
nd o
e xc
nt er
t
i ic
a
wo
an a
ua 1 na
se u i
at
S i
0 i
ve
01 ess ta tn i n to
or
al nd 1 ou ra )
or. i to
I s oru
f i que
II i
to a i
ng ut i
r a
ai
iv
bl at
r
sp
el t
nd
oa ar,
r. e er
rs ML
cu it
Uli
sy on ga
gon e co e be la
tne ora 1 eecn em en nero
( K
i hi I ae
y ot ps i
1 t in pro
it t
i
re ia
u a m e stem to
in i
trim , 1
in p u t a t
used .
needed 3 t q U e
nary t under
ts ear by Ke men s same ri vat i ner pa n an a 1 one some
cess in o r e c o s im
x s o in nta 1 a s oper tne p r
n dill
t is
ions
Thi nav i
nee, ex t p
s t a n u
iy in ep t
could di sa
on of
r s i n g
na 1 ys
of tn sucn
g 13
ver f
por La e oi
epar t ate a o o i e m
e r e n t
cr 111
tnat s is ng be
Thi a r s 1 n
ing a tne
rife r
be d van t
a pa u e c n
is to e eri
o r a e r r e p e n
r o m t
nt
the ure i na it
ot
v ay s
cai 1
a par so tn en p u s na
g. ti pplic c n a i n est usea
age
rse w
n 1 q u e
be f ticai
ing i dent
ne er lor
se o
r o m t is g
final
n o r a e r t icular
at one t tnere s many
owever,
a 11 o n s ,
may be
of the
to nelp applies
hich we w n 1 c h
o u n d in things
s wrong on it ,
r o r , 1
s p e e c n
r d e r i n g
ne way oing to n g tne
In many cases, it may be important to be able to jump over and find the object noun phrase and then the verb phrase when you haven't found the subject yet. For example, in those cases where the subject wasn't findable because of a garbled word, a well understood verb phrase could be used to predict what kind of subject ought to be there. However, in other cases when you have found the subject first on one path, a computation path which finds the verb phrase and then comes back and works on the subject will find the same parsing over again. The solution that we have been using in the BBN system (Woods, 1974) -- the solution which I think has to be used -- is to put in appropriate checks at various choice points to ask whether the thing that is about to be produced has been found already on some other path and avoid creating a duplicate. When this is done at the level of noun phrases, embedded clauses, etc., it tends to block the redundant generation of larger constituents before the duplication becomes unmanageable. It still carries with it the cost of the additional checking, but I think that this cost is essential in order to cope with the errors that will occur in speech.
Lexical Ambiguity
I've mentioned a number of things which make the parsing problem for speech understanding more difficult than traditional text parsing. Another difficulty is the ambiguity of word identification in the input sequence of sounds. The major source of lexical ambiguity in text parsing is the possibility of multiple syntactic categories for a given word. In a classical example of sentential ambiguity, "Time flies like an arrow," the word "time" has three possible syntactic categories (noun, verb, or adjective), "flies" can either be a verb or a noun, and "like" can either be a preposition or a verb. If we think of a parser receiving a sequence of these kinds of categories as input, there would be 3x2x2=12 strings of syntactic categories that you could get for this sentence. If you had to put each such sequence through the parser separately (apparently some early parsers did exactly that) you would be doing twelve separate parsings. Imagine what would happen with a sentence of say 20 words with an average ambiguity of 2 categories per word; you would have over 1,000,000 different possible such sequences. In speech understanding, this basic ambiguity is magnified by the inability to unambiguously determine the segmentation of speech sounds into word sequences. Clearly one doesn't want to run a parser on a separate enumeration of each possible sequence of syntactic categories.
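The arithmetic above is easy to check by brute force. The sketch below enumerates the category strings for the example sentence and computes the count for the 20-word case. The categories for "time", "flies" and "like" are those given in the text; the single categories assumed here for "an" and "arrow" (DET and N), and the names CATEGORIES, category_sequences and sequence_count, are assumptions of the sketch.

```python
from itertools import product
from math import prod

# Categories per word; the entries for "an" and "arrow" are assumptions.
CATEGORIES = {
    "time": ["N", "V", "ADJ"],
    "flies": ["V", "N"],
    "like": ["PREP", "V"],
    "an": ["DET"],
    "arrow": ["N"],
}


def category_sequences(words):
    """Enumerate every string of syntactic categories for the word sequence."""
    return list(product(*(CATEGORIES[w] for w in words)))


def sequence_count(ambiguities):
    """Number of category strings, given the number of categories per word."""
    return prod(ambiguities)


sentence = "time flies like an arrow".split()
print(len(category_sequences(sentence)))  # 3 * 2 * 2 * 1 * 1 = 12
print(sequence_count([2] * 20))           # 2 ** 20 = 1048576
```

The product grows exponentially with sentence length, which is why running a parser once per enumerated sequence is impractical.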
Word Lattices
A technique that has been very effective for dealing with lexical ambiguity has been the use of a lattice of input symbols rather than a single string. A simple example of such a structure is illustrated in Figure 9.
[Figure 9: A Sample Word Lattice]
Such a lattice compactly represents all of the possible alternative sequences of input symbols with the common parts of different sequences factored together so that processing on them needs to be done only once. With such an input, grammar rules are matched the same as before, except that as a rule is matched against the input, particular paths are selected through the word lattice which satisfy the match. This technique has a tremendous benefit in terms of the amount of computation required for parsing. When a particular rule is matched at a given point in the word lattice, all of the possible sequences of words in which the matching sequence occurs are effectively factored together so that the reduction is effectively performed just once for an entire equivalence class of word sequences. This technique is very attractive for speech understanding because the possible alternative segmentations of the input signal into words lead to a lattice structure similar to that illustrated in Figure 9 (although of slightly more varied structure). Whereas the structure in Figure 9 is nothing more than a sequence of alternative syntactic categories, the structures for word lattices in speech understanding tend to have much more branching, and the individual branches leaving a given point do not all come together again at the same point. However, the same parsing algorithm runs on this more generalized input lattice and saves a tremendous amount of processing by avoiding the multiplication of combinatorial possibilities.
Chart Parsers
The concept of a word lattice for the input symbols and the concept of a well-formed substring table for parsing are closely related, and can be combined into a single data structure in a parsing algorithm. The structure of the well-formed substring table is essentially the same as that of the word lattice, [...] constituents that can be found in any analysis of any path in the initial lattice. An example of such a lattice for the sentence "Time flies like an arrow" is shown in Figure 10.
[Figure 10: An Example of a Well-formed Substring Lattice or Chart]
Each labeled horizontal line in the figure between vertical strikes represents a segment added to the chart as a result of the application of some rule (or one of the initial entries in the word lattice). Both of these parsing algorithms (Kay's and Cocke's) select a particular order for walking the chart and adding new segments as a result of matching rules against the segments already in the chart, and both produce a very nice recognition algorithm that keeps a great deal of the common parts of different analyses merged together. The principal difference between Kay's parser and Cocke's is Kay's generalization of the method to handle general rewriting systems and an approximation to transformational grammars. For strictly context free grammars, both algorithms are effectively the same. In this paper, I will call all such parsers (both Cocke's and Kay's) and their derivatives "chart parsers". In particular, the usual implementation of the classical nonpredictive bottom-up parsing algorithm is a chart parser.
Parsing versus Recognition
In order to be called a parser, an algorithm must not only calculate whether a string is accepted or not, as does a recognizer, but it must also keep a record of the derivation and provide one or more structural analyses of the sentence. In my description of most of the parsing algorithms so far, I have glossed over this distinction and only the recognition aspects have been discussed. In order to be a parser, an algorithm must keep track of and report what constituents were used as pieces of what higher constituents. This can be done conveniently for a chart parser by annotating each of the segments of the chart with a list of the constituents which formed it, that is, by a list of the segments which were combined by some rule to produce the annotated segment. In general, there can be several ways to form a given segment from different sequences of constituents so the annotation must provide for several such constituent lists in order to represent all possible analyses.
Both Cocke's algorithm and Kay's are bottom-up, nonpredictive algorithms and share with any other such algorithms the property of finding accidental constituents that do not form part of any complete analysis. Such accidental constituents clutter up the chart. Figure 11 shows a chart for our example in which all such accidental segments have been removed. Such a "cleaned up" chart together with its constituent pointers provides a very compact representation of all of the possible alternative analyses of the input with the common parts of the different analyses merged together. Figure 12 shows the chart of Figure 11 with constituent pointers added for one particular parsing of the input, and Figure 13 shows the same chart with all constituent pointers included.
[Figure 11: A Chart with Accidental Constituents Removed]
[Figure 12: A Chart Showing Constituent Pointers for One Parsing]
[Figure 13: A Chart Showing All Constituent Pointers for Two Parsings]
In more typical cases, one cannot draw as nice a picture of the chart as in our example (in particular it may not be a planar graph), but inside a computer, a table of positions, each with a set of associated segments (indicated by the constituent type and the position where the segment ends) suffices to handle the most general case for a recognizer. For a parser, the inclusion with each segment of a list of alternative constituent lists, each of which is a list of segments (where a segment is named by its left and right end points and the constituent label) suffices to produce a compact representation of all the possible parses.
Earley's Algorithm
There is another parsing algorithm for context free grammars due to Jay Earley (Earley, 1970), which can be thought of as a predictive chart parser. This algorithm combines the benefits of the systematic, lattice-oriented parsing of the well-formed substring or chart parser with the advantages of predictive analysis. Although the algorithm was developed in the context of parsing for computer programming languages, and is presented as such by Earley, the algorithm has many theoretical advantages for parsing context free grammars in general, and an appreciation of its operation is important for the understanding of context free parsing. Earley's algorithm does not quite fit into either the top-down or the bottom-up models, or rather it seems to fit equally well into both. Starting from the beginning of the string, it begins to fill in a table (which Earley calls a state table) in which it records, for each position in the input string, each rule of the grammar that has been partially matched up to that point or which might possibly match beginning at that point (i.e. each rule that would be consistent with what has been parsed so far to the left of that point). The table is organized into columns, one for each position in the input string, and the procedure for filling out a given column i+1 is as follows:
1. (transition) Make entries in the column for rules that appear in the preceding column and whose match can be continued by matching the input symbol associated with this column.
2. (prediction or "pushing") Make beginning entries in this column for every constituent which could be used to continue a match for a rule already in this column (each rule remembers the column in which its match was begun so that when the match is completed, the algorithm can return to the column where the constituent was wanted and continue the match of any and all rules which wanted it. This memory of the column in which a subconstituent match was begun replaces the use of a stack in most predictive algorithms, and gains combinatorial benefit by not having to enumerate all of the different possible stacks which could sit above a given subcomputation. A given constituent in a given place will be looked for and found only once.)
3. (completion or "popping") For each rule whose match has just been completed in this column, go back to the column where that match was begun and pick up and continue the match of all rules which can use the constituent just formed.
In Earley's statement of the algorithm the progress of a rule match is recorded by a pair of numbers -- the rule number and the number of symbols in the right-hand side of the rule which have already been matched. An entry in the table consists of these two numbers plus the number indicating the column in which the rule match was begun. A sentence is accepted if, when the last column is filled out, it contains an entry for a rule whose left-hand side is S, whose match has been completed, and whose match was begun in column 0. (The algorithm begins by initializing column 0 to contain all of the rules whose left-hand sides are S.)
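The table entries and the three column-filling operations can be sketched as a small recognizer. This is a minimal illustration rather than Earley's own formulation: the toy grammar and the names RULES and earley_accepts are hypothetical, and the sketch assumes that no rule has an empty right-hand side. Each entry is the triple described above: rule number, number of right-hand-side symbols already matched, and the column in which the match was begun.

```python
# Hypothetical grammar (an assumption of this sketch); no empty right-hand sides.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("DET", ("the",)),
    ("N", ("man",)),
    ("N", ("dog",)),
    ("V", ("bit",)),
]


def earley_accepts(words, start="S"):
    # An entry is (rule number, symbols matched so far, column where begun).
    columns = [set() for _ in range(len(words) + 1)]
    columns[0] = {(r, 0, 0) for r, (lhs, _) in enumerate(RULES) if lhs == start}
    for i in range(len(words) + 1):
        if i > 0:
            # 1. transition: continue entries of the preceding column
            #    whose next symbol is the input word for this column.
            for r, dot, origin in columns[i - 1]:
                rhs = RULES[r][1]
                if dot < len(rhs) and rhs[dot] == words[i - 1]:
                    columns[i].add((r, dot + 1, origin))
        agenda = list(columns[i])
        while agenda:
            r, dot, origin = agenda.pop()
            lhs, rhs = RULES[r]
            if dot < len(rhs):
                # 2. prediction: begin entries for rules that could supply rhs[dot].
                new = [(q, 0, i) for q, (l, _) in enumerate(RULES) if l == rhs[dot]]
            else:
                # 3. completion: return to the column where the match was begun
                #    and advance every rule there that wanted this constituent.
                new = [(q, d + 1, o) for q, d, o in columns[origin]
                       if d < len(RULES[q][1]) and RULES[q][1][d] == lhs]
            for entry in new:
                if entry not in columns[i]:
                    columns[i].add(entry)
                    agenda.append(entry)
    return any(RULES[r][0] == start and dot == len(RULES[r][1]) and origin == 0
               for r, dot, origin in columns[-1])


print(earley_accepts("the man bit the dog".split()))
```

Storing the origin column in each entry is what stands in for the stack: completion looks up the waiting rules directly instead of unwinding a stack configuration.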
Earley's algorithm is frequently thought of as a top-down parsing algorithm because of the way that it starts with the assumption that it is going to build a sentence and successively elaborates its set of rules to be looked for by passing information "down" in step 2. However, once such top-down predictions (which incidentally may leap over what will amount to many cycles of left recursion in the final analysis) have determined the set of rules to be used at a given point, the subsequent analysis will do almost the same kind of bottom-up structure building as any other chart parser, the principal difference being that for Earley's algorithm we will get a subset of those entries in the chart that would have been produced by an ordinary chart parser. This is because the prediction technique has eliminated all those entries which are not at least consistent with some analysis of the string to the left. Once again, this prediction is a mixed blessing for speech understanding since if the prediction is made on the basis of unreliable evidence, it may keep us from finding enough of an analysis to benefit error correction.
Transition Network Grammars
The presentation so far has been illustrated by two extremely simple sample grammars. When one begins to write a grammar for any appreciable subset of natural language, one finds that there are some verb phrases which consist of a verb alone, some with a verb plus an object noun phrase, some with a verb, an indirect object and a direct object, any of these three forms with a prepositional phrase added, any of the three forms with two prepositional phrases, etc. If one were to write each of these as a separate context free rule, as illustrated in Figure 14a, we find a very rapid proliferation (possibly infinite) of rules that share a lot of stuff in the right-hand sides. People who write grammars immediately find themselves falling into notations such as that illustrated in Figure 14b in which optional constituents, alternative constituent sequences, and repeatable constituents are indicated by some notation (usually parentheses for optionality, curly brackets or vertical strokes for alternative sequences, and the Kleene star operator (*) for repeatable constituents). These are usually thought of just as abbreviations for a set of ordinary context free rules, but the actual expansion of such notations into ordinary context free rules is a very bad way to implement them. Instead, one buys an advantage in parsing if he takes advantage of these primitive notions of optionality, alternatives, and repeatability. Transition network grammars provide a mechanism for doing this.
A basic transition network (BTN) is essentially a finite state transition diagram to which recursion has been added by fiat (see Woods, 1969, 1970, 1973a). The result is no longer a finite state device, but rather is formally equivalent to a pushdown store automaton or a context free grammar. The BTN is a labeled, directed graph whose nodes, which we call states, represent states which the grammar can be in in the course of generating (or analyzing) a sentence, and whose arcs represent transitions from state to state. The labels on the arcs indicate the input symbol or type of phrase which must be consumed from the input string in order to make the transition. It is the possibility of arcs (called PUSH arcs) labeled with the names of phrase constituents that provides the recursion which makes this model more than finite state. The grammar contains a start state for each of the types of constituents which can be called for on a PUSH arc, and distinguished states called final states which represent the completion of the analysis of some constituent. A PUSH arc can be taken if some string accepted by the start state associated with the label of that PUSH arc is consumed (or generated). There is a mechanical procedure presented in Woods (1969) for transforming any given context free grammar into an equivalent BTN and performing a number of optimizing transformations on the resulting BTN to produce a grammar which is more compact and more efficient for parsing than the original context free grammar. Essentially the BTN provides a way to factor a context free grammar into a finite state part and a recursive part so that as much of the grammar as possible can be expressed in the finite state part and optimized by the same techniques applicable to finite state grammars.
The set of notations used by linguists for representing alternative sequences and repeatable constituents in their grammar rules corresponds to the operations called "union" and "closure" in the theory of finite state automata, which together with the operation of concatenation are known to generate the finite state languages. Thus, the right-hand sides of grammar rules using these notations are merely notational variants of what is in automata theory called a "regular expression", and there exist formal procedures for translating such a representation into an equivalent transition diagram for a finite state machine. These same procedures can be used to translate a context free grammar using these notations into an equivalent BTN, such as the one illustrated in Figure 14c.
[Figure 14: Alternative Representations for Multiple Right-hand Sides of Grammar Rules. (a) Separate context free grammar rules; (b) merged representation; (c) representation as a basic transition network (BTN)]
Thus the BTN formalism provides a realization for these notions of alternative sequences and repeatable constituents that is more efficient for a parser as well as being less redundant as a linguistic specification. Each of the arcs leaving a given state represents an alternative possible continuation of the string being generated (or of the analysis of a given string).
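A small recognizer over transition networks may make the role of states, arcs, PUSH arcs and final states concrete. Everything specific in this sketch is an assumption: the lexicon, the state names, and the shape of the VP network, which only loosely follows the verb phrase patterns discussed earlier and is not a transcription of Figure 14c. An arc whose label names another network is treated as a PUSH arc; any other label is matched against the category of the next word.

```python
# Hypothetical lexicon and networks (assumptions of this sketch).
LEXICON = {"gave": "V", "the": "DET", "man": "N", "dog": "N",
           "bone": "N", "to": "PREP", "in": "PREP"}

# network name -> (start state, final states, {state: [(label, next state)]})
NETWORKS = {
    "VP": ("vp0", {"vp1", "vp2", "vp3", "vp4"}, {
        "vp0": [("V", "vp1")],
        "vp1": [("NP", "vp2"), ("PP", "vp4")],
        "vp2": [("NP", "vp3"), ("PP", "vp4")],
        "vp3": [("PP", "vp4")],
        "vp4": [("PP", "vp4")],
    }),
    "NP": ("np0", {"np2"}, {"np0": [("DET", "np1")], "np1": [("N", "np2")]}),
    "PP": ("pp0", {"pp2"}, {"pp0": [("PREP", "pp1")], "pp1": [("NP", "pp2")]}),
}


def traverse(network, words, pos):
    """Yield every input position reachable by traversing `network` from `pos`."""
    start, finals, arcs = NETWORKS[network]
    agenda, seen = [(start, pos)], set()
    while agenda:
        state, p = agenda.pop()
        if (state, p) in seen:
            continue
        seen.add((state, p))
        if state in finals:
            yield p  # final state: the constituent may end here
        for label, nxt in arcs.get(state, []):
            if label in NETWORKS:
                # PUSH arc: the arc is taken for each string the subnetwork accepts.
                agenda.extend((nxt, q) for q in traverse(label, words, p))
            elif p < len(words) and LEXICON.get(words[p]) == label:
                agenda.append((nxt, p + 1))  # category arc consumes one word


def accepts(network, words):
    return len(words) in traverse(network, words, 0)


print(accepts("VP", "gave the dog the bone".split()))
```

The single VP network here covers the verb-alone, one-object, two-object and trailing prepositional phrase patterns with shared states, where separate context free rules would repeat the common prefixes. A network that pushes to itself before consuming any input would make this simple sketch loop.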
The transition network grammar effectively provides for the merging of common parts of what would be different context free rules, and this permits parsing operations to be performed only once on such parts instead of separately on each individual copy as would be the case if the expressions were expanded into separate ordinary context free rules. Most parsing algorithms for context free grammars have natural generalizations to transition network grammars which take advantage of this natural merging. In particular, Earley's algorithm is a natural algorithm for BTN grammars and the number of parsing operations required by Earley's algorithm for a parsing of an optimized BTN grammar compared to the parsing of an equivalent context free grammar can easily be less by factors of four or five. A presentation of a version of Earley's algorithm for BTN's is given in Woods (1969).
Grammars for Natural English
In comparing the models of the Chomsky hierarchy with each other, it has been found that whereas the finite state grammars have great computational advantages for parsing (there exist formal, mechanical optimizing procedures of various types for finite state machines), the absence of recursion makes them unsuitable for natural language analysis. On the whole, context free grammars provide the simplest and most natural grammars for natural language but are formally incapable of dealing with certain kinds of coordinate constructions and discontinuous constituents. Context sensitive grammars have sufficient formal power to provide a recognizer for such constructions, but provide no useful structural descriptions. General rewriting systems add no useful power not already present in context sensitive grammars and have the undesirable consequence that it is not possible to have a parsing algorithm for the entire class of such grammars.
Transformational Grammars
There are a number of other grammar formalisms that have been proposed for natural language which have been shown to be equivalent to the ordinary context free grammar model. One formalism, however, with considerably more power than context free grammars has stimulated linguistics and served as the vehicle for most of the study of natural language grammar in the last decade. This is the transformational grammar of Chomsky. A transformational grammar basically consists of a context free "base" grammar plus a set of transformational rules which can permute the order of constituents and in general move, delete, and insert constituents at various positions in the parse tree. Transformational rules can also test conditions such as identity of constituents and the presence of syntactic features associated with the words and sometimes the phrases of the sentence. Perhaps the simplest example of a transformational rule is the passive transformation shown in Figure 15, which produces the "surface structure" for a passive sentence from the "deep structure" that underlies the corresponding active sentence.
PASSIVE
NP (AUX) V NP   =>   4 2 BE + EN + 3 BY + 1
1    2   3  4
CONDITION: 4 ≠ 1
a. STATEMENT OF THE RULE
S[NP(1) AUX(2) VP[V(3) NP(4)]]   =>   S[NP(4) AUX(2) VP[BE EN V(3) BY NP(1)]]
b. EFFECT OF THE RULE ON TREES
Figure 15: A Sample Transformational Rule: The Passive Transformation
The rule says that if you can analyze an intermediate phrase structure tree into a sequence consisting of a noun phrase, optionally an auxiliary verb, followed by a main verb and an object noun phrase, then you can transform the tree by moving the subject noun phrase (1) to the position of the object noun phrase (4), appending the word "by" on its left, moving the object noun phrase to subject position, and appending the morphemes "be" and "en" to the left of the main verb. This rule changes the tree structure corresponding to "Mary shot John" into that corresponding to "John was shot by Mary". (A later rule will move the "en" to the right of the next verb and a "post cyclic" rule will combine the two into a past participle.) The generation of a sentence by a transformational grammar consists of the generation of a deep structure tree by means of the context free base grammar and then transforming this tree through a series of intermediate structures into the surface structure tree by means of the transformational rules, which are usually ordered, marked as optional or obligatory, and applied cyclically to successive embedded clauses in complex sentences.
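The structural change made by the passive rule can be sketched on a flat analysis of the sentence. This is only an illustrative simplification under stated assumptions: the function name and the list representation are invented here, the tree structure is ignored, and the later rules that move "en" and form the past participle are not modelled.

```python
def passive(np1, aux, verb, np4):
    """Apply the structural change 1 2 3 4 => 4 2 BE+EN 3 BY+1.

    aux may be None because the auxiliary is optional in the rule.
    Returns None when the condition 4 != 1 fails.
    """
    if np1 == np4:
        return None
    result = [np4]                      # object noun phrase moves to subject position
    if aux is not None:
        result.append(aux)              # the auxiliary, if present, stays in place
    result += ["BE", "EN", verb, "BY", np1]  # morphemes before the verb, "by" + old subject
    return result
```

For example, `passive("Mary", None, "shoot", "John")` yields the intermediate sequence John BE EN shoot BY Mary, from which later rules would derive "John was shot by Mary".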
The transformational grammar appears capable of capturing the major syntactic facts about natural language, and a great deal of our current knowledge about the syntax of English has been discovered and codified in terms of this model. However, it is incredibly inefficient to parse with such a grammar and no parsing algorithm suitable for parsing any significant amount of text has ever been developed for this grammar model, although Stanley Petrick has spent considerable effort in this direction for a number of years and has probably the only working parsing algorithm for transformational grammars in existence (Petrick, 1965).
Augmented Transition Networks
In order to obtain a grammar formalism with the linguistic adequacy of a transformational grammar while preserving the efficiency of the various context free parsing algorithms, I have been developing and refining a model of grammar which I call an augmented transition network (ATN). Presentations of this model appear in Woods (1969, 1970, 1973a). Earlier attempts along similar lines were made by Thorne, Bratley, and Dewar (1968) and by Bobrow and Fraser (1969). An ATN consists of a basic transition network grammar augmented with a set of registers which are carried along with the state and which can hold arbitrary pieces of tree structure, with arbitrary conditions and actions associated with the arcs of the grammar which can test and set the contents of these registers. As a parsing proceeds with an ATN grammar, the conditions and actions associated with the transitions can put pieces of the input string into registers, use the contents of registers to build larger structures, check whether two registers are equal, etc. It turns out that this model can construct the same kinds of structural descriptions as those of a transformational grammar and can do it in a much more economical way. The merging of common parts of alternative structures, which the network grammar provides, permits a very compact representation of quite large grammars, and
this model has served as the basis for several natural language understanding systems such as the LUNAR system (Woods, Kaplan, and Nash-Webber, 1972, Woods, 1973b). For speech understanding, the transition network grammar is one of the few linguistically adequate grammars for natural English that are at all amenable to coping with the combinatorial problems. This model is being used as the basis of the syntactic component of the BBN speech understanding project (Bates, 1974, Woods, 1974). Other types of context free grammars can be augmented by conditions and actions associated with the grammar rules in a similar way, but such grammars lose the benefits of the transition networks (such as merging common parts of different rules) which we discussed previously. Another advantage of the transition network formalism is the ease with which one can follow the arcs backwards and forwards in order to predict the types of constituents or words which could occur to the right or left of a given word or phrase. One of the important roles of a syntactic component in speech understanding is to predict those places where small function words such as "a", "an", "of" should occur since such words are almost always unstressed and difficult to unambiguously find in the input. In the BBN speech system such words are almost always found as a result of syntactic prediction and are not even looked for during lexical analysis since spurious matches would be found more often than correct ones.
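To make the role of registers, conditions and actions concrete, here is a minimal sketch of an ATN-like parser. It is an assumption-laden toy, not the ATN of the report: the three-arc network, the register names SUBJ, V and OBJ, the number-agreement condition, and the token format (word, category, number) are all invented for this example.

```python
def set_reg(name):
    """Build an action that stores the current token in the named register."""
    def action(regs, tok):
        regs[name] = tok
    return action


def always(regs, tok):
    return True


def agrees(regs, tok):
    # A condition that tests the contents of a register set on an earlier arc.
    return regs["SUBJ"][2] == tok[2]


# Each arc: (category, condition, action, next_state). All names are hypothetical.
ARCS = {
    "S0": [("NP", always, set_reg("SUBJ"), "S1")],
    "S1": [("V", agrees, set_reg("V"), "S2")],
    "S2": [("NP", always, set_reg("OBJ"), "S3")],
    "S3": [],
}
FINAL = {"S2", "S3"}


def parse(tokens, state="S0", i=0, regs=None):
    """tokens: list of (word, category, number). Returns a structure or None."""
    regs = {} if regs is None else regs
    if i == len(tokens):
        if state not in FINAL:
            return None
        obj = regs["OBJ"][0] if "OBJ" in regs else None
        # Build the final structure from the register contents.
        return ("S", regs["SUBJ"][0], regs["V"][0], obj)
    tok = tokens[i]
    for cat, cond, act, nxt in ARCS[state]:
        if tok[1] == cat and cond(regs, tok):
            new_regs = dict(regs)  # registers travel with the state; copy for backtracking
            act(new_regs, tok)
            found = parse(tokens, nxt, i + 1, new_regs)
            if found is not None:
                return found
    return None
```

The agreement test on the verb arc fails for a plural subject followed by a singular verb, so that path is abandoned without any separate rule being written for it.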
The ATN formalism suggests a way of viewing a grammar as a map with various landmarks and recognizable locations that one encounters in the course of crossing a sentence from left to right. For speech understanding this perspective is beneficial, for example, in attempting to correlate various prosodic characteristics of sentences with such "geographical landmarks" within the structure of a sentence.
Let me conclude this presentation of syntactic techniques with a reiteration that I have not attempted to make a case that any one parsing technique or grammar formalism is uniformly better than others (indeed I do not believe there is a best one for all applications). Rather, I have attempted to give sufficient insight into the relative advantages and disadvantages to enable the reader to make appropriate choices for particular applications.
Part II. Semantics
Turning now to the subject of semantics, I should perhaps first make the point that the word "semantics" means different things to different people. There is a tradition in philosophy and logic that specifies the semantics of formal systems such as the propositional calculus in terms of a set of "truth conditions" for each possible expression in the system. These truth conditions are abstract entities which specify the situations or "possible worlds" in which the statement would be true. In linguistics, on the other hand, concern is usually devoted to finding a notation or representation in which to specify each of the different possible interpretations or "readings" which a natural language sentence can have and to procedures for determining whether a sentence is meaningful or "anomalous" (i.e. not meaningful). The linguist does not usually follow this up by providing a semantics in terms of truth conditions for his notation. In the field of programming languages in computer science, the semantics of a programming language is specified in terms of the computations which the machine is to perform as a result of a given expression. In specifying a formal semantics for such systems, however, one usually takes recourse to defining the semantics by reducing it to another notation such as those of elementary arithmetic, whose semantics is presumably understood. In the fields of computational linguistics and artificial intelligence, the term is perhaps most misused. In some cases, it is taken to cover everything that isn't syntax -- i.e. everything that is not part of a grammar, while in others it is asserted to be no different in principle from syntax, and any basis for a distinction between the two is denied.
While I don't have the space here to go into a complete exposition of the different concerns of all of these different perspectives on semantics, I will try to give a brief synopsis of the distinctions.
Let us begin by considering what all of these different things which call themselves semantics have in common. According to my dictionary, semantics is "the scientific study of the relations between signs or symbols and what they denote or mean." This is the traditional use of the term and represents the common thread which links the different concerns discussed above. Notice that the term does not refer to the things denoted or the meanings, but to the relations between these things and the linguistic expressions which denote them. Thus, although it may be difficult to isolate exactly what part of a system is semantics, any system which understands sentences and carries out appropriate actions in response to them is somehow completing this connection, and therefore is applying semantic knowledge to this task. One of the common misuses of the term semantics in the fields of computational linguistics and artificial intelligence is to extend the coverage of the term not only to this relation between linguistic form and meaning, but to all of the retrieval and inference capabilities of the system. This misuse arises since for many tasks in language processing, the use of semantic information to make an evaluation necessarily involves not only the determination of the object denoted, but also some inference about that object. In the absence of a good name for this further inference process, terms such as "semantic inferences" have come to be used for the entire process. I regret to say that I have no really good substitute term for such processes, and since the terminology is so well established in some of the literature, I will use the term "semantic inferences" in this paper in referring to inferences that cross the boundary between symbol and referent and then draw conclusions about the referent. (One must be aware, however, that not all writers who use this term mean the same thing by it.)
The concerns of the linguists and the philosophers in the areas of semantics are effectively the two halves of the same process, both of which the fields of computational linguistics and speech understanding will have to cope with. In reducing the semantics of natural language sentences to some formal notation, the linguist has only completed half of the job if he does not go on and specify a semantics of the resulting formal system. It is at this point that the concerns of philosophers and logicians in specifying the semantics for formal systems take over. Notice that specifications of the formal semantics of programming languages in terms of the notations of elementary arithmetic are satisfactory only to the extent that we understand fully what these notations themselves mean. This is also the case for specifying the semantics of natural language.
I hope the above presentation has alerted you to some of the different kinds of things to which the term semantics can refer, and I will attempt to make clear which one I am using in the remainder of this presentation. I should point out that in the field of computational linguistics we don't have nearly as good an understanding of semantics as we do of syntax. I cannot give you the same kind of evolution of ideas through successively more powerful models and techniques, all of which are well understood. Here instead, the mechanisms which we understand thoroughly are known to be inadequate for dealing with many aspects of the problem, and the techniques which hold promise of dealing with some of the more difficult problems are not yet sufficiently understood or tested for anyone to say whether they in fact solve the problem or not. In this area, then, we have many promising approaches, but few definite answers.
What I will attempt to do here is to provide an understanding of some basic principles of semantic representation and interpretation that will apply to any system that understands natural language (whether text or speech), and then some specific techniques which I think have direct relevance to speech understanding. In particular, I will describe two techniques which are being applied in the BBN speech understanding system. One is the technique of semantic interpretation into procedural semantics which I have applied effectively to several natural language question-answering applications, and the other is the technique of "semantic intersections" in semantic network representations of knowledge which was developed by Quillian (1968, 1969). For more details on the specific applications of the latter technique to speech understanding, see Nash-Webber (1974 and 1975). For the most part, the details of many other interesting things that are being done in the area of computational semantics for natural language will have to be left to the references. Articles which may be of interest include: Bruce (1973), Carbonell and Collins (1974), Collins and Quillian (1969), Collins and Warnock (1974), Fillmore (1968), Green and Raphael (1968), Heidorn (1974), Norman and Rumelhart (1973), Sandewall (1971), Winograd (1972), Woods (1967), and articles by Newell, Simmons, Wilks, Winograd, Schank, Colby, Abelson, Hunt, Lindsay, and Becker in Schank and Colby (1973).
Procedural Semantics
It appears that the programming language theorists stand on firmer ground than the philosophers or the linguists in specifying the semantics of their systems, since they can define the semantics of their notations in terms of the procedures that the machine is to carry out. Notice that the notion of procedure shares with the notion of meaning that elusive quality of being impossible to present except by means of alternative representations. The procedure itself is something abstract which is instantiated whenever someone carries out the procedure, but otherwise, all one has when it is not being executed is some representation of it.
Although in ordinary natural language not every sentence is overtly dealing with procedures to be executed, it is possible nevertheless to use the notion of procedures as a means of specifying the truth conditions of declarative statements as well as the intended meaning of questions and commands. One thus picks up the semantic chain from the philosophers at the level of truth conditions and completes it to the level of formal specifications of procedures.
These can in turn be characterized by their operations on real machines and can be thereby anchored to physics. This notion of characterizing the truth conditions of sentences in terms of mechanical procedures is one that I called "procedural semantics" in my 1968 AFIPS paper (Woods, 1968) and the term has since gained wide circulation. The application of this technique in computer systems for natural language understanding has been very effective. Two notable computer systems which make use of this type of semantics are the LUNAR system (Woods, Kaplan, & Nash-Webber, 1972, Woods, 1973b) and the blocks world system of Winograd (1972). The former understands and answers questions such as "what is the average concentration of aluminum in high alkali rocks?", while the latter understands and carries out instructions such as "Put the pyramid on the block in the corner," (including resolving the ambiguity by determining whether there is a pyramid on a block or a block in the corner). Since the semantic techniques used in the LUNAR system are more formalized and rule driven and since I am more familiar with the details of that system, I will use LUNAR as the principal illustration of the technique. I think the rules used there can effectively serve as a formal model for what is going on in a number of other language understanding systems.
Semantics in LUNAR
The semantic framework of the LUNAR system consists of three parts -- a semantic notation in which to represent the meanings of the sentences, a specification of the semantics or meanings of this notation by means of LISP programs, and a procedure for assigning representations in the notation to input sentences. In LUNAR, the semantic notation (which I have referred to there as a query language) consists of an extended notational variant of the predicate calculus.
The query language contains essentially three kinds of constructions:
1) designators, which name or denote objects or classes of objects in the data base,
2) propositions, which correspond to statements that can be either true or false in the data base, and
3) commands, which initiate and carry out actions.
Designators come in two varieties -- individual specifiers and class specifiers. Individual specifiers correspond to proper nouns and variables. For example, S10046 is a designator for a particular sample, OLIV is a designator for a certain mineral (olivine), and X3 can be a variable denoting any type of object in the data base. Class specifiers are designators used to denote classes of individuals over which quantification can range. They consist of the name of an enumeration function for the class plus arguments. For example, (SEQ TYPECS) is a specification of the class of type C rocks (i.e. breccias) and (DATALINE S10046 OVERALL OLIV) is a specification of the set of lines of a table of chemical analyses which correspond to analyses of sample S10046 for the overall concentration of olivine.
Elementary propositions are formed from predicates with designators as arguments, and complex propositions are formed from these by use of the logical connectives AND, OR, and NOT and by quantification. For example, (CONTAIN S10046 OLIV) is a proposition formed by substituting designators as arguments to the predicate CONTAIN, and (AND (CONTAIN X3 OLIV) (NOT (CONTAIN X3 PLAG))) is a complex proposition corresponding to the assertion that X3 contains olivine but does not contain plagioclase. Elementary commands consist of the name of a command function plus arguments, and like propositions, complex commands can be constructed using logical connectives and quantification. TEST is a command function for testing the truth value of a proposition given as its argument. Thus, (TEST (CONTAIN S10046 OLIV)) will answer yes or no depending on whether sample S10046 contains olivine. Similarly PRINTOUT is a command function which prints out a representation for a designator given as its argument.
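The way propositions are built from predicates and connectives can be illustrated with a small evaluator. The sketch below is only an approximation under stated assumptions: the nested-tuple encoding, the toy data base contents, and the Python function names are invented here and are not the LISP definitions used in LUNAR.

```python
# Hypothetical miniature data base: which minerals each sample contains.
DATA = {"S10046": {"OLIV", "PLAG"}, "S10017": {"OLIV"}}


def CONTAIN(sample, mineral):
    """Procedure standing in for the predicate CONTAIN."""
    return mineral in DATA.get(sample, set())


def evaluate(expr, env=None):
    """Evaluate a nested-tuple proposition; env binds variables such as X3."""
    env = env or {}
    op, *args = expr
    if op == "AND":
        return all(evaluate(a, env) for a in args)
    if op == "OR":
        return any(evaluate(a, env) for a in args)
    if op == "NOT":
        return not evaluate(args[0], env)
    if op == "CONTAIN":
        # Replace variables by their bindings; constants stand for themselves.
        sample, mineral = (env.get(a, a) for a in args)
        return CONTAIN(sample, mineral)
    raise ValueError(f"unknown operator: {op}")


def TEST(expr, env=None):
    """Command function: answer yes or no for a proposition."""
    return "YES" if evaluate(expr, env) else "NO"
```

With this toy data, `TEST(("CONTAIN", "S10046", "OLIV"))` answers yes, and the complex proposition about X3 is true only when X3 is bound to a sample containing olivine but not plagioclase.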
The format for a quantified proposition or command is
(FOR QUANT X / CLASS : PX ; QX)
where QUANT is a type of quantifier (EACH, EVERY, SOME, THE, numerical quantifiers, etc.), X is a variable of quantification, CLASS is a class specifier for the class of objects over which quantification is to range, PX specifies a restriction on the range, and QX is the proposition or command being quantified. (Both PX and QX may themselves be quantified expressions.) For example, (FOR EVERY X1 / (SEQ TYPECS) : (CONTAIN X1 PLAG) ; (CONTAIN X1 OLIV)) is a quantified proposition corresponding to the statement that every type C rock that contains plagioclase also contains olivine. (FOR EVERY X2 / (DATALINE S10046 OVERALL OLIV) : T ; (PRINTOUT X2)) is a quantified command to print out all of the chemical analyses of S10046 for overall olivine concentrations. (For expository reasons, the notation has been slightly simplified here compared to that actually used in the LUNAR system, but the differences are minor.)
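A quantified proposition of this form can be sketched as a function that takes an enumeration function for the class, a restriction PX and a body QX. This is a hedged illustration only: the sample records, the helper names, and the restriction to the quantifiers EVERY and SOME are assumptions made for the example, not the LUNAR implementation.

```python
# Hypothetical data: rock type and mineral content for a few invented samples.
SAMPLES = {
    "S10046": {"type": "C", "minerals": {"PLAG", "OLIV"}},
    "S10059": {"type": "C", "minerals": {"OLIV"}},
    "S10017": {"type": "B", "minerals": {"PLAG"}},
}


def seq_typecs():
    """Enumeration function standing in for the class specifier (SEQ TYPECS)."""
    return (name for name, rec in SAMPLES.items() if rec["type"] == "C")


def contain(sample, mineral):
    return mineral in SAMPLES[sample]["minerals"]


def FOR(quant, enumerate_class, px, qx):
    """Sketch of (FOR QUANT X / CLASS : PX ; QX) for QUANT in {EVERY, SOME}."""
    in_range = (x for x in enumerate_class() if px(x))  # PX restricts the range
    if quant == "EVERY":
        return all(qx(x) for x in in_range)
    if quant == "SOME":
        return any(qx(x) for x in in_range)
    raise ValueError(f"unsupported quantifier: {quant}")
```

Because the class is given by an enumeration function, the quantifier is evaluated simply by running through the members of the class and applying PX and QX to each.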
Semantics of the Notation
Having specified our semantic notation for representing the meanings, we must now specify the meanings of our notations. As mentioned before, we do this in LUNAR by relating the notations to procedures which can be executed. For each of the predicate names that can be used in specifying semantic representations, we will specify a procedure or subroutine which will determine the truth of the predicate for given values of the arguments. Similarly for each of the functions which can be used, we will specify a procedure which can compute the value of that function given the values of its arguments. For each of the class specifiers for the FOR function, we will require a subroutine which enumerates the members of the class. The FOR function itself is also defined by a subroutine as are the logical operators AND, OR and NOT and the basic command functions TEST and PRINTOUT. Thus any well formed expression in the query language is a composition of functions which have procedural definitions in the retrieval component and are therefore themselves well defined procedures capable of execution on the data base. In fact in the LUNAR system, the definition of all of these procedures is done in LISP and the notation of the query language is so chosen that its expressions are executable LISP programs. The totality of these function definitions and the data base on which they operate constitute the retrieval component of the system.
It should be pointed out that by virtue of this definition of the primitive functions and predicates as LISP functions, the query language can be viewed simultaneously as a higher-level programming language and as an extension of the predicate calculus. This gives rise to two different possible types of inference for answering questions, corresponding to the philosopher's distinction between intension and extension. First, because of its definition by means of procedures, a question such as "Does every sample contain silicon?" can be answered extensionally (that is by appeal to the individuals denoted by the class name "samples") by enumerating the individual samples and checking whether sodium has been found in each one. On the other hand, this same question could have been answered intensionally (that is by reference to its meanings alone without reference to the objects denoted) by means of the application of inference rules to (intensional) facts such as the assertion "Every sample contains some amount of each element." Thus the expressions in the query language are capable of either direct execution against the data base (extensional mode) or manipulation by mechanical inference algorithms or theorem provers (intensional mode). Only the former (extensional) mode of inference is actually used in the LUNAR system. This gives rise to some limitations (e.g. it is not possible to prove most assertions about infinite sets in extensional mode), but is very efficient for a variety of question-answering applications.
Semantic Interpretation
Having now specified the notation in which we will represent the meanings of English sentences in our system and having made sure that we understand the nature of the meanings of the expressions in that notation, we are now left with the specification of the process whereby meanings are assigned to sentences. This process is referred to as semantic interpretation, and in LUNAR it is driven by a set of formal semantic interpretation rules. The semantic interpreter operates on a syntactic structure or fragment of one which has been constructed by the parser, assigning semantic expressions in the notation to the nodes of this structure to indicate the "meanings" of those constructions to the system. In LUNAR this procedure is such that the interpretation of nodes can be initiated in any order, but if the interpretation of a node requires the interpretation of a constituent node, then the interpretation of that constituent node is performed before the interpretation of the higher node is completed. Thus, it is possible to perform the entire semantic interpretation by calling for the interpretation of the top node (the sentence as a whole), and this is the normal mode in which the interpreter is operated in the LUNAR system.
Semantic Rules
In determining the meaning of a construction, two types of information are used -- syntactic information about sentence construction and semantic information about constituents. For example, in interpreting the meaning of the sentence, "S10046 contains silicon," it is both the syntactic structure of the sentence (subject = S10046; verb = "contain"; object = silicon) plus the semantic facts that S10046 is a sample and silicon is a chemical element that determine the interpretation (CONTAIN S10046 SILICON). (Note that the predicate CONTAIN here is the name of a procedure in the retrieval component and it is only by the "accident" of the mnemonic design that its name happens to be the same as the English word "contain" in the sentence that we have interpreted.)
In LUNAR, this information about the semantic interpretations of syntactic structures is embodied in semantic rules consisting of patterns that determine whether a rule can apply and actions that specify how the semantic interpretation is to be constructed. An example of such a rule is given in Figure 16.
(S:SAMPLE-CONTAIN
   (S.NP (MEM 1 (SAMPLE)))
   (S.V (OR (EQU 1 HAVE)
            (EQU 1 CONTAIN)))
   (S.OBJ (MEM 1 (ELEMENT OXIDE ISOTOPE)))
   -> (PRED (CONTAIN (# 1 1) (# 3 1))))
Figure 16: A Sample Semantic Interpretation Rule
The name of the rule is S:SAMPLE-CONTAIN, and the left-hand side, or pattern part of the rule, consists of three templates which match fragments of syntactic structure. The first template requires that the sentence being interpreted have a subject noun phrase which is a member of the semantic class SAMPLE, the second requires that the verb be either "have" or "contain", and the third requires a direct object which is either a chemical element, an oxide or an isotope. The terms S.NP, S.V and S.OBJ name schemata for tree fragments which are used not only to test for the presence of their corresponding syntactic structures in the sentence, but also to associate reference numbers with selected nodes in the structure. These numbers are used for reference by the semantic conditions in the templates and for use in the right-hand side of the semantic rule. For example, the tree fragment S.NP locates the subject noun phrase of the sentence and associates the reference number 1 with that noun phrase.
The right-hand side, or action part, of the rule follows the right arrow and specifies that the interpretation of this node is to be a predicate formed by inserting the interpretations of two constituent nodes into the schema (CONTAIN (# 1 1) (# 3 1)), where the expressions (# m n) refer to the interpretation of the node with reference number n for template number m in the match of the left-hand side of the rule.
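The pattern-and-action structure of such a rule can be mimicked in a few lines. The following is a loose sketch under stated assumptions, not the LUNAR rule interpreter: the sentence is reduced to a dictionary with subject, verb and object, and the semantic class table and interpretation table are hypothetical stand-ins for the dictionary and the interpretations of constituent nodes.

```python
# Hypothetical semantic classes for a few words.
SEMANTIC_CLASS = {"S10046": "SAMPLE", "silicon": "ELEMENT", "olivine": "MINERAL"}
# Hypothetical interpretations already assigned to constituent nodes.
INTERPRETATION = {"S10046": "S10046", "silicon": "SILICON"}


def sample_contain(sentence):
    """Mimic S:SAMPLE-CONTAIN on a dict with 'subject', 'verb', 'object' keys.

    Returns the interpretation, or None if any template fails to match.
    """
    subj, verb, obj = sentence["subject"], sentence["verb"], sentence["object"]
    # Template 1 (S.NP): the subject must belong to the class SAMPLE.
    if SEMANTIC_CLASS.get(subj) != "SAMPLE":
        return None
    # Template 2 (S.V): the verb must be "have" or "contain".
    if verb not in ("have", "contain"):
        return None
    # Template 3 (S.OBJ): the object must be an element, oxide or isotope.
    if SEMANTIC_CLASS.get(obj) not in ("ELEMENT", "OXIDE", "ISOTOPE"):
        return None
    # Right-hand side: fill the schema (CONTAIN (# 1 1) (# 3 1)).
    return ("CONTAIN", INTERPRETATION[subj], INTERPRETATION[obj])
```

A sentence whose object falls outside the listed semantic classes simply fails to match, which is how the left-hand side of a rule also marks the conditions under which the construction is meaningful.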
Organization of Rules
The semantic rules for interpreting sentences are usually governed by the verb of the sentence. That is, out of the entire set of semantic rules, only a relatively small number of them can possibly apply to a given sentence because of the verb mentioned in the rule. Similarly the rules which interpret noun phrases are governed by the head noun of the noun phrase. For this reason, the semantic rules in LUNAR are indexed according to the heads of the constructions to which they could apply and recorded in the dictionary entry for the head words. Each rule then characterizes a syntactic/semantic environment in which a word can occur and specifies its interpretation in that environment. The templates of a verb rule thus describe the necessary and sufficient constituents and semantic restrictions in order for the verb to be meaningful. Nouns in noun phrases behave similarly. That is, the semantic rules not only specify the process of interpretation which assigns semantic representations, but their left-hand sides also specify the conditions under which given words and constructions are meaningful.
Semantic Rules in General
The above presentation is oversimplified in a number of respects for the sake of expository brevity. There is in general a greater variety of devices that are used in the semantic rules of the LUNAR system, and there are numerous details of operation that we will not consider here. (Such as the desired behavior when a template or a rule matches in several different ways.) For more details on these issues, the reader is referred to Woods (1967) and Woods, Kaplan, and Nash-Webber (1972). There are also many other interesting issues in the semantics of natural language which have not been explored in the LUNAR system or any other computer system, which are currently more the domain of philosophers than computer scientists, but which will eventually have to be handled by computer systems if they are to be facile at understanding human language. The diversity of these issues, however, is beyond the scope of this presentation.
In many question answering systems semantic interpretation rules are paired more directly with the syntactic rules of the grammar so that there is little or no template matching required (and consequently less latitude for producing semantic interpretations that are not in node-for-node correspondence with the syntactic structure). In still other systems, the semantics are not formalized in rules, but are simply embodied in arbitrary computer programs (and consequently totally unconstrained in what could be done theoretically but providing little or no theory or conceptual framework for what is going on). However, the kind of semantic rules that are used in LUNAR can be used as formal models to explain what is going on in the semantics of these other systems in which the semantics is either more restricted or less formalized.
Semantic Judgments
As in the case of syntax, semantics has both a judgmental and a structural aspect. That is, semantic information is used both to construct semantic representations of the meanings of sentences and to reject anomalous or semantically ill-formed sentences. What we have described so far has mostly dealt with the structural aspect -- how to assign a semantic representation to a sentence and what representation to assign. This capability is necessary for any language understanding [...] text or speech. In the judgmental component, however, there are a number of things which semantics can do which are particularly important for speech understanding. As we pointed out above, the pattern parts of the semantic interpretation rules can be used to specify what assemblages of syntactic structures and lexical words are meaningful.
In the next few sections, what I would like to do is briefly survey the uses of semantic information which have been made in various question answering systems, using the notion of semantic interpretation rules as presented above to unify the discussion. I shall no longer be directly concerned with the use of the rules for the assignment of semantic interpretations to sentences, but with the ancillary use of the information embodied in these rules for other purposes.
Semantic information is used in a number of text-oriented language understanding systems to select semantically meaningful parsings from among all of the possible parsings of a sentence. For example, in the context of airline flight schedules, in interpreting a sentence such as "Does American have a flight from some east coast city to Chicago" we can tell that the phrase "to Chicago" modifies flight and not city because we have semantic interpretation rules for flights to places while we do not have any rules to interpret cities to places. In speech understanding, this ability to determine whether a given interpretation of a sentence is semantically meaningful is critical not only for choosing between alternative parsings, but also for choosing between alternative segmentations of the input signal into words.
In the next few sections I will discuss some of the techniques that have been used in various question answering systems to use semantic information for this judgmental role and discuss their advantages and limitations for speech understanding applications.
Semantic Selectional Restrictions
As we mentioned above, the attempt to characterize the difference between semantically well-formed sentences and those which are semantically anomalous has been a major concern of many linguistic semanticists (see e.g. Katz & Fodor, 1964). The device which is used in most such attempts is a notion of semantic selectional restrictions -- restrictions between the verbs of a sentence and semantic features of the arguments which they can sensibly take. For example, the restriction that verbs like "intend" require higher animate subjects is used to explain the oddness of sentences such as "the rock intends to sit there." This account assumes that the nouns of the language can be assigned to semantic classes such as "higher animate" and that there must be "semantic agreement" or at least no semantic disagreement between the verb of a sentence and the subjects, objects and other arguments which it can take. It is in this area of semantics that the misconceptions about the distinction between syntax and semantics arise, since there is usually no difference in principle between the implementation of such semantic restrictions to reject semantically anomalous sentences and implementation of syntactic restrictions such as number agreement to reject syntactically incorrect sentences. For sufficiently restricted and fixed domains of discourse, it is possible to implement such semantic selectional restrictions by subcategorizing the syntactic categories of the grammar with classes like 'animate noun' and 'color adjective' rather than simply noun or adjective. One thereby incorporates the testing of semantic selectional restrictions into the grammar and avoids the need for any special mechanism for testing semantic selectional restrictions.
The technique of semantically subcategorizing the syntactic categories of the grammar has been applied effectively in limited speech understanding applications. It has the advantage of being efficient in execution and easy to implement for sufficiently simple understanding tasks. However, one should understand its limitations. One of the major inadequacies is that the use of semantic selectional restrictions as prerequisites for grammaticality or semantic well-formedness is not quite correct. Rather most such conditions are required only for a sentence to be true. When the sentence is a question or when it asserts a negative possibility, then semantic selectional restrictions may be violated by perfectly reasonable sentences. A speech understanding system which contains such restrictions embedded in its grammar will fail to parse such inputs. (For example, in Terry Winograd's blocks world program the sentence "Can a table like blocks?" fails to parse since the system applies the selectional restriction that "like" requires an animate subject.) A speech understanding system which used such selectional restrictions as a prerequisite for acceptability of an interpretation of a speech signal would be unable to "hear" this sequence of words no matter how well articulated and how successful the acoustic and phonological analysis, but would rather insist on looking for some other interpretation of the signal.
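A selectional restriction used as a hard filter, and the failure mode just described, can be sketched in a few lines. The feature names and the tiny lexicon are assumptions for illustration; they are not taken from any of the systems discussed.

```python
# Hypothetical semantic features attached to dictionary entries for nouns.
NOUN_FEATURES = {
    "rock": {"physical"},
    "table": {"physical"},
    "student": {"physical", "animate", "higher-animate"},
}

# Hypothetical restriction: the feature each verb demands of its subject.
SUBJECT_RESTRICTION = {
    "intend": "higher-animate",
    "like": "animate",
}


def subject_ok(verb, subject):
    """Hard filter: accept only if the subject carries the required feature."""
    required = SUBJECT_RESTRICTION.get(verb)
    if required is None:
        return True   # no restriction recorded for this verb
    return required in NOUN_FEATURES.get(subject, set())
```

The filter correctly rejects "the rock intends to sit there", but it equally rejects the subject and verb of "Can a table like blocks?", which is a perfectly reasonable question. A recognizer that applied this test as a prerequisite could never "hear" that sentence.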
An additional limitation of the semantic selectional restriction approach is that the necessary semantic information associated with a given argument to a verb is not necessarily associated with the lexical items in the noun phrase, but may be associated with the referent of the noun phrase instead. The association of such information with the dictionary entries for the words is really just an approximation (albeit a useful one for many applications) of what one really wants the semantic selectional restrictions to test.
A major practical difficulty with incorporating the semantic selectional restrictions into the syntactic categories of the grammar is the lack of extendability thus induced. If one wants to apply the system to a different domain of discourse or to extend the domain slightly, he has to redefine the categories of the grammar.
Semantic Screening
A somewhat more versatile technique for using semantic information to select an appropriate parsing is to apply semantic rules to the nodes of the syntactic tree structure as the nodes are built by the parser. If the node just constructed fails to have a semantic interpretation, then that computation path of the parser is rejected and the [text illegible in the source]
have been continued further. This argument, however, neglects to count the cost of the semantic interpretation on uncompleted parsings which would not have been completed in any case for syntactic reasons. Whether semantic screening really provides an increase in efficiency depends on the relative costs of the extra or unnecessary semantic processing and the syntactic processing that is thereby eliminated. In many situations, it is more efficient to complete the syntactic analyses and then apply the semantic testing.
Another technique which is related to semantic screening is to apply tests not only of general semantic well-formedness but also tests of factuality in conjunction with the formation of a constituent. This is the case for example in Winograd's system when he makes his decision about "put the pyramid on the block in the corner" on the basis of whether there is a pyramid on a block in the current state of the world and not just on the basis of general information about whether pyramids can be on blocks. This technique can be very useful in some situations, but its exclusive and uncontrolled use would make it impossible to say things that were not already true or to ask about things that were not true.
Semantic Selection
A major inadequacy of semantic (and of factual) screening and indeed of any application of semantic selectional restrictions as strict prerequisites for well-formedness is its inability to deal with sentences such as "I saw the man in the park with a telescope" in which there are many possible parsings which are all semantically possible, but are not equally plausible. Although it is possible that I was in a park which contained a telescope when I saw the man somewhere else, this is not the most likely interpretation in absence of specific information that would indicate this interpretation. Rather there is a kind of default interpretation that the telescope was probably used to see the man and in absence of reason to believe otherwise the man was probably in the park. What is required in general rather than a mere rejection of semantically ill-formed interpretations is a mechanism to select the most plausible interpretation from among a set of syntactically related alternatives. Although the solution of this problem in general is not at hand, a beginning has been made in a mechanism called selective modifier placement in the LUNAR parser (see Woods, 1973a). This mechanism uses information such as the fact that a telescope is an optical instrument and one can see with an optical instrument to prefer the alternative of "with a telescope" modifying "see", while in absence of semantic preference, the modifier "in the park" modifies the syntactically preceding noun phrase "man". The technique has not been systematically developed, however, and except for the placement of prepositional phrase modifiers, the use of semantic judgments in LUNAR to select among alternative parsings is not well developed.
Semantic Prediction
All of the preceding techniques for making semantic judgments about completed syntactic constructions are of great importance for speech understanding. There are, however, situations in the course of understanding a speech utterance where one does not have a complete construction to work with and would like to make use of semantic information to guide the speech understander to look for words which might have been slightly garbled or to provide initial preferences among the words that are discovered on the basis of acoustic and lexical analyses alone. Given for example that we have found the words "sample" and "contain" in a speech signal, we would like to make use of our semantic information to predict that there should now occur a word which is a chemical element, an oxide or an isotope. This information is contained in our semantic rules (specifically it is in the left-hand sides of the rules). Similarly upon encountering the words "sample" and "contain" among a large number of other words in the initial word lattice, we would like to use the semantic information to notice that these two words are related and perhaps go together in the interpretation of the utterance. Both of these semantic roles make use, not of the logical or interpretative sense of semantics, but of a kind of associational semantics which studies the semantic relationships among words and concepts. There are a number of psychologists and psycholinguists as well as people in artificial intelligence, sociology and other fields who have been trying to model this aspect of semantics with various kinds of network structures. The initial impetus in this area was created by Ross Quillian (1968, 1969), but other researchers in this area of semantics include Abelson, Carbonell, Collins, Rumelhart and Norman, Schank, Simmons, and others (a sampling of most of these authors is given in Schank & Colby, 1973 and others are cited explicitly in this paper).
The work of Fillmore (1968) has also been influential in this area of study, and recently, similar notions have been used at MIT as the basis for programs that analyze visual scenes (Winston, 1970). I will describe here some of the characteristics of semantic networks as Quillian visualized them which have direct application in speech understanding and which have been included in the BBN speech understanding system.
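The "sample"/"contain" prediction described above amounts to reading the left-hand sides of the semantic rules in reverse. A minimal sketch follows, assuming a flattened rule table and word-class lexicon that are invented here for illustration and are not the BBN system's data structures.

```python
# Hypothetical flattened form of rule left-hand sides.
RULES = {
    "S:SAMPLE-CONTAIN": {
        "subject": {"SAMPLE"},
        "verb": {"have", "contain"},
        "object": {"ELEMENT", "OXIDE", "ISOTOPE"},
    },
}

# Hypothetical lexicon: word -> semantic classes.
WORD_CLASSES = {
    "sample": {"SAMPLE"},
    "silicon": {"ELEMENT"},
    "rutile": {"OXIDE"},
}


def predict_object_classes(subject_word, verb_word):
    """Semantic classes worth looking for once a subject and verb are found."""
    predicted = set()
    subject_classes = WORD_CLASSES.get(subject_word, set())
    for rule in RULES.values():
        if verb_word in rule["verb"] and subject_classes & rule["subject"]:
            predicted |= rule["object"]
    return predicted
```

Having found "sample" and "contain", the recognizer would then give preference to word hypotheses whose classes fall in the predicted set.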
Quillian was not interested in the notions of semantics as characterizing truth (indeed he denied (I think erroneously) the psychological relevance of such notions). Rather he viewed the "meaning" of a word as merely a collection of the concepts that were associated with it (without, however, giving any adequate explication of what was meant by a concept). I consider Quillian's original formulation and much of the work that it has stimulated to be inadequate in the respect that it doesn't give any attention to a specification of the semantics of the network notation itself, but that doesn't lessen the validity of many of the points that he and others of this school have raised.
Quillian was concerned with investigating the structure in which humans store information in their brains. Thus, the so called semantic networks are really attempts at finding structures and organizations for storing knowledge. His concern is not with having a notation in which to write down a list of facts, but rather with an overall memory structure in which the interrelationships among those facts, which humans use for retrieval of information and for construction of inferences, are explicitly and efficiently represented. The important thing for Quillian is not so much the structure of a particular concept, but the network of relations to other concepts that are established. In particular, Quillian sought to devise a mechanism and a structure which would account for the types of semantic associations which people make and the way these associations manifest themselves in human language understanding.
To give a flavor of the kind of network that Quillian had in mind, Figure 17 (taken from Quillian (1969)) gives an example of the concept associated with the lexical item "client". Each lexical item or word points to one or more "concepts" or nodes in the semantic net (corresponding to different senses of the word) each of which is merely an assemblage of pointers to other concept nodes in the network. In Figure 17 the identifiers PERSON, EMPLOY, and PROFESSIONAL stand for pointers to other concept nodes in the network. In Quillian's view, the meaning of a concept is the sum total of the collection of concept nodes to which it is connected -- no more and no less. While this gives very little leverage on solving any of the problems of semantic interpretation or characterization of truth conditions, it is a superb mechanism for accomplishing the semantic predictions and noticing the coincidences among semantically related words that are required for speech understanding. In particular, Quillian's notion of semantic intersection can play an important role in speech understanding.
Figure 17: A Fragment of a Quillian Network
Semantic Intersection
Quillian developed the notion of semantic intersection as an attempt to account for the human capability to immediately identify the relationships between diverse things such as between 'plant' and 'alive' or (more subtly) between Madrid and Mexico, and to account for the tendency of people to accept an ambiguous term in a particular sense induced by the appropriateness to the context without noticing the other possible senses (a phenomenon called "foregrounding"). In foregrounding, the appropriate sense is somehow brought forward and made more accessible than the other senses due to the influence of the context. The mechanism which Quillian proposed to account for such phenomena and which he believed was the principal process for accessing information from one's knowledge store was a process which he called semantic intersection. Quillian assumed that in the brain, whenever a concept was brought into consideration in a discourse or whatever, it was somehow stimulated or "activated" and that this activation passed out in waves from the source of the stimulation to the concept nodes to which it was connected. When the activation waves from two different sources met at some node in the memory, a semantic intersection was detected, and a path through the semantic memory was thereby established which represented the semantic relationship between the two source concepts. (E.g., Madrid is in Spain, which is like Mexico in language and culture.) Similarly, such activations have some duration in time, and when an ambiguous word is encountered, the sense that people are likely to take is the sense which has semantic connections with concepts that are currently activated (as detected by the presence of semantic intersections).
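One simple way to picture the waves of activation meeting at a node is as two breadth-first searches expanded in alternation. This is a sketch under that assumption, not Quillian's program: the links below are unlabeled and undirected, and the small network is invented to mirror the Madrid/Mexico and plant/alive examples.

```python
from collections import deque

# Hypothetical associative links between concept nodes.
LINKS = [
    ("Madrid", "Spain"),
    ("Spain", "Spanish language"),
    ("Mexico", "Spanish language"),
    ("Mexico", "North America"),
    ("plant", "living thing"),
    ("alive", "living thing"),
]


def build_graph(links):
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def _trace(parents, node):
    """Follow parent pointers from node back to the source of its wave."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def semantic_intersection(graph, source_a, source_b):
    """Spread activation from both sources; return the connecting path or None."""
    if source_a not in graph or source_b not in graph:
        return None
    if source_a == source_b:
        return [source_a]
    parents = ({source_a: None}, {source_b: None})
    frontiers = (deque([source_a]), deque([source_b]))
    while frontiers[0] or frontiers[1]:
        for side in (0, 1):
            other = 1 - side
            for _ in range(len(frontiers[side])):      # advance one wave
                node = frontiers[side].popleft()
                for neighbor in sorted(graph[node]):
                    if neighbor in parents[side]:
                        continue
                    parents[side][neighbor] = node
                    if neighbor in parents[other]:     # the two waves meet here
                        left = _trace(parents[0], neighbor)[::-1]
                        right = _trace(parents[1], neighbor)[1:]
                        return left + right
                    frontiers[side].append(neighbor)
    return None


GRAPH = build_graph(LINKS)
```

In this toy network the waves from "Madrid" and "Mexico" meet at "Spanish language", and the returned path through "Spain" is the analogue of the relationship stated in the text. Foregrounding could be sketched on top of this by preferring the word sense whose node intersects with currently activated concepts.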
In speech understanding, this foregrounding effect of semantic intersections can be used to influence the words that one hears in an otherwise ambiguous segment of speech, and can be used to detect the coincidences of semantically related words in a word lattice. Following connections through the semantic network can also be used to predict words that have not been detected in the signal but which are sufficiently likely that they should be looked for. For details on the use of such techniques in the understanding of continuous speech, the reader is again referred to Nash-Webber (1974, 1975). Notice that the information that we have in the pattern parts of the semantic interpretation rules of LUNAR is one type of information that we would like to have in such a semantic network. Notice also that whereas in LUNAR the information about associated semantic classes is available conveniently if one starts with the head of a construction, similar information in a semantic network format would be equally accessible from any of the concepts involved in the rule. This is one more instance of the importance of breaking a priori orderings of processing in speech understanding in favor of multiple, redundant ways of achieving the same result. In any given utterance, it could be one of the critical head words that is garbled, and one would like to be able nevertheless to find the semantic relationships among the arguments and use them to predict the missing head.
Other Aspects of Semantic Nets and Knowledge Representation
Another notion embedded in Quillian's conception of a semantic network (which also has rudimentary beginnings in Raphael's SIR system (Raphael, 1964)) is that information about a concept can be stored at several different levels up a chain of more and more inclusive concepts (Quillian called them superconcepts). For example, a canary is a bird which is a type of animal which is in turn a physical object. It may thus have certain properties which are stored directly at the level of canary (such as being yellow) but other properties that are common to a great many concepts and which are stored at the most general level of applicability. These would be automatically inherited by subconcepts (in absence of contrary information) without having to be stored over and over again for each of the entities for which they are true.
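Inheritance along a superconcept chain, with contrary information stored lower down taking precedence, can be sketched as a lookup that climbs the chain. The property names and the "penguin" entry are assumptions added for illustration; only the canary, bird, animal, physical object chain comes from the text.

```python
# Each concept points to its superconcept (hypothetical single-parent chain).
SUPERCONCEPT = {
    "canary": "bird",
    "penguin": "bird",            # illustrative addition, not from the text
    "bird": "animal",
    "animal": "physical object",
}

# Properties are stored once, at the most general level where they apply.
PROPERTIES = {
    "canary": {"color": "yellow"},
    "penguin": {"can_fly": False},   # contrary information stored locally
    "bird": {"can_fly": True},
    "animal": {"breathes": True},
    "physical object": {"has_mass": True},
}


def lookup(concept, prop):
    """Climb the superconcept chain until the property is found."""
    while concept is not None:
        local = PROPERTIES.get(concept, {})
        if prop in local:
            return local[prop]
        concept = SUPERCONCEPT.get(concept)
    return None   # nothing known at any level
```

Because the search stops at the first level that mentions the property, the locally stored value for the penguin overrides the default inherited from "bird".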
There is a tremendous amount of interest right now in various semantic network representations, what such structures should look like, how they should be used to do inferences, what kinds of things should be put into a network in response to understanding a sentence, etc. In particular, it is pointed out, notably by Schank and his students, that a great deal of what is understood in response to an input sentence comes from gratuitous assumptions that are made on the basis of knowledge already in memory and not specifically transmitted by the sentence. For example, Schank cites dialog pairs such as "Would you like an ice cream cone?" and "I just ate", in which the second utterance should be interpreted as giving a negative answer to the question. I think it should be apparent that when one attempts to understand spoken discourses and make judgments about the contextual appropriateness of a given interpretation of an utterance, the ability to make such semantic inferences using large amounts of semantic and factual knowledge (as well as pragmatic knowledge about what the speaker is likely to say in a given situation) will be of paramount importance. The inability to account for a given interpretation of an utterance by being able to relate it to what has been said before or to some aspect of the current context should raise the possibility that the utterance has been misheard. The ability to fully use this level of sophisticated inference as part of a speech understanding system, however, will probably have to await further developments in the ongoing studies in knowledge representation and mechanical inference. The techniques which exist today in these areas are either extremely limited or inordinately cumbersome.
CONCLUSION
I have attempted here to provide a perspective on some of the work that has been done in the areas of syntax and semantics for understanding natural language by machines and to call special attention to those techniques which have particular relevance to the problems of speech understanding.
I have tried to cover a range of different parsing algorithms and grammar models with emphasis on the advantages and disadvantages of the various features of these models for the particular types of problems that one will encounter in analyzing continuous speech, and I have tried to give my opinions as to the value of these different features. In particular, I have argued that the use of a predetermined order of finding things (such as left to right across the sentence) is potentially dangerous and perhaps to be avoided. I have pointed out that the ambiguity of syntactic word class (which is one of the major sources of ambiguity in English text) is greatly magnified in speech understanding by the inability to uniquely determine what the word at a given position is, whereas in text parsing, one at least knows what the word is and therefore has an expectation of two or three possible syntactic categories for it. In speech understanding we may have a half dozen alternative possible words at a given point, each with one or more possible syntactic categories. Hence the combinatorial problems that arise from the multiplication of possible alternative analyses is much worse for speech. This is complicated by the fact that most of the techniques that have been developed in text parsers for minimizing the impact of these combinatorial possibilities require carefully designed sequences of looking for things which conflict with the above observation that such constrained orderings are sensitive to the errors in the lexical analysis of the input that are virtually inevitable in speech.
The use of word lattices as input instead of sequences and the design of parsing algorithms around well-formed substring tables or charts appear to be viable methods for dealing with the combinatorial problem of speech understanding. The merging of common parts of different analyses permitted by transition network grammars is also helpful in this respect. In order to be able to correct errors, it will be essential to be able to come at a given parsing from several directions. Consequently checks will be necessary at appropriate points to avoid duplicating an analysis that has already been found.
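A well-formed substring table over a word lattice can be sketched as a chart keyed by pairs of lattice positions. The sketch assumes a toy binary grammar, a toy lexicon, and lattice positions numbered so that every edge runs from a lower to a higher number; none of these come from the systems discussed, and a transition network parser would be organized differently.

```python
from collections import defaultdict

# Hypothetical lexicon and binary grammar rules.
LEXICON = {
    "the": {"DET"},
    "sample": {"N"},
    "simple": {"ADJ"},
    "contains": {"V"},
    "silicon": {"NP"},
}
RULES = {("DET", "N"): "NP", ("V", "NP"): "VP", ("NP", "VP"): "S"}

# Word lattice: (start position, end position, word hypothesis).
# Positions 1-2 carry two competing hypotheses from lexical analysis.
LATTICE = [
    (0, 1, "the"),
    (1, 2, "sample"),
    (1, 2, "simple"),
    (2, 3, "contains"),
    (3, 4, "silicon"),
]


def parse_lattice(edges, lexicon, rules, n_positions):
    """Fill a chart: (start, end) -> set of categories spanning that stretch."""
    chart = defaultdict(set)
    for start, end, word in edges:
        chart[(start, end)] |= lexicon.get(word, set())
    # Build wider constituents from narrower ones; each one is recorded once.
    for width in range(2, n_positions):
        for start in range(0, n_positions - width):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart.get((start, mid), ()):
                    for right in chart.get((mid, end), ()):
                        parent = rules.get((left, right))
                        if parent:
                            chart[(start, end)].add(parent)
    return chart


CHART = parse_lattice(LATTICE, LEXICON, RULES, n_positions=5)
```

Because results are stored by span rather than by the path that led to them, the verb phrase over positions 2 to 4 is built once and shared by every analysis that needs it, whichever of the competing word hypotheses precedes it.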
Another important role of syntax in a speech understanding system is the prediction of those places where small function words might occur in order to compensate for the unreliability of their identification by lexical analysis.
Although our understanding of semantics is not as well advanced as that of syntax (which itself is far from complete), there are a number of semantic techniques that language understanding programs have used which can have great benefit in the construction of speech understanding machines. These include the use of procedural semantics for the specification of the operations which are to be carried out in response to the understanding of the sentence, the use of semantic selectional restrictions to rule out unlikely interpretations of the speech signal, and the use of semantic associations as embodied in the Quillian semantic intersection technique to notice coincidences between semantically related words at different points in the input. One should be aware, however, of the limitations of some of these techniques and the need for continued research in the areas both of syntax and semantics in order to increase the range of things which such systems can understand and their abilities to choose correctly between alternative interpretations of a signal.
I think it is clear that in order to cover the scope of material that I have, it has been necessary to treat many issues rather shallowly and others not at all. Hopefully the references will provide additional detail for the interested reader to follow up. However, I hope that I have given you some feeling for the issues and some of the things that are going on in computational linguistics, linguistics, psychology, and artificial intelligence relative to syntax and semantics and some of the ramifications of the speech understanding task for these areas. Given the different perspective that the speech understanding task places on the roles of syntax and semantics in language understanding, I believe that the speech understanding problem can have almost as great an impact on research in syntax and semantics as these areas are now having on the problem of automatic speech recognition.
References
[2] Bobrow, D.G. and Fraser, J.B. (1969) "An Augmented State Transition Network Analysis Procedure," Proceedings International Joint Conference on Artificial Intelligence, Washington D.C., pp. 557-567.
[3] Bruce, B. (1973) "Case Structure Systems," Proceedings of Third International Joint Conference on Artificial Intelligence, Stanford University, Stanford, California, pp. 364-371.
[4] Carbonell, J.R. and Collins, A.M. (1973) "Natural Semantics in Artificial Intelligence," Proceedings of Third International Joint Conference on Artificial Intelligence, Stanford University, Stanford, California, pp. 344-351.
[5] Chomsky, N. (1965) Aspects of the Theory of Syntax, MIT Press, Cambridge, Mass.
[6] Collins, A.M. and Quillian, M.R. (1969) "Retrieval Time from Semantic Memory," Journal of Verbal Learning and Verbal Behavior, 8 (2), pp. 240-247.
[7] Collins, A.M. and Warnock, E.H. (1974) Semantic Networks, Report 2833, Bolt Beranek and Newman Inc., Cambridge, Mass.
[8] Denes, P.B. and Pinson, E.N. (1963) The Speech Chain, Bell Telephone Laboratories, Inc.
[9] Earley, J. (1970) "An Efficient Context-Free Parsing Algorithm," CACM 13 (2), pp. 94-102.
[10] Fillmore, C.J. (1968) "The Case for Case," in Bach, E. and Harms, R. (eds.) Universals in Linguistic Theory, Holt, Rinehart and Winston, New York.
[11] Floyd, R.W. (1967) "Nondeterministic Algorithms," JACM 14 (4), pp. 636-644.
[12] Green, C.C. and Raphael, b. (I960) • Tue Use of Theorem-Proving Techniques in Question-Answering Systems," Proc . 12.68 ACM National Conference . pp, 169-Id 1.
L13J Greibach , S.A. ( 1967 ) "A Simple Proof of the Standard-form Theorem for Context-tree Grammars," in Mathematical Lin£uisti£S
and Automatic Translation, Report NSF-18, Harvard University Computation Laboratory, Cambridge, Mass.
Ll^j Griffiths, T. ana Petrick, S,h. (1965) "ün the Relative Lfficiencies of Context-Free Grammar Recognizers," CACii 5 (Ö), pp. 289-300.
L 1:J J Hays , D.G. ( 1962 ) "Automatic Language-Data Processing," in Harold borno (ed.). Computer Ä£2li.cations in the öehaviou,ral Sciences, Prentice hail, Englewooa Cliffs, New Jersey.
[16] Heidorn, G.E. (1972) Natural Language Inputs to a Simulation Programming System, Ph.D. Thesis, Yale University, New Haven, Conn.
[17] Jakobson, R., Fant, C.G., and Halle, M. (1967) Preliminaries to Speech Analysis, MIT Press, Cambridge, Mass.
[18] Katz, J.J. and Fodor, J.A. (1964) "The Structure of a Semantic Theory," in Katz, J.J. and Fodor, J.A. (eds.) The Structure of Language: Readings in the Philosophy of Language, Prentice-Hall, Englewood Cliffs, New Jersey, pp. 479-518.
[19] Kay, M. (1967) "Experiments with a Powerful Parser," Memorandum RM-5452-PR, The RAND Corporation, Santa Monica, California.
[20] Kuno, S. and Oettinger, A.G. (1963) "Multiple-Path Syntactic Analyzer," Information Processing 62, North-Holland, Amsterdam, pp. 306-312.
[21] Nash-Webber, B. (1974) "Semantic Support for a Speech Understanding System," Proceedings of IEEE Symposium on Speech Recognition, Carnegie-Mellon University, Pittsburgh, Penn., pp. 244-249.
[22] Nash-Webber, B. (1975) "The Role of Semantics in Automatic Speech Understanding," in Representation and Understanding, Bobrow, D.G. and Collins, A. (eds.), Academic Press (in press).
[23] Newell, A. et al. (1973) Speech Understanding Systems: Final Report of a Study Group, North-Holland/American Elsevier, Amsterdam.
[24] Norman, D.A. and Rumelhart, D.E. (1973) "Active Semantic Networks as a Model of Human Memory," Proc. Third International Joint Conference on Artificial Intelligence, Stanford University, Stanford, California, pp. 450-463.
[25] Petrick, S.R. (1965) A Recognition Procedure for Transformational Grammars, Ph.D. Thesis, Department of Modern Languages, M.I.T., Cambridge, Mass.
[26] Quillian, M.R. (1968) "Semantic Memory," in Minsky, M.L. (ed.) Semantic Information Processing, MIT Press, Cambridge, Mass.
[27] Quillian, M.R. (1969) "The Teachable Language Comprehender: A Simulation Program and Theory of Language," CACM 12 (8), pp. 459-476.
[28] Raphael, B. (1964) "A Computer Program which 'Understands'," AFIPS Conference Proceedings, Vol. 26 (1964 FJCC), pp. 577-589.
[29] Sandewall, E. (1971) "A Programming Tool for Management of a Predicate-Calculus-Oriented Data Base," Proc. Second International Joint Conference on Artificial Intelligence, The British Computer Society, London, pp. 159-166.
[30] Schank, R.C. and Colby, K.M. (1973) Computer Models of Thought and Language, W.H. Freeman & Co., San Francisco, California.
[31] Thorne, J., Bratley, P. and Dewar, H. (1968) "The Syntactic Analysis of English by Machine," in D. Michie (ed.) Machine Intelligence 3, American Elsevier, New York, N.Y.
[32] Winograd, T. (1972) Understanding Natural Language, Academic Press, New York, N.Y.
[33] Winston, P.H. (1970) "Learning Structural Descriptions from Examples," MAC TR-76, MIT Project MAC, Cambridge, Mass.
[34] Woods, W.A. (1968) "Procedural Semantics for a Question-Answering Machine," AFIPS Conference Proceedings, Vol. 33 (1968 FJCC), pp. 457-471.
[35] Woods, W.A. (1967) Semantics for a Question-Answering System, Report NSF-19, The Computation Laboratory, Harvard University, Cambridge, Mass. (NTIS number PB-176-548).
[36] Woods, W.A. (1969) "Augmented Transition Networks for Natural Language Analysis," Report CS-1, Computation Laboratory, Harvard University, Cambridge, Mass.
[37] Woods, W.A. (1970) "Transition Network Grammars for Natural Language Analysis," CACM 13 (10), pp. 591-606.
[38] Woods, W.A. (1973a) "An Experimental Parsing System for Transition Network Grammars," in R. Rustin (ed.), Natural Language Processing, Algorithmics Press, New York.
[39] Woods, W.A. (1973b) "Progress in Natural Language Understanding: An Application to Lunar Geology," AFIPS Conference Proceedings, Vol. 42 (1973 National Computer Conference), pp. 441-450.
[40] Woods, W.A. (1974) "Motivation and Overview of BBN SPEECHLIS: An Experimental Prototype for Speech Understanding Research," Proc. IEEE Symposium on Speech Recognition, Carnegie-Mellon University, Pittsburgh, Penn., pp. 1-10.
[41] Woods, W.A., Kaplan, R.M. and Nash-Webber, B. (1972) "The Lunar Sciences Natural Language Information System: Final Report," BBN Report No. 2378, Bolt Beranek and Newman Inc., Cambridge, Mass. (NTIS number N72-28984).
[42] Younger, D.H. (1966) "Context-Free Language Processing in Time n^3," Proceedings 1966 Annual Symposium on Switching and Automata Theory, IEEE Conference Record 16 C 40, 1966, pp. 7-20.