LING 438/538 Computational Linguistics
Sandiway Fong
Lecture 26: 11/30
Administrivia
• 538 Presentation Assignments

First name  Last name   Chapter  Date
Ahmed       Abbasi      19       5th
Anastasia   Gorbunova   20       5th
Andrew      Lebovitz    13       30th
Andrew      Glines      17       5th
Bojan       Durickovic  14       30th
Emad        Nawfal      15       30th
Guoqiang    Shan        12       30th
Jin         Wang        11       30th
Jiyoung     Kim         11       30th
Jon         Peoble      13       5th
Kara        Johnson     4        5th
Lindsay     Butler      16       5th
Mans        Hulden      res      30th
Nadia       Hamrouni    19       5th
Shannon     Bischoff    res      30th
Tianjun     Fu          18       30th
Administrivia
• 538 Presentation slides due tomorrow midnight in my mailbox
Administrivia
• Homework 6 due tonight
• Question 1
– No, it’s not a trick question
– Yes, there is a difference
• If you didn’t find one, look a little harder...
Last Time
• Finished looking at the LR(0) and LR(1)-based bottom-up parsing methods
• Crucial ideas:
– construction of a finite state machine
– machine states are sets of dotted rules
– transition between states based on the progression of the “dot”
– actions
  • shift: move the “dot” past a terminal symbol
  • reduce: “dot” at the end of a rule
– machine uses a stack to keep track of the states
– grammar rules are no longer used directly during the parsing stage
– such a machine can be built automatically (Homework 6 code)
Chart Parsing
• another use of the dotted rule notation
• a chart is a graph data structure used to store partial and complete parses
• Kay (1980)
• chart can be built using different strategies (e.g. top-down, bottom-up)
• can be guided by, or combined with, Left-Corner Parsing and online or offline dotted rule manipulation:
– e.g. Earley algorithm (online)
– (offline) LR machine construction techniques
• also: partial parses can be recovered
• observation: parses often share common constituents
– cf. dynamic programming
• example
– I saw a man ...
– only have to parse [PP with a very nice-looking telescope that I also happened to have bought last Friday] once
– NP-attachment: I [VP saw [NP [NP a man][PP ... ]]]
– VP-attachment: I [VP [VP saw [NP a man]][PP ... ]]
• standard Prolog backtracking will undo the whole PP
Chart Parsing
• “Chart”
– graph data structure for storing partial and complete parses
– graph = (vertices, edges)
• vertex
– used to mark positions in the input
• edge (active, inactive)
– active edge: denotes an incompletely “parsed” rule
– inactive edge: denotes a completely “parsed” rule
• dotted rule notation (again)
– “dot” (.) indicates the progress of the parse through a phrase structure rule
• examples
– (active) vp --> v . np means we’ve seen v and predict np
– (active) np --> . d np means we’re predicting a d ... to be followed by a np
– (inactive) vp --> vp pp . means we’ve completed a vp
Chart Parsing
• Example: (multiple parses stored in the chart)
– cf. homework 6
1 I 2 saw 3 a 4 man 5 with 6 a 7 telescope 8
  n   v     d   n     p      d   n
[Figure: chart over the input with np and vp edges; annotation: “crossing!”]
Chart Parsing: Algorithm
• example: top-down
• Step 1: Apply lexical rules
• Prolog representation:
– edge(V1, V2, DottedRule).
• DottedRule = LHS --> Seen . Predict
– edge(V1, V2, LHS, Seen, Predict).
1 I 2 saw 3 a 4 man 5 with 6 a 7 telescope 8
  n   v     d   n     p      d   n
Chart Parsing: Algorithm
• example
  • edge(1,2,n,[],[]).
  • edge(2,3,v,[],[]).
  • edge(3,4,d,[],[]).
  • edge(4,5,n,[],[]).
  • edge(5,6,p,[],[]).
  • edge(6,7,d,[],[]).
  • edge(7,8,n,[],[]).
– inactive edges:
  • edge(_,_,_,_,[]).
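A minimal Prolog sketch of the lexical step, assuming a small hypothetical lexicon stored as word/2 facts. The predicate names word/2, lexical_edges/3 and inactive/1 are illustrative and are not taken from the course code. It walks the input and emits one inactive edge per word, in the edge/5 format used on these slides:

```prolog
% Hypothetical lexicon for the running example (word/2 is an assumed name).
word(i, n).
word(saw, v).
word(a, d).
word(man, n).
word(with, p).
word(telescope, n).

% lexical_edges(+Words, +V, -Edges)
% One inactive edge per input word; vertices are numbered upwards from V.
lexical_edges([], _, []).
lexical_edges([W|Ws], V1, [edge(V1,V2,Cat,[],[])|Es]) :-
    word(W, Cat),
    V2 is V1 + 1,
    lexical_edges(Ws, V2, Es).

% inactive(+Edge): an edge is inactive when nothing is left to predict.
inactive(edge(_,_,_,_,[])).
```

Under these assumptions, the query `?- lexical_edges([i,saw,a,man,with,a,telescope], 1, Es).` should yield the seven edges listed on the slide, from edge(1,2,n,[],[]) to edge(7,8,n,[],[]).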
Chart Parsing: Algorithm
• Step 2: Predict (top-down):
– add active edges beginning with start symbol, e.g. S
– example
  • s --> . np vp
  • np --> . d n
  • np --> . n
  • edge(1,1,s,[],[np,vp]).
  • edge(1,1,np,[],[d,n]).
  • edge(1,1,np,[],[n]).
– active edges:
  • edge(_,_,_,_,L).
  • L ≠ []
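Top-down prediction can be sketched in Prolog as follows, assuming the grammar is stored as hypothetical rule/2 facts. The names rule/2, predict/3 and predict_all/4 are illustrative, and the vp rule is an assumption added only to make the toy grammar complete. Starting from the start symbol, the sketch adds an empty active edge for every rule of a predicted category, then predicts the first symbol each of those rules expects:

```prolog
:- use_module(library(lists)).

% Assumed toy grammar: rule(LHS, RHS).
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(np, [n]).
rule(vp, [v, np]).

% predict(+Cat, +V, -Edges)
% Active edges at vertex V for Cat and, transitively, for the first
% symbol of each predicted rule.
predict(Cat, V, Edges) :-
    predict_all([Cat], V, [], Edges).

% predict_all(+Agenda, +V, +Done, -Edges)
% Done records categories already expanded, so left-recursive rules
% do not cause looping.
predict_all([], _, _, []).
predict_all([C|Cs], V, Done, Edges) :-
    (   memberchk(C, Done)
    ->  predict_all(Cs, V, Done, Edges)
    ;   findall(edge(V,V,C,[],RHS), rule(C, RHS), New),
        findall(First, rule(C, [First|_]), Firsts),
        append(Cs, Firsts, Agenda),
        predict_all(Agenda, V, [C|Done], Rest),
        append(New, Rest, Edges)
    ).
```

With this grammar, `?- predict(s, 1, Es).` should return the three edges shown on the slide: edge(1,1,s,[],[np,vp]), edge(1,1,np,[],[d,n]) and edge(1,1,np,[],[n]). Lexical categories such as d and n have no rules here, so they contribute no edges.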
Chart Parsing: Algorithm
• Step 3: Fundamental Rule
– “advance the dot”
– inference rule
– if
  • LHS --> Seen . X StillToSee (active)
  • X --> RHS . (inactive)
– then
  • LHS --> Seen X . StillToSee
– note:
  • assuming, of course, the edges line up (the active edge ends at the vertex where the inactive edge starts)
Chart Parsing: Algorithm
• Step 3: Fundamental Rule in Prolog
– example
  • edge(1,1,np,[],[d,n]).
  • edge(1,2,d,[],[]).
  • edge(1,2,np,[d],[n]). (active)
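One way the fundamental rule might be written over edge/5 terms is sketched below. The predicate name fundamental/3 is an assumption for illustration, not the course's code. The shared variable V2 enforces that the two edges line up, and the completed category X is moved from the front of the prediction list to the end of the Seen list:

```prolog
:- use_module(library(lists)).

% fundamental(+Active, +Inactive, -New)
% Advance the dot of an active edge over the category X completed by an
% inactive edge that starts where the active edge ends (vertex V2).
fundamental(edge(V1,V2,LHS,Seen,[X|StillToSee]),
            edge(V2,V3,X,_,[]),
            edge(V1,V3,LHS,Seen1,StillToSee)) :-
    append(Seen, [X], Seen1).
```

For the slide's example, `?- fundamental(edge(1,1,np,[],[d,n]), edge(1,2,d,[],[]), E).` gives `E = edge(1,2,np,[d],[n])`, which is still active because n remains to be seen.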
Machine Translation
• Background Reading: chapter 21
• Rich Topic:
– not covered in much depth there
Machine Translation
21.1 Language Differences and Similarities
• Word Order
– English: SVO
– Japanese: SOV (head-final)
  • e.g. postpositions vs. prepositions
• Morphology
– agreement
  • Spanish is a pro-drop language (omit subject pronouns)
  • but missing subject pronoun is recoverable from verb morphology
– causative
  • a bound morpheme (suffix) in Japanese
  • a word in English
  • cf. kill = cause to die
• conceptual/semantic differences
– also lexical gaps
Machine Translation
• Tree-to-tree transfer
[Diagram labels: source sentence, source, target, target sentence]
• Interlingua
[Diagram labels: source sentence, source, interlingua, target, target sentence]
Machine Translation: Beginnings
• c. 1950 (just after WWII)
– electronic computers invented for
  • numerical analysis
  • code breaking
• Book (Collection of Papers)
– Readings in Machine Translation, Eds. Nirenburg, S. et al. MIT Press 2003. (Part 1: Historical Perspective)
– Weaver, Reifler, Yngve, and Bar-Hillel …
• Killer Apps: Language comprehension tasks and Machine Translation (MT)
Machine Translation: Beginnings
• Success with computational methods and code-breaking
• [Translation. Weaver, W.]
– citing Shannon’s work, Weaver asks:
– “If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?”
Machine Translation: Beginnings
• Statistical Basis
– Popular in the early days and has undergone a modern revival
• The Present Status of Automatic Translation of Languages (Bar-Hillel, 1960)
– “I believe this overestimation is a remnant of the time, seven or eight years ago, when many people thought that the statistical theory of communication would solve many, if not all, of the problems of communication”
• Bar-Hillel’s criticisms include
– much valuable time spent on gathering statistics
– no longer a bottleneck today
Machine Translation: Beginnings
• Statistical Basis
– Popular in the early days and has undergone a modern revival
• Statistical Methods and Linguistics (Abney, 1996)
– Chomsky vs. Shannon
  • Statistics and low (zero) frequency items
  • Colorless green ideas sleep furiously vs. furiously sleep ideas green colorless
  • Modern answer: smoothing
  • No relation between order of approximation and grammaticality
    – n-th order approximation reflecting degree of grammaticality as n increases
  • Parameter estimation problem is intractable (for humans)
    – statistical models involve learning or estimating a very large number of parameters
    – “we cannot seriously propose that a child learns the values of 10^9 parameters in a childhood lasting only 10^8 seconds”
    – see IBM translation reference later (17 million parameters)
Machine Translation: Beginnings
• (Bar-Hillel, 1960)
• Reifler (University of Washington)
– Unbelievably optimistic claims
– Compounding:
  • “found moreover that only three matching procedures and four matching steps are necessary to deal effectively with any of these ten types of compounds of any language in which they occur”
  • [i.e. we have heuristics that we think work]
  • “it will not be very long before the remaining linguistic problems in machine translation will be solved for a number of important languages”
Machine Translation: Beginnings
• [Wiener]
– “Basic English is the reverse of mechanical and throws upon such words as get a burden which is much greater than most words carry”
• [Weaver]
– Multiple meanings of get, yes
– but a limited number of two-word combinations: get up, get over, get back
– 2000 words => 4 million two-word combinations (2000 × 2000)
– not formidable to a “modern” (1947) computer
• get is very polysemous: WordNet (Miller, 1981) lists 36 senses
Statistical Machine Translation
• Re-emergence of the Statistical Basis
• Conditions are different now
– Computers 10^5 times faster
– There has been a data revolution
  • Gigabytes of storage really cheap
  • Large, machine-readable corpora readily available for parameter estimation
Statistical Machine Translation
• Avoid the explicit construction of linguistically sophisticated models of grammar
– Not the only way: e.g. Example-based MT (EBMT)
• Pioneered by IBM researchers (Brown et al., 1990)
– Language Model
  • Pr(S) estimated by n-grams
– Translation Model
  • Pr(T|S) estimated through alignment models
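The two models are combined through Bayes' rule, in what is usually called the noisy-channel formulation. A sketch of the reasoning: given an observed sentence T, pick the sentence S that most probably gave rise to it. The denominator Pr(T) does not depend on S and can be dropped. The second line is the standard n-gram approximation of the language model; the symbols s_1 ... s_m for the words of S and the order n are notation assumed here, not taken from the slides.

```latex
\begin{align*}
\hat{S} &= \arg\max_{S} \Pr(S \mid T)
         = \arg\max_{S} \frac{\Pr(S)\,\Pr(T \mid S)}{\Pr(T)}
         = \arg\max_{S} \Pr(S)\,\Pr(T \mid S) \\
\Pr(S) &\approx \prod_{i=1}^{m} \Pr(s_i \mid s_{i-n+1} \dots s_{i-1})
\end{align*}
```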
Statistical Machine Translation
• Parameter estimation by crunching large-scale corpora
• Hansard French/English parallel corpus
– The Hansard Corpus consists of parallel texts in English and Canadian French, drawn from official records of the proceedings of the Canadian Parliament. While the content is therefore limited to legislative discourse, it spans a broad assortment of topics and the stylistic range includes spontaneous discussion and written correspondence along with legislative propositions and prepared speeches.
• (IBM’s experiment: 100 million words, est. 17 million parameters)
The State of the Art
• www.languageweaver.com
• Statistical MT System [Spinoff from USC/ISI work]
– “Language Weaver’s SMTS system is a significant advancement in the state of the art for machine translation… and [we] are confident that Language Weaver has produced the most commercially viable Arabic translation system available today.”
• Metrics: performance determined by competition
– common test and training data
[Timeline figure: 1960s Russian; 1970s W. European languages; 1980s Japanese; present day Arabic]
Real Progress or Not?
• (2003) MT Summit IX.
– Proceedings available online
  • http://www.amtaweb.org/summit/MTSummit/
• Interesting paper by J. Hutchins: Has machine translation improved? Some historical comparisons.
– “… overall there has been definite progress since the mid 1960s and very probably since the early 1970s. What is more uncertain is whether and where there have been improvements since the early 1980s.”
– Compared modern day systems against systems from the 1960s, 1970s (e.g. SYSTRAN) and 1980s
  • Difficult: first systems are lost to us
  • Languages
    – Russian to English
    – French to English
    – German to English
Real Progress or Not?
• http://babelfish.altavista.com/
Real Progress or Not?
[Hutchins, pp.7-8]
• “The impediments to the improvement of translation quality are the same now that they have been from the beginning:
– failures of disambiguation
– incorrect selection of target language words
– problems with anaphora
  • pronouns (it vs. she/he)
  • definite articles (e.g. when translating from Russian and French)
– inappropriate retention of source language structures
  • e.g. verb-initial constructions (from Russian)
  • verb-final placements (from German)
  • non-English pre-nominal participle constructions (e.g. with interest to be read materials from both Russian and German)
– problems of coordination
– numerous and varied difficulties with prepositions
– in general always problems with any multi-clause sentence”
• Roughly echoes what Bar-Hillel said about 50 years earlier
Statistical vs. Traditional
• Which ones are commercially deployed?
– internet translators: traditional
– new languages: statistical
Translating is EU's new boom industry
• 2004 article
• market is there: opportunities for machine translation?