Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for many of the materials used in this lecture.
CS 479, section 1: Natural Language Processing
Lecture #33: Intro. to Machine Translation
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Announcements
Reading Report #13: M&S ch. 13 on alignment and MT
  Due now; discussing at end of lecture today or on the group
Homework 0.3 Feedback
  Question one did not contribute to your grade
  Compare with the key
Homework 0.4
  Posted Tuesday
Final Project
  Project #4
    Note the updates to the tutorial with the flowchart slides from lecture #29
  Project #5
    Instructions to be updated today
    Help session: Tuesday
  Propose-your-own
    Move forward
    Feedback to be sent today
  Project Report
    Early: Wednesday after Thanksgiving
    Due: Friday after Thanksgiving
  Check the schedule
  Plan enough time to succeed!
Quiz – keep the ideas fresh
1. What are the four steps of the Expectation Maximization (EM) algorithm? Think of the document clustering example, if that helps.
2. What is the primary purpose of EM?
Objectives
Introduce the problem of machine translation
Appreciate the need for alignment in statistical approaches to translation
Machine Translation is Hard
REF: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959 billion US dollars of foreign capital, including 40.007 billion US dollars of direct investment from foreign businessmen.
IBM4: the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007 billion US dollars today provide data include that year to November china actually using foreign 46.959 billion US dollars and
Yamada & Knight: today’s available data of the Ministry of Foreign Trade and Economic Cooperation shows that china’s actual utilization of November this year will include 40.007 billion US dollars for the foreign direct investment among 46.959 billion US dollars in foreign capital
But MT is Real
http://www.microsofttranslator.com/
http://translate.google.com/
Why so hard?
What makes translation so hard?
History
1950’s: Intensive research activity in MT
  Roll video …
1960’s: Direct word-for-word replacement
1966 (ALPAC): NRC Report on MT
  Conclusion: MT no longer worthy of serious scientific investigation.
1966-1975: “Recovery period”
1975-1985: Resurgence (Europe, Japan)
1985-present: Gradual Resurgence (US)
How?
How would you implement automatic translation on a computer?
Big Idea: Word Alignment
Start with parallel corpora
Learn word alignment
  Hidden variable: alignment from foreign (target) word to source word.
  Use EM!
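To make the idea concrete, here is a minimal sketch of EM for word alignment in the style of IBM Model 1. It is an illustration under stated assumptions, not the method developed later in the course: the function names `train_ibm_model1` and `best_alignment`, the three-sentence toy corpus, and the omission of a NULL source word are all choices made for this example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e), the probability that source word e generates
    foreign word f, from (foreign_tokens, source_tokens) sentence pairs.
    Simplification (assumption): no NULL source word is modeled."""
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        # E-step: the alignment of each foreign word is hidden, so spread
        # one unit of count over the source words in the same sentence,
        # in proportion to the current t(f | e).
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / norm
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: re-estimate t(f | e) as a normalized expected count.
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


def best_alignment(fs, es, t):
    """For each foreign word, return the most probable source word."""
    return [max(es, key=lambda e: t[(f, e)]) for f in fs]


# Toy parallel corpus (hypothetical data, for illustration only).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(corpus)
print(best_alignment(["das", "Haus"], ["the", "house"], t))
```

Nothing in the corpus says which word aligns to which; the co-occurrence pattern across sentences is enough for EM to shift probability mass toward consistent pairings.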
Vauquois Triangle
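[Figure: Vauquois triangle, with Source Text and Target Text at the two base corners and Interlingua at the apex]
Levels of representation (bottom to top): Word Structure, Syntactic Structure, Semantic Structure, Interlingua
Source side (analysis, going up): Morphological Analysis, Syntactic Analysis, Semantic Analysis, Semantic Composition
Target side (generation, going down): Semantic Decomposition, Semantic Generation, Syntactic Generation, Morphological Generation
Source-to-target paths (bottom to top): Direct, Syntactic Transfer, Semantic Transfer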
Methods
Rule-based Methods
  Expert system-like rewrite systems
  Lexicons constructed by people
  Can be very fast, and can accumulate a lot of knowledge over time
  e.g., SysTran – the engine behind the venerable Babelfish
Statistical Methods
  Word-to-word translation
  Phrase-based translation
  Syntax-based translation (tree-to-tree, tree-to-string, etc.)
  Trained on parallel corpora
  Usually noisy-channel (at least in spirit), but increasingly direct
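For reference, the noisy-channel formulation is commonly written as below. The notation is assumed here rather than taken from the slides: f is the foreign sentence to translate, e is a candidate output sentence, P(e) is a language model, and P(f | e) is a translation model. A "direct" model instead estimates P(e | f) itself.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(e)\, P(f \mid e)
```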
Your Questions
Take the discussion online
To be continued …