Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for many of the materials used in this lecture. CS 479, section 1: Natural Language Processing Lecture #33: Intro. To Machine Translation This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License .

Announcements

Reading Report #13: M&S ch. 13 on alignment and MT
  Due now; we will discuss it at the end of lecture today or on the group

Homework 0.3 feedback
  Question one did not contribute to your grade
  Compare your answers with the key

Homework 0.4
  Posted Tuesday

Final Project

Project #4
  Note the updates to the tutorial with the flowchart slides from lecture #29

Project #5
  Instructions to be updated today
  Help session: Tuesday

Propose-your-own
  Move forward
  Feedback to be sent today

Project Report
  Early: Wednesday after Thanksgiving
  Due: Friday after Thanksgiving

Check the schedule. Plan enough time to succeed!

Quiz – keep the ideas fresh

1. What are the four steps of the Expectation Maximization (EM) algorithm? Think of the document clustering example, if that helps.

2. What is the primary purpose of EM?

Objectives

Introduce the problem of machine translation

Appreciate the need for alignment in statistical approaches to translation

Machine Translation is Hard

REF: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959 billion US dollars of foreign capital, including 40.007 billion US dollars of direct investment from foreign businessmen.

IBM4: the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007 billion US dollars today provide data include that year to November china actually using foreign 46.959 billion US dollars and

Yamada & Knight: today’s available data of the Ministry of Foreign Trade and Economic Cooperation shows that china’s actual utilization of November this year will include 40.007 billion US dollars for the foreign direct investment among 46.959 billion US dollars in foreign capital

But MT is Real

http://www.microsofttranslator.com/

http://translate.google.com/

Why so hard?

What makes translation so hard?

History

1950s: Intensive research activity in MT (roll video …)

1960s: Direct word-for-word replacement

1966 (ALPAC): NRC report on MT
  Conclusion: MT no longer worthy of serious scientific investigation.

1966-1975: “Recovery period”

1975-1985: Resurgence (Europe, Japan)

1985-present: Gradual resurgence (US)

How?

How would you implement automatic translation on a computer?

Big Idea: Word Alignment

Start with parallel corpora.

Learn a word alignment. The hidden variable is the alignment from each foreign (target) word to a source word. Use EM!
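The slide does not commit to a particular alignment model. As a sketch of "use EM" for word alignment, the block below assumes IBM Model 1, the simplest of the IBM models, with a conventional NULL source word so that a foreign word may align to nothing. `t[(f, e)]` approximates P(f | e). The corpus and all names (`train_ibm1`, `NULL`) are hypothetical and serve only as illustration.

```python
from collections import defaultdict

NULL = "<NULL>"  # assumed empty source word, so a foreign word may align to nothing

def train_ibm1(corpus, iterations=10):
    """EM for IBM Model 1 (a sketch).

    corpus: list of (foreign_tokens, source_tokens) sentence pairs.
    Returns t with t[(f, e)] approximating P(f | e).
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    # Start from a uniform t(f | e)
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links into e
        # E-step: each foreign word spreads one unit of count over the
        # source words it could align to, in proportion to t(f | e)
        for fs, es in corpus:
            candidates = [NULL] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in candidates)
                for e in candidates:
                    p = t[(f, e)] / z  # posterior probability that f aligns to e
                    count[(f, e)] += p
                    total[e] += p
        # M-step: re-estimate t(f | e) by normalizing the expected counts
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Toy German-English parallel corpus (made up for illustration)
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm1(corpus)

# Most probable source word for each foreign word (Model 1 best alignment)
for fs, es in corpus:
    print([(f, max([NULL] + es, key=lambda e: t[(f, e)])) for f in fs])
```

No sentence is ever labeled with its alignment. The co-occurrence pattern alone pulls probability mass toward consistent pairs. Because "das" also appears with "the" elsewhere, "Haus" ends up explaining "house".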

Vauquois Triangle

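[Figure: the Vauquois triangle. The levels from bottom to top are Source Text / Target Text, Word Structure, Syntactic Structure, Semantic Structure, and Interlingua at the apex. The source side climbs by Morphological Analysis, Syntactic Analysis, Semantic Analysis, and Semantic Composition. The target side descends by Semantic Decomposition, Semantic Generation, Syntactic Generation, and Morphological Generation. Translation can cross between the two sides at different levels: Direct, Syntactic Transfer, or Semantic Transfer.]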

Methods

Rule-based methods
  Expert-system-like rewrite systems
  Lexicons constructed by people
  Can be very fast, and can accumulate a lot of knowledge over time
  e.g., SysTran, the engine behind the venerable Babelfish
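To make the rule-based idea concrete, here is a toy sketch of a hand-built lexicon plus a single hand-written rewrite rule. The Spanish lexicon, the part-of-speech tags, and the single NOUN ADJ → ADJ NOUN rule are all hypothetical. Real systems such as SysTran use vastly larger lexicons and rule sets. The comparison shows why pure word-for-word replacement (the "Direct" path) is not enough.

```python
# Hypothetical toy Spanish-to-English lexicon: word -> (translation, part of speech)
LEXICON = {
    "la": ("the", "DET"),
    "casa": ("house", "NOUN"),
    "verde": ("green", "ADJ"),
    "es": ("is", "VERB"),
    "grande": ("big", "ADJ"),
}

def lookup(sentence):
    """Look up each word; unknown words pass through tagged 'UNK'."""
    return [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]

def word_for_word(sentence):
    """Direct replacement: no reordering at all."""
    return " ".join(w for w, _ in lookup(sentence))

def with_rewrite_rules(sentence):
    """Lexical lookup followed by one hand-written rule: NOUN ADJ -> ADJ NOUN."""
    out = lookup(sentence)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return " ".join(w for w, _ in out)

print(word_for_word("La casa verde"))       # the house green
print(with_rewrite_rules("La casa verde"))  # the green house
```

Each new phenomenon (agreement, idioms, long-distance reordering) needs another rule written by a person. This is how such systems "accumulate a lot of knowledge over time," and it is also what makes them expensive to build.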

Statistical methods
  Word-to-word translation
  Phrase-based translation
  Syntax-based translation (tree-to-tree, tree-to-string, etc.)
  Trained on parallel corpora
  Usually noisy-channel (at least in spirit), but increasingly direct
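For reference, the noisy-channel formulation can be written out with Bayes' rule. The notation is the conventional one and is assumed here rather than taken from the slides: f is the foreign input sentence and e is a candidate translation.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} P(f \mid e)\, P(e)
```

P(f | e) is the translation model, estimated from parallel corpora (this is where word alignment enters). P(e) is a language model over the output language. P(f) drops out because it does not depend on e. A "direct" model instead scores P(e | f), or features of (e, f), without this decomposition.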

Your Questions

Take the discussion online

To be continued …