Introduction to Computational Linguistics

Frank Richter

[email protected].

Seminar für Sprachwissenschaft

Eberhard-Karls-Universität Tübingen

Germany

Intro to CL – WS 2006/7 – p.1

Central Goal of the Field

build psychologically adequate models of human language processing capabilities on the basis of knowledge about the way in which humans acquire, store, and process language.

build functionally correct models of human language processing capabilities on the basis of knowledge about the world and about language elicited from people and stored in the system.

Application Areas

machine translation

speech recognition

speech synthesis

man-machine interfaces

intelligent word processing: spelling correction, grammar correction

document management:
  find relevant documents in collections
  establish authorship of documents
  catch plagiarism
  extract information from documents
  classify documents
  summarize documents
  summarize document collections

A bit of Philosophy of Science

Theory: A set of statements that determine the format and semantics of descriptions of phenomena in the purview of the theory.

Methodology: An effective theory comes with an explicit methodology for acquiring these descriptions.

Application: A theory associated with a methodology can be applied to tasks for which the methodology is appropriate.

Scientific Strategies

Method-Oriented Approach: devise or import a tool, a procedure, or a formalism, apply it to a task, and develop it further. Then (optionally) see whether it works for additional tasks.

Task-Oriented Approach: select a task; devise or import a method or several methods for its solution; integrate the methods as required to improve performance.

Machine Translation

What makes Machine Translation an important application area to study:

historically first application area, and for at least a decade the only application area, of computational linguistics

requires all steps relevant to linguistic analysis of input sentences and linguistic generation of output sentences

hence, machine translation is scientifically one of the most challenging and most comprehensive tasks in computational linguistics

The Purposes of Translation

Information Acquisition: e.g. gathering information from scientific articles or newspapers written in a foreign language.

Information Dissemination: e.g. translation of technical manuals, legal texts, weather reports, etc.

Literary Translation: e.g. translation of novels, poems, etc.

Relating Translation Purposes to MT

Information Acquisition:
  involves translation from a foreign to a native language
  typically used by non-linguists with little or no linguistic competence in the source language
  pre-processing of the input is not feasible because the user lacks linguistic competence in the source language
  may require special-purpose lexica
  low-quality translation is tolerable

Relating Translation Purposes to MT (2)

Information Dissemination:
  involves translation from a native to a foreign language
  pre- and post-processing of the input is feasible due to the translator's linguistic competence in the source language
  may involve a sublanguage with restricted vocabulary; e.g. translation of weather reports
  often involves special terminologies stored in a terminology database; e.g. for translation of technical manuals
  purely human translation for such tasks can be time-consuming, inconsistent, or tedious

Relating Translation Purposes to MT (3)

Literary Translation:
  requires stylistic elegance, often involves metaphorical and metonymic language
  abundance of highly-trained human translators
  task rarely performed by machine translation

What Makes Machine Translation Hard

Lexical Ambiguity

Lexical Gaps

Syntactic Divergences between Source and Target Language
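
The three difficulties listed above can be made concrete with a minimal sketch of naive word-for-word translation from English to German. The toy lexicon, the function name `word_for_word`, and the example inputs are assumptions made purely for illustration and are not taken from the lecture; a missing lexicon entry merely stands in for a lexical gap.

```python
# Toy English-to-German lexicon (hypothetical, for illustration only).
# A word may map to several candidates (lexical ambiguity) or be absent
# from the lexicon altogether (standing in here for a lexical gap).
TOY_LEXICON = {
    "i": ["ich"],
    "have": ["habe"],
    "seen": ["gesehen"],
    "the": ["den"],            # only one inflected form, to keep the toy small
    "man": ["Mann"],
    "bank": ["Bank", "Ufer"],  # financial institution vs. river bank
}


def word_for_word(sentence):
    """Translate each token in isolation, keeping the source word order."""
    output = []
    for token in sentence.lower().split():
        candidates = TOY_LEXICON.get(token)
        if candidates is None:
            # No entry at all: mark the untranslated word.
            output.append("<" + token + "?>")
        elif len(candidates) > 1:
            # Several candidates and no context to choose between them.
            output.append("{" + "|".join(candidates) + "}")
        else:
            output.append(candidates[0])
    return " ".join(output)


print(word_for_word("I have seen the man"))
print(word_for_word("bank"))
print(word_for_word("serendipity"))
```

The first call keeps the English word order and produces "ich habe gesehen den Mann", whereas German places the participle at the end of the clause ("Ich habe den Mann gesehen"), a small instance of syntactic divergence. The second call cannot choose between "Bank" and "Ufer" without context, and the third finds no entry at all.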
