Machine Translation, Digital Libraries, and the Computing Research Laboratory Indo-US Workshop on Digital Libraries June 23, 2003

Machine Translation, Digital Libraries, and the Computing Research Laboratory

Indo-US Workshop on Digital Libraries

June 23, 2003

The Computing Research Laboratory (CRL)

New Mexico State University

Las Cruces, New Mexico

http://crl.nmsu.edu

Stephen Helmreich

(505) 646-2141

[email protected]

Machine Translation (MT)

• Component technologies

• Comparable technologies

• Composed technologies

MT -- Purposes

• Dissemination (high quality) – sublanguages, controlled languages

• Assimilation (broad coverage)

• Communication (speed)

MT -- Types

• Direct – string-for-string

• Transfer – structure-for-structure

• Interlingual – to and from a meaning representation

• Statistical – most probable translation given a corpus
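
As a rough illustration of how two of these types differ, the Python sketch below contrasts direct (word-for-word) translation with a statistical choice among candidate translations. The dictionary, the candidates, and the probabilities are invented toy values, not data from any CRL system. The scoring assumes the common noisy-channel reading of "most probable translation" (language model score times translation model score); other statistical formulations exist.

```python
# Toy illustration only: all words, candidates and probabilities are invented.

# Direct MT: replace each source string with a target string, in order.
TOY_DICTIONARY = {"la": "the", "casa": "house", "blanca": "white"}


def direct_translate(sentence: str) -> str:
    """Word-for-word lookup; unknown words pass through unchanged."""
    return " ".join(TOY_DICTIONARY.get(w, w) for w in sentence.lower().split())


# Statistical MT (noisy-channel assumption): pick the candidate e that
# maximizes P(e) * P(f | e). In a real system both tables would be
# estimated from a corpus; here they are hard-coded.
LANGUAGE_MODEL = {"the house white": 0.01, "the white house": 0.20}
TRANSLATION_MODEL = {"the house white": 0.50, "the white house": 0.40}


def statistical_translate(candidates: list[str]) -> str:
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda e: LANGUAGE_MODEL[e] * TRANSLATION_MODEL[e])


if __name__ == "__main__":
    print(direct_translate("la casa blanca"))
    print(statistical_translate(list(LANGUAGE_MODEL)))
```

The direct approach keeps the source word order, while the statistical approach can prefer a reordered candidate because the language model rates it as more fluent.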

Component technologies -- I

• Character encoding and representation, text editing (Unicode)

• Text segmenting (OCR, sandhi?)

• Morphological analysis

• Lexical annotation (part of speech tagging, proper name identification, others)
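
The first two components above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: it uses Unicode Normalization Form C from the standard `unicodedata` module and a naive whitespace split, which stands in for real segmentation. Languages with sandhi or without word spacing need dedicated segmenters, and the function names here are hypothetical.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Convert text to Normalization Form C so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


def segment(text: str) -> list[str]:
    """Naive whitespace segmentation applied after normalization."""
    return normalize_text(text).split()


# 'e' followed by a combining acute accent (two code points) ...
decomposed = "cafe\u0301 ouvert"
tokens = segment(decomposed)
# ... becomes a single precomposed character after NFC normalization.
print(tokens)
```

Normalizing before segmentation means that later stages, such as dictionary lookup, see one consistent representation of each word.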

Component technologies -- II

• Syntactic analyzers (grammars, parsers)

• Bilingual/multilingual dictionaries

• Ontologies (WordNet, OntoSem, Cyc) (lexical, linguistic, world-knowledge)

• Generation systems

Comparable technologies

• Information Retrieval (IR) (URSA)

• Information Extraction (IE) (MUC)

• Text Summarization (DUC)

• Word Sense Disambiguation (SensEval)

• Cross-Document Named Entity Identification (Coreference Resolution)

Composed technologies -- I

• All of the above (IR/IE/Summarization)

• multi-lingual

• multi-modal

• with attention to human-computer interaction (HCI)

Composed technologies -- II

• Personal Profiler – searches the web for information about a particular person, translates it if appropriate, and organizes it in temporal order

• Quick Ramp-up MT (Expedition) – allows a non-linguist language user and a computer expert to construct a simple MT system

Question-Answering Systems

• Advanced Question Answering for Intelligence (AQUAINT)

• MOQA – Meaning-Oriented Question Answering

• Allows the user to pose structured or natural language queries, obtains an answer from a variety of sources, and presents the answer appropriately

Summary

• Choose an appropriate purpose and type

• Look at related technologies: component, comparable, composed

• Search for an appropriate research partner