Improving Translation Selection using Conceptual Vectors
LIM Lian Tze
Computer Aided Translation Unit
School of Computer Sciences
Universiti Sains Malaysia
Presentation Overview
Problem Background & Motivation
Research Objectives
Methodology
Advantages & Contributions
Natural Language is Ambiguous
[Figure: the word "bank" shown twice, each marked with "??"]
Word Sense Disambiguation
Given:
  a list of meanings/senses of words (dictionaries)
  input text containing occurrences of ambiguous words
Assign the correct sense to each instance of an ambiguous word in context
A.k.a. “sense-tagging”
Example sense inventory:
  …
  bank#1: a financial institution that accepts deposits and channels the money into lending activities
  bank#2: sloping land (especially the slope beside a body of water)
  …
Input: …withdraw money from the bank...
Assigned sense: bank#1
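As a minimal illustration of sense-tagging, the sketch below picks the sense whose dictionary definition shares the most words with the input context. This is a simplified Lesk-style overlap, swapped in for illustration only; it is not the method proposed in this talk. The function names `tokens` and `sense_tag` and the tiny stop-word list are assumptions made for the example; the two definitions are the ones shown above.

```python
import re

# Sense inventory from the slide: sense label -> dictionary definition.
SENSES = {
    "bank#1": "a financial institution that accepts deposits and channels "
              "the money into lending activities",
    "bank#2": "sloping land (especially the slope beside a body of water)",
}

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "the", "of", "and", "into", "that", "from"}

def tokens(text):
    """Return the set of lower-cased words in a text, minus stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS

def sense_tag(context, senses):
    """Return the sense whose definition overlaps most with the context."""
    context_words = tokens(context)
    return max(senses, key=lambda s: len(context_words & tokens(senses[s])))

print(sense_tag("withdraw money from the bank", SENSES))
```

Here the only shared word is "money", which occurs in the definition of bank#1, so that sense is selected. Real contexts rarely overlap with definitions this conveniently.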
Disambiguation in Machine Translation (1)
…
bank#1: a financial institution that accepts deposits and channels the money into lending activities
  Malay translation: bank
bank#2: sloping land (especially the slope beside a body of water)
  Malay translation: tebing
…
English input: …withdraw money from the bank...
Sense-tag (WSD): …withdraw money from the bank#1...
Select translation word (Malay output): …mengeluarkan wang dari bank...
That worked well…
Disambiguation in Machine Translation (2)
…
circulation#6: the spread or transmission of something (as news or money) to a wider group or area
  Malay translations: edaran (for money), penyebaran (for berita, i.e. news)
…
English input: …50 ringgit notes in circulation...
Sense-tag (WSD): …50 ringgit notes in circulation#6...
Translate (Malay output): …duit kertas 50 ringgit dalam edaran?? penyebaran?...
That DIDN’T work well…
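The contrast between the two examples can be sketched as a lookup from sense number to translation words: when a sense maps to exactly one translation, sense-tagging settles the choice; when it maps to several, it does not. The table below copies the sense-to-translation pairs from the slides; the function name `translate_sense` is a hypothetical helper.

```python
# Sense -> Malay translation words, as listed on the two slides above.
TRANSLATIONS = {
    "bank#1": ["bank"],
    "bank#2": ["tebing"],
    "circulation#6": ["edaran", "penyebaran"],
}

def translate_sense(sense):
    """Return the translation if the sense determines it uniquely, else None."""
    candidates = TRANSLATIONS[sense]
    return candidates[0] if len(candidates) == 1 else None

for sense in ("bank#1", "circulation#6"):
    result = translate_sense(sense)
    if result is None:
        print(sense, "-> undecided among", TRANSLATIONS[sense])
    else:
        print(sense, "->", result)
```

For circulation#6 the sense tag alone leaves two candidates, which is the gap that translation selection has to close.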
Optimising WSD for MT
Input word → (select) → Sense number → (select) → Translation word
Input word → (select) → Translation word
(Lee and Kim 2002)
Main Objective
Existing MT system:
  Selects fragments (translation units) from previously translated examples
  Re-combines selected translation units to produce translation output for new input text
Improve the translation quality of this MT system by adapting a WSD algorithm specifically for MT purposes
Need semantic knowledge about…
  Word senses
    Use dictionary definitions
  Pairs of translation words
    From bilingual knowledge bank (BKB) made up of pairs of sentences that are translations of each other
    Corresponding words in each translation sentence pair are explicitly marked
Need a model to capture semantic knowledge of lexical items
  Conceptual Vectors (Lafourcade 2001)
  Using a selection of concepts or themes
  Construct mathematical vectors from concepts
  Thematic similarity between lexical items ≡ angle between CVs
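To make the "angle between CVs" idea concrete, here is a minimal sketch that builds vectors over a small concept inventory and measures the angle between them as the arc cosine of the cosine similarity. The three concepts, the weights, and the helper names `conceptual_vector` and `angular_distance` are assumptions for illustration, not values from the author's data.

```python
import math

# Hypothetical concept inventory; a real one would come from a concept hierarchy.
CONCEPTS = ["finance", "geography", "communication"]

def conceptual_vector(weights):
    """Build a vector over CONCEPTS from a {concept: weight} mapping."""
    return [float(weights.get(c, 0.0)) for c in CONCEPTS]

def angular_distance(x, y):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

# Hypothetical weights for the two senses of "bank" and for "money".
bank_1 = conceptual_vector({"finance": 1.0})
bank_2 = conceptual_vector({"geography": 1.0})
money = conceptual_vector({"finance": 0.8, "communication": 0.2})

# A smaller angle means the two items are thematically closer.
print(angular_distance(money, bank_1) < angular_distance(money, bank_2))
```

With these toy weights, "money" lies at a small angle from bank#1 and at a right angle from bank#2, so the financial sense counts as thematically closer.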
Need to:
  Compile CVs for word meanings on 2 levels:
    Word sense (from dictionary)
    Word/phrase translation unit (from BKB), using data compiled from the previous step
  Use compiled information during translation runtime to select correct translation units
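One possible reading of these two steps is sketched below: sense-level CVs are summed over the words that accompany each translation in the BKB to give a profile per translation unit, and at run time the candidate whose profile makes the smallest angle with the input context is selected. The summation rule, the two made-up concepts, the word list, and all function names are assumptions; the slides do not specify how profiles are actually compiled.

```python
import math

# Level 1 (hypothetical values): CVs for word senses over two made-up
# concepts, in the order [publication, speech].
SENSE_CV = {
    "magazine": [1.0, 0.0],
    "newspaper": [0.9, 0.1],
    "rumour": [0.0, 1.0],
    "gossip": [0.1, 0.9],
}

# Hypothetical BKB summary: content words seen next to each Malay
# translation of 'circulation' in the example sentence pairs.
BKB_CONTEXTS = {
    "edaran": ["newspaper"],
    "penyebaran": ["gossip"],
}

def sum_vectors(words, sense_cv):
    """Add up the CVs of the given words, component by component."""
    return [sum(column) for column in zip(*(sense_cv[w] for w in words))]

def angle(x, y):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def compile_profiles(bkb_contexts, sense_cv):
    """Level 2: one profile vector per translation unit."""
    return {t: sum_vectors(ws, sense_cv) for t, ws in bkb_contexts.items()}

def select_translation(context_words, profiles, sense_cv):
    """Pick the translation whose profile is closest in angle to the context."""
    context = sum_vectors(context_words, sense_cv)
    return min(profiles, key=lambda t: angle(context, profiles[t]))

PROFILES = compile_profiles(BKB_CONTEXTS, SENSE_CV)
print(select_translation(["magazine"], PROFILES, SENSE_CV))
print(select_translation(["rumour"], PROFILES, SENSE_CV))
```

Note that "magazine" and "rumour" never appear in the toy BKB; the selection works through shared concepts rather than exact word matches. The sketch does not handle empty contexts or zero vectors.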
Brief Outline
[Diagram]
Data Preparation Phase:
  Dictionary / Lexicon → Word senses (word → sense number level knowledge), with Concept Category Labels
  BKB → Examples → Translation units → tag → Translation Unit Profile (word → translation level knowledge)
EBMT Run-time Phase:
  Input Text → “clues” → matching, comparison, selection → selected translation units → Translated Text
During Translation
[Diagram]
Data Preparation Phase:
  Dictionary / Lexicon → Word senses (word → sense number level knowledge), with Concept Category Labels
  BKB → Examples → Translation units → tag → Translation Unit Profile (word → translation level knowledge)
EBMT Run-time Phase:
  Input Text → “clues” → matching, comparison, selection → selected translation units → Translated Text
Some Results
Translating ‘circulation’ to Malay: edaran or penyebaran
TS: proposed translation selection using CVs
BS: baseline strategy, chooses the translation that co-occurs with the same input words (and same structure) as in the BKB, or the most frequently occurring translation

Input | Translation chosen by TS | Translation chosen by BS
We will stop the circulation of that magazine. | edaran | penyebaran
We will stop the circulation of that rumour. | penyebaran | penyebaran
We will stop the circulation of that newspaper. | edaran | penyebaran
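The baseline strategy (BS) described above can be sketched roughly as follows: look for a BKB example whose co-occurring word also appears in the input, and otherwise fall back to the most frequent translation. The BKB entries here are toy data chosen for illustration, not the author's BKB, and the sketch ignores the "same structure" condition; `baseline_select` is a hypothetical name.

```python
from collections import Counter

# Hypothetical BKB entries for 'circulation':
# (co-occurring source word, Malay translation).
BKB_EXAMPLES = [
    ("news", "penyebaran"),
    ("rumour", "penyebaran"),
    ("money", "edaran"),
]

def baseline_select(input_words, examples):
    """Exact co-occurrence match first, most frequent translation otherwise."""
    for word, translation in examples:
        if word in input_words:
            return translation
    # No example shares a word with the input: use the overall majority.
    return Counter(t for _, t in examples).most_common(1)[0][0]

print(baseline_select({"stop", "rumour"}, BKB_EXAMPLES))
print(baseline_select({"stop", "magazine"}, BKB_EXAMPLES))
```

Because this kind of baseline relies on exact word matches, an input word that is absent from the examples always receives the majority translation, whatever its meaning.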
Advantages and Weaknesses
Pros:
  optimized for EBMT
  focuses on translation selection, bypassing intermediate WSD at run time
  handles many-to-many mapping of source word senses ↔ translation words
  allows for bi-directional translation with sense-tagging for 1 language
  mathematical operations on vectors are easy to implement
  avoids combinatorial effect when there are multiple ambiguous words in the input
Cons:
  not all ambiguities can be solved using co-occurring concepts
  does not handle translation selection of function words
  manual work required in data preparation
Research Contributions
Adaptation of a WSD approach for the specific aim of translation selection
Proposal of specific guidelines for assigning related concepts for word meanings from dictionaries
Production of knowledge about word meanings on two levels:
  Word senses as in dictionaries
  Translations as in parallel text
Summary
WSD can be customized for different NLP applications accordingly
  Different requirements
  Increase efficiency
WSD and related tasks based on concepts common to co-occurring word senses can be facilitated using conceptual vector model
  Requires a concept category hierarchy and word sense list
  Concepts related to a word sense modelled as mathematical vector
  Conceptual similarity = angular distance between vectors
Future work
  Automating data preparation tasks
  Investigating suitable weights or normalizing factors during CV manipulation
  Integration with other WSD or translation selection strategies
Future Work
Automate tagging tasks that are currently done manually
Investigate different weight values for CVs for different syntactic relations or word classes
Integrate with other WSD/translation selection tasks
Thank You