Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia

Improving Translation Selection using Conceptual Vectors

LIM Lian Tze

Computer Aided Translation Unit

School of Computer Sciences

Universiti Sains Malaysia


Presentation Overview

- Problem Background & Motivation
- Research Objectives
- Methodology
- Advantages & Contributions


Problem Background & Motivation


Natural Language is Ambiguous

[Figure: “bank” shown twice, each marked “??”]


Word Sense Disambiguation

Given:
- a list of meanings/senses of words (dictionaries)
- input text containing occurrences of ambiguous words

Assign the correct sense to each particular instance of an ambiguous word in context.

A.k.a. “sense-tagging”

…
bank#1: a financial institution that accepts deposits and channels the money into lending activities
bank#2: sloping land (especially the slope beside a body of water)
…

…withdraw money from the bank… → bank#1
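As a concrete illustration of sense-tagging, the sketch below uses the simplified Lesk algorithm, a classic dictionary-overlap method swapped in here for illustration only; it is not the approach proposed in this work. It assigns the sense whose definition shares the most words with the context. The small stop list is an assumption added so that function words do not count as evidence.

```python
import re

# Dictionary senses of "bank", as listed on the slide.
SENSES = {
    "bank#1": "a financial institution that accepts deposits and channels "
              "the money into lending activities",
    "bank#2": "sloping land (especially the slope beside a body of water)",
}

# Assumed stop list: these words are ignored when counting overlap.
STOP_WORDS = {"a", "an", "and", "the", "of", "from", "into", "that", "to", "beside"}


def tokens(text):
    """Lower-case word tokens of text, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def simplified_lesk(context, senses):
    """Return the sense whose definition shares the most tokens with the context."""
    ctx = tokens(context)
    return max(senses, key=lambda s: len(ctx & tokens(senses[s])))


print(simplified_lesk("withdraw money from the bank", SENSES))  # bank#1
```

Here "money" is the only overlapping word, and it appears in the definition of bank#1.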


Disambiguation in Machine Translation (1)

…
bank#1: a financial institution that accepts deposits and channels the money into lending activities → Malay: bank
bank#2: sloping land (especially the slope beside a body of water) → Malay: tebing
…

English input: …withdraw money from the bank…
Sense-tag (WSD): …withdraw money from the bank#1…
Select translation word → Malay output: …mengeluarkan wang dari bank…

That worked well…


Disambiguation in Machine Translation (2)

…
circulation#6: the spread or transmission of something (as news or money) to a wider group or area → Malay: edaran (money), penyebaran (news)
…

English input: …50 ringgit notes in circulation…
Sense-tag (WSD): …50 ringgit notes in circulation#6…
Translate → Malay output: …duit kertas 50 ringgit dalam edaran?? penyebaran?…

That DIDN’T work well…
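The two slides above follow the same two-step pipeline: sense-tag first, then look up the translation of that sense. A minimal sketch, assuming the WSD step has already produced the sense tag and using only the sense-to-Malay mappings shown on the slides (the names `SENSE_TO_MALAY` and `translate_via_sense` are hypothetical), shows why the second step can still be left with more than one candidate.

```python
# Sense -> Malay translations, as given on the two slides above.
SENSE_TO_MALAY = {
    "bank#1": ["bank"],
    "bank#2": ["tebing"],
    "circulation#6": ["edaran", "penyebaran"],
}


def translate_via_sense(sense_tag):
    """Second step of the pipeline: list the translations of an already-tagged sense."""
    return SENSE_TO_MALAY.get(sense_tag, [])


for tag in ("bank#1", "circulation#6"):
    candidates = translate_via_sense(tag)
    if len(candidates) == 1:
        print(f"{tag}: translate as {candidates[0]}")
    else:
        print(f"{tag}: still ambiguous between {candidates}")
```

For bank#1 the sense fixes the translation; for circulation#6 a single, correctly tagged sense still maps to two Malay words, so sense-tagging alone cannot decide.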


Optimising WSD for MT

Two-step: input word → select sense number → select translation word

Direct: input word → select translation word

(Lee and Kim 2002)


Research Objectives


Main Objective

Existing MT system:
- Selects fragments (translation units) from previously translated examples
- Re-combines the selected translation units to produce translation output for new input text

Improve the translation quality of this MT system by adapting a WSD algorithm specifically for MT purposes

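To make the two steps of the existing system concrete, here is a deliberately simplified sketch of fragment selection and recombination: greedy longest-match lookup in a table of translation units, emitted left to right. The unit table, the monotone output order (no reordering) and the pass-through of unknown words are all assumptions for illustration; the actual EBMT system is not described at this level of detail. The units are cut from the bank example on an earlier slide.

```python
# Hypothetical translation units (English fragment -> Malay fragment), cut from
# the example pair "withdraw money from the bank" / "mengeluarkan wang dari bank".
UNITS = {
    "withdraw money": "mengeluarkan wang",
    "from the bank": "dari bank",
}


def recombine(sentence, units, max_len=4):
    """Greedy longest-match selection of units, recombined left to right."""
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        # Try the longest fragment starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            fragment = " ".join(words[i:i + n])
            if fragment in units:
                output.append(units[fragment])
                i += n
                break
        else:
            # No unit covers this word: copy it through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(recombine("withdraw money from the bank", UNITS))  # mengeluarkan wang dari bank
```

Translation selection matters precisely at the lookup step: when several stored units match the same fragment, the system must choose one.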


Need semantic knowledge about…
- Word senses: use dictionary definitions
- Pairs of translation words: from a bilingual knowledge bank (BKB)
  - made up of pairs of sentences that are translations of each other
  - corresponding words in each translation sentence pair are explicitly marked

Need a model to capture semantic knowledge of lexical items:
- Conceptual Vectors (Lafourcade 2001)
  - use a selection of concepts or themes
  - construct mathematical vectors from the concepts
  - thematic similarity between lexical items ≡ angle between CVs
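The slide defines thematic similarity as the angle between CVs. For vectors A and B this angle is arccos(A·B / (‖A‖‖B‖)): 0 when the vectors point the same way, and π/2 when they share no concepts (for non-negative components). A minimal sketch; the three concept dimensions and the word vectors are hypothetical, chosen only to show the computation.

```python
import math


def angular_distance(a, b):
    """Angle (radians) between conceptual vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp so rounding error cannot push the cosine outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


# Hypothetical concept dimensions: FINANCE, GEOGRAPHY, COMMUNICATION.
money = [1.0, 0.0, 0.0]
deposit = [0.9, 0.1, 0.0]
slope = [0.0, 1.0, 0.0]

print(angular_distance(money, deposit))         # small angle: thematically close
print(round(angular_distance(money, slope), 3))  # 1.571 (pi/2): no shared concepts
```

Using the angle rather than a raw dot product makes the comparison independent of vector length, so a word with many concept labels is not automatically "closer" to everything.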


Need to:
- Compile CVs for word meanings on 2 levels:
  - Word sense (from dictionary)
  - Word/phrase translation unit (from BKB), using data compiled from the previous step
- Use the compiled information during translation run time to select the correct translation units
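One plausible way to compile CVs on the two levels is sketched below, under stated assumptions: each dictionary sense gets a binary vector over its assigned concept labels, normalised to unit length, and each translation unit's profile is the normalised sum of the sense CVs of the words it co-occurs with in BKB examples. The concept set, sense labels and BKB examples are hypothetical, and "sum then normalise" is an assumption rather than the exact weighting used in this work.

```python
import math

# Hypothetical concept set (the vector dimensions).
CONCEPTS = ["MONEY", "PRINTED_MATTER", "INFORMATION"]

# Level 1 input: hypothetical concept labels assigned to dictionary senses.
SENSE_LABELS = {
    "ringgit#1": {"MONEY"},
    "note#2": {"MONEY", "PRINTED_MATTER"},  # banknote sense
    "news#1": {"INFORMATION"},
}

# Level 2 input: hypothetical BKB examples for English "circulation", reduced to
# (chosen translation unit, sense-tagged words co-occurring with it).
BKB_EXAMPLES = [
    ("edaran", ["ringgit#1", "note#2"]),
    ("penyebaran", ["news#1"]),
]


def normalise(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def sense_cv(sense):
    """Level 1: binary vector over CONCEPTS from the sense's labels, normalised."""
    labels = SENSE_LABELS[sense]
    return normalise([1.0 if c in labels else 0.0 for c in CONCEPTS])


def translation_unit_profiles(examples):
    """Level 2: normalised sum of the sense CVs seen with each translation unit."""
    sums = {}
    for unit, senses in examples:
        acc = sums.setdefault(unit, [0.0] * len(CONCEPTS))
        for sense in senses:
            for i, x in enumerate(sense_cv(sense)):
                acc[i] += x
    return {unit: normalise(v) for unit, v in sums.items()}


for unit, cv in translation_unit_profiles(BKB_EXAMPLES).items():
    print(unit, [round(x, 3) for x in cv])
# edaran [0.924, 0.383, 0.0]
# penyebaran [0.0, 0.0, 1.0]
```

Normalising each sense CV first keeps a sense with many labels from dominating the sum; the slides list suitable weights and normalising factors as future work, so this choice is open.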


Methodology


Brief Outline

Data Preparation Phase
- Dictionary / Lexicon → word senses, tagged with concept category labels (word → sense-number-level knowledge)
- BKB → examples → translation units, tagged → Translation Unit Profile (word → translation-level knowledge)

EBMT Run-time Phase
- Input Text → “clues” → matching, comparison, selection → selected translation units → Translated Text


During Translation

[Figure: the Brief Outline diagram, repeated]

EBMT Run-time Phase
- Input Text → “clues” → matching, comparison, selection → selected translation units → Translated Text
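At run time, the “clues” from the input can be combined into a single context CV and compared with each candidate's profile, selecting the candidate at the smallest angle. A minimal sketch with hypothetical profiles and context CVs (names such as `PROFILES` and `select_translation` are illustrative, and context words are assumed to be lemmatised); the profile values only mirror the slide glosses of edaran (money) and penyebaran (news).

```python
import math

# Hypothetical concept set for the toy example.
CONCEPTS = ["MONEY", "PRINTED_MATTER", "INFORMATION"]

# Hypothetical Translation Unit Profile: one CV per candidate translation of "circulation".
PROFILES = {
    "edaran": [1.0, 1.0, 0.0],
    "penyebaran": [0.0, 0.0, 1.0],
}

# Hypothetical CVs for context words (assumed already lemmatised).
CONTEXT_CVS = {
    "ringgit": [1.0, 0.0, 0.0],
    "note": [1.0, 1.0, 0.0],
    "news": [0.0, 0.0, 1.0],
}


def angular_distance(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def context_vector(words):
    """Sum the CVs of the context words that have one; other words are ignored."""
    total = [0.0] * len(CONCEPTS)
    for word in words:
        for i, x in enumerate(CONTEXT_CVS.get(word, [])):
            total[i] += x
    return total


def select_translation(context_words, profiles):
    """Pick the translation unit whose profile is closest in angle to the context."""
    ctx = context_vector(context_words)
    if not any(ctx):
        return None  # no clues: this sketch makes no choice
    return min(profiles, key=lambda unit: angular_distance(ctx, profiles[unit]))


print(select_translation(["ringgit", "note"], PROFILES))  # edaran
print(select_translation(["news"], PROFILES))             # penyebaran
```

No sense number is produced along the way: the context is compared directly with translation-level profiles, which matches the direct input word → translation word route.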


Some Results

Translating ‘circulation’ into Malay: edaran or penyebaran

- TS: proposed translation selection using CVs
- BS: baseline strategy; chooses the translation that co-occurs with the same input words (and same structure) as in the BKB, or else the most frequently occurring translation

| Input | Translation chosen by TS | Translation chosen by BS |
|---|---|---|
| We will stop the circulation of that magazine. | edaran | penyebaran |
| We will stop the circulation of that rumour. | penyebaran | penyebaran |
| We will stop the circulation of that newspaper. | edaran | penyebaran |
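The baseline (BS) described above can be sketched as: look for a BKB example whose co-occurring input words match the input exactly, otherwise fall back to the most frequent translation. A minimal sketch; the BKB entries are hypothetical, and the "same structure" condition is not modelled (only the set of co-occurring words is compared).

```python
from collections import Counter

# Hypothetical BKB examples for English "circulation":
# (set of co-occurring input words, translation used in that example).
BKB = [
    (frozenset({"ringgit", "note"}), "edaran"),
    (frozenset({"news"}), "penyebaran"),
    (frozenset({"rumour"}), "penyebaran"),
]


def baseline_select(context_words, examples):
    """BS sketch: exact co-occurrence match, else the most frequent translation.

    Simplification: the "same structure" condition is not modelled.
    """
    ctx = frozenset(context_words)
    for words, translation in examples:
        if words == ctx:
            return translation
    counts = Counter(translation for _, translation in examples)
    return counts.most_common(1)[0][0]


print(baseline_select(["ringgit", "note"], BKB))  # edaran (exact match)
print(baseline_select(["magazine"], BKB))         # penyebaran (fallback: most frequent)
```

With these hypothetical entries, an unseen context word such as "magazine" has no exact match, so the baseline can only fall back to the majority translation; a CV-based comparison can still generalise through shared concepts.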


Advantages & Contributions


Advantages and Weaknesses

Pros:
- Optimized for EBMT
- Focuses on translation selection, bypassing intermediate WSD at run time
- Handles many-to-many mapping between source word senses and translation words
- Allows bi-directional translation with sense-tagging required for only one language
- Mathematical operations on vectors are easy to implement
- Avoids the combinatorial effect when the input contains multiple ambiguous words

Cons:
- Not all ambiguities can be solved using co-occurring concepts
- Does not handle translation selection of function words
- Manual work is required in data preparation
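One way to read the claim about avoiding the combinatorial effect: if n ambiguous input words have k₁, …, kₙ candidates each, disambiguating them jointly means searching k₁ × … × kₙ combinations, whereas scoring each word's candidates independently against one shared context vector needs only k₁ + … + kₙ comparisons. The arithmetic below is a sketch of that interpretation, with hypothetical candidate counts.

```python
from math import prod


def joint_combinations(candidate_counts):
    """Sense/translation assignments a joint search over all words must try."""
    return prod(candidate_counts)


def independent_comparisons(candidate_counts):
    """Comparisons when each word's candidates are scored against one context CV."""
    return sum(candidate_counts)


counts = [6, 2, 4]  # hypothetical candidate counts for three ambiguous input words
print(joint_combinations(counts))       # 48
print(independent_comparisons(counts))  # 12
```

The gap widens quickly: ten words with two candidates each give 1024 joint combinations but only 20 independent comparisons.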


Research Contributions

- Adaptation of a WSD approach for the specific aim of translation selection
- Proposal of specific guidelines for assigning related concepts to word meanings from dictionaries
- Production of knowledge about word meanings on two levels:
  - Word senses, as in dictionaries
  - Translations, as in parallel text


Summary

- WSD can be customized for different NLP applications accordingly:
  - different requirements
  - increased efficiency
- WSD and related tasks based on concepts common to co-occurring word senses can be facilitated using the conceptual vector model:
  - requires a concept category hierarchy and a word sense list
  - concepts related to a word sense are modelled as a mathematical vector
  - conceptual similarity = angular distance between vectors
- Future work:
  - automating data preparation tasks
  - investigating suitable weights or normalizing factors during CV manipulation
  - integration with other WSD or translation selection strategies


Future Work

Automate tagging tasks that are currently done manually

Investigate different weight values for CVs for different syntactic relations or word classes

Integrate with other WSD/translation selection tasks


Thank You