Human Judgements in Parallel Treebank Alignment
Martin Volk, Torsten Marek, Yvonne Samuelsson
University of Zurich and Stockholm [email protected]
23 August 2008
English Syntax Tree
DE – EN
Alignment
SMULTRON
Stockholm MULtilingual TReebank
1000 sentences in 3 languages (DE-EN-SV)
  500 from Jostein Gaarder’s Sophie’s World (~7,500 tokens, 14 tokens/sentence) and
  500 from Economy texts (~11,000 tokens, 22 tokens/sentence)
    ABB Quarterly report
    Rainforest Alliance: Banana Certification Program
    SEB Annual report
Released: January 2008
www.ling.su.se/dali/research/smultron/index.htm
German Annotation
German sentence: flat annotation
German sentence: deepened
English Annotation
English Syntax Tree
English annotation
Follows the Penn Treebank guidelines
Slower annotation because of
  insertion of traces
  secondary edges
  deeper trees
Tree Alignment
Sentence alignment
Word alignment
  input for Statistical MT
Phrase alignment
  linguistically motivated phrases
  input for Example-based MT
Alignment Example
Tools for Parallel Treebanks
creating and editing trees
  from mono-lingual treebanks
  PoS-taggers, chunkers, editor, ‘tree-enricher’
aligning phrases
  use of word alignment tools
  tree alignment editor
  Stockholm TreeAligner
searching across languages
  TIGER-Search for parallel treebanks
  Stockholm TreeAligner
Guidelines for Alignment
1. Align words and phrases that represent the same meaning and could serve as translation units in an MT system.
2. Align as many words and phrases as possible.
3. Distinguish between exact and approximate alignments.
4. 1:n word / phrase alignments are allowed, but not m:n word / phrase alignments.
5. m:n sentence alignments are allowed.
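Guideline 4 can be checked mechanically. The following is a minimal sketch, not the Stockholm TreeAligner's own data model: it assumes each alignment is stored as a link between one source node and one target node, and it reads "m:n" as a link whose source node and target node both take part in further links. The `Link` type, the node ids, and the function name are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Link(NamedTuple):
    source: str  # node id in the German tree (hypothetical id scheme)
    target: str  # node id in the English tree (hypothetical id scheme)
    kind: str    # "exact" or "approximate" (guideline 3)


def find_m_to_n_links(links):
    """Return links whose source AND target both have more than one link.

    Assumption: 1:1 and 1:n groups always contain a node with exactly one
    link on at least one end of every link, so any link with several
    partners on both ends belongs to an m:n group.
    """
    source_degree = Counter(link.source for link in links)
    target_degree = Counter(link.target for link in links)
    return [
        link
        for link in links
        if source_degree[link.source] > 1 and target_degree[link.target] > 1
    ]


links = [
    Link("de:w1", "en:w1", "exact"),        # 1:1, allowed
    Link("de:w2", "en:w2", "approximate"),  # 1:n, allowed
    Link("de:w2", "en:w3", "approximate"),
    Link("de:w4", "en:w5", "exact"),        # part of an m:n group
    Link("de:w4", "en:w6", "exact"),
    Link("de:w5", "en:w5", "exact"),
]

violations = find_m_to_n_links(links)
for link in violations:
    print("m:n alignment involves", link.source, "and", link.target)
```

The check only flags the link that connects the two multi-link nodes; an annotator would still have to decide which of the surrounding links to drop.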
Examples
Do not align:
  die Verwunderung über das Leben
  their astonishment at the world
Do align:
  was für eine seltsame Welt
  what an extraordinary world
Specific rules
a pronoun in one language shall never be aligned with a full noun in the other
names are aligned regardless of spelling, unless the name is changed (fiction)
ignore number/case but not voice
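The first rule lends itself to an automatic check on word-level alignments. The sketch below assumes the German side carries STTS part-of-speech tags and the English side Penn Treebank tags; the tag sets are a partial selection, and the function name and the example word pairs are hypothetical illustrations rather than treebank data.

```python
# Partial tag sets, chosen for illustration; extend them as needed.
DE_PRONOUN_TAGS = {"PPER", "PRF", "PDS", "PIS", "PRELS", "PWS", "PPOSS"}  # STTS
DE_NOUN_TAGS = {"NN", "NE"}                                               # STTS
EN_PRONOUN_TAGS = {"PRP", "WP"}                                           # Penn
EN_NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}                               # Penn


def violates_pronoun_rule(de_tag, en_tag):
    """True if a pronoun on one side is aligned with a full noun on the other."""
    de_pronoun_en_noun = de_tag in DE_PRONOUN_TAGS and en_tag in EN_NOUN_TAGS
    de_noun_en_pronoun = de_tag in DE_NOUN_TAGS and en_tag in EN_PRONOUN_TAGS
    return de_pronoun_en_noun or de_noun_en_pronoun


# Hypothetical word alignments: (German word, STTS tag, English word, Penn tag)
candidates = [
    ("sie", "PPER", "Sophie", "NNP"),  # pronoun vs. name: should be flagged
    ("Welt", "NN", "world", "NN"),     # noun vs. noun: fine
    ("sie", "PPER", "she", "PRP"),     # pronoun vs. pronoun: fine
]

flagged = [c for c in candidates if violates_pronoun_rule(c[1], c[3])]
for de_word, _, en_word, _ in flagged:
    print("do not align:", de_word, "-", en_word)
```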
Exact vs approximate alignment
best vs. “second-best” translation
an acronym in one language shall be aligned as approximate (fuzzy) with a spelled-out term in the other
  PT – Power Technologies
difficult distinctions
  einer der ersten Tage im Mai – early May
Related Research
Blinker project (Melamed)
Prague Czech-English Treebank
Example-based MT in Dublin
Linköping English-Swedish Treebank
Experiment
12 students were asked to align 20 tree pairs DE-EN
  10 tree pairs from Sophie’s World
  10 tree pairs from Economy text
advanced CL students
received
  short introduction
  the written guidelines
Gold Standard Alignment (DE-EN)
                     word - word               phrase - phrase
                  exact  approx.  total     exact  approx.  total
10 sent. Sophie      75        3     78        46       12     58
10 sent. Econ       159       19    178        62        9     71
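The totals in the table are the sums of the exact and approximate counts. As a small worked check, the sketch below recomputes them and derives the share of approximate alignments per text part; the dictionary layout and function names are illustrative assumptions, while the counts are taken from the table.

```python
# Gold standard counts from the table (10 sentences per part).
gold = {
    "Sophie": {"word": {"exact": 75, "approx": 3},
               "phrase": {"exact": 46, "approx": 12}},
    "Econ": {"word": {"exact": 159, "approx": 19},
             "phrase": {"exact": 62, "approx": 9}},
}


def totals(part):
    """Sum exact and approximate alignments per level (word, phrase)."""
    return {level: sum(counts.values()) for level, counts in gold[part].items()}


def approx_share(part):
    """Fraction of all alignments in a part that are marked approximate."""
    approx = sum(counts["approx"] for counts in gold[part].values())
    total = sum(sum(counts.values()) for counts in gold[part].values())
    return approx / total


for part in gold:
    print(part, totals(part), f"approximate share: {approx_share(part):.1%}")
```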
Experiment: Results
The number of alignments varied widely across students
  Sophie part: from 47 to 125 (ø = 94.3)
  Econ part: from 62 to 259 (ø = 186.9)
The 3 students with the lowest numbers were non-native speakers of German
1 student had misunderstood the task
Experiment: Results
The remaining 8 students had a high overlap with the gold standard
Recall
  Sophie part: from 48% to 81% (ø = 68.7%)
  Econ part: from 66% to 89% (ø = 75.5%)
Precision
  Sophie part: from 81% to 97% (ø = 89.1%)
  Econ part: from 78% to 94% (ø = 88.2%)
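Precision and recall against a gold standard can be computed from sets of alignment links. This is a minimal sketch under assumptions the slides do not confirm: a link counts as correct if the same node pair appears in the gold standard, and the exact/approximate label is ignored. The node ids and the function name are hypothetical.

```python
def precision_recall(student_links, gold_links):
    """Compare two collections of (source_node, target_node) pairs.

    precision = correct student links / all student links
    recall    = correct student links / all gold links
    """
    student = set(student_links)
    gold = set(gold_links)
    correct = student & gold
    precision = len(correct) / len(student) if student else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Toy data, not taken from the experiment.
gold_links = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
student_links = {("s1", "t1"), ("s2", "t2"), ("s5", "t5")}

precision, recall = precision_recall(student_links, gold_links)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

A student who aligns cautiously gets high precision and lower recall, which matches the pattern in the figures above.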
Discrepancies
students sometimes aligned a word (or some words) with a node
  e.g. the word natürlich to the phrase of course
students sometimes aligned a German verb group with a single verb form in English
  e.g. ist zurückzuführen vs. reflecting
Discrepancies
based on different grammatical forms:
a definite singular NP in German with an indefinite plural NP in English
  der Umsatz vs. revenues
a German genitive NP with a PP in English
  der beiden Divisionen vs. of the two divisions
Missed by all students
alignment of German word to empty token in English
  wenn sie die Hand ausstreckte vs. herself shaking hands
Conclusions
1. Our alignment guidelines are sufficient for a core of clear alignment decisions.
2. Needed:
   1. Better alignment rules with concrete examples.
   2. Better support tools (consistency checking).
3. The distinction between exact alignment and approximate alignment is very tricky.
Thank You for Your Attention!
Questions???
Applications of Parallel Treebanks
For the Translator
1. corpus for translation studies
   search tools needed
For the Computational Linguist
1. input for Example-based Machine Translation
2. evaluation corpus for word, phrase or clause alignment
3. training corpus for transfer rules
Alignment Example
Parallel Treebanking
[Diagram: processing pipeline for the German and Swedish sides]
DE sentence
  ANNOTATE
    - PoS tagger (STTS)
    - Chunker (TIGER)
  flat DE tree
  Deepening
  DE tree
SV sentence
  PoS tagger (SUC)
  STTS conversion
  ANNOTATE
    - Chunker (SWE-TIGER)
  flat SV tree
  Deepening + Back conv.
  SV tree
DE tree and SV tree
  phrase alignment