Material presented at the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India. Paper download at http://hal.archives-ouvertes.fr/hal-00743807. Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina, Gremuts.
Extraction of domain-specific bilingual lexicon from comparable corpora:
compositional translation and ranking
Estelle Delpech (1), Béatrice Daille (1), Emmanuel Morin (1), Claire Lemaire (2,3)
(1) LINA, Université de Nantes; (2) GREMUTS, Université de Grenoble; (3) Lingua et Machina
COLING'12, 10/12/12, Mumbai, India
Outline
1 Context
2 Translation method
3 Ranking method
4 Results of experiments
5 Future work
Context: comparable corpora for Computer-Aided Translation
Aim: provide domain-specific bilingual lexicons to translators when no parallel data is available
⇒ Comparable corpora:
- Set of texts in languages L1 and L2, which are not translations, but which deal with the same subject matter, so that there is still a possibility to extract translation pairs
Motivations for compositional translation
Usual context-based methods [Fung, 1997]:
- 51% to 88% precision on top 20 candidates with specialized corpora [Daille and Morin, 2005]
⇒ lexicons difficult to use for translators [Delpech, 2011]
Compositional translation:
- 81% to 94% precision on Top 1 [Robitaille et al., 2006, Cartoni, 2009, Morin and Daille, 2009]
- More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]
- Outperforms context-based approaches for the translation of terms with compositional meaning [Morin and Daille, 2009]
Compositional translation
Compositionality
“the meaning of the whole is a function of the meaning of the parts” [Keenan and Faltz, 1985, 24-25]
Input: "ab"
Decompose {a, b}
Translate {α, β}
Reorder {αβ, βα}
Select αβ
Output: "αβ"
Related work
Applied to phrases, decomposed into words [Robitaille et al., 2006, Morin and Daille, 2009]
- rate of evaporation → taux d'évaporation
Applied to words, decomposed into morphemes [Cartoni, 2009, Harastani et al., 2012]
- cardiology → cardiologie
- ricostruire → rebuild
⇒ No approach links bound morphemes to words:
- -cyto- → cellule 'cell'
- cytotoxic → toxique pour les cellules 'toxic to the cells'
Selection and ranking methods
Select translations that occur in target texts / Web [Morin and Daille, 2009]
Select most frequent translation [Grefenstette, 1999]
Compare contexts [Garera and Yarowsky, 2008]
ML: Binary classifier [Baldwin and Tanaka, 2004]
⇒ Combination of criteria
⇒ ML: Learning-to-rank algorithms (IR)
Translation process overview
Input: "non-cytotoxic"
Decompose {non, cyto, toxic}
Concatenate {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}
Translate {non, cellule, toxique}, {non, cyto, toxique}, {non, cellule, toxicité}, {non, cyto, toxicité}
Reorder {non, toxique, cellule}, {non, cellule, toxique}, {cellule, toxique, non}
Concatenate {non, toxique, cellule}, {nontoxique, cellule}, {non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}
Output: "non toxique pour les cellules" 'non toxic to the cells'
Decomposition
non-cytotoxic → {non, cyto, toxic}
Split source term into minimal components with heuristic rules:
- split on hyphens
- match substrings of the source term with:
  - a list of morphemes
  - a list of lexical items
- respect some length constraints on the substrings
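The heuristics above could be sketched roughly as follows. This is only an illustration: the morpheme list, the lexicon, the minimum length of 3 characters and the shortest-prefix-first search are all assumptions made for the example, not the rules actually used by the authors.

```python
import re

# Toy resources; both sets are assumptions made for this sketch.
MORPHEMES = {"non", "cyto"}
LEXICON = {"toxic", "cell"}
MIN_LEN = 3  # assumed minimum length of a component


def split_part(part, known=None, min_len=MIN_LEN):
    """Split one hyphen-free string into known components, or return None."""
    known = MORPHEMES | LEXICON if known is None else known
    if not part:
        return []
    # Try the shortest known prefix first (favours minimal components),
    # then recurse on the remainder and backtrack on failure.
    for end in range(min_len, len(part) + 1):
        prefix = part[:end]
        if prefix in known:
            rest = split_part(part[end:], known, min_len)
            if rest is not None:
                return [prefix] + rest
    return None


def decompose(term):
    """Split on hyphens, then split each part with the morpheme/lexicon lists."""
    components = []
    for part in re.split(r"-+", term.lower()):
        if not part:
            continue
        pieces = split_part(part)
        # Keep the part whole when no full segmentation is found.
        components.extend(pieces if pieces is not None else [part])
    return components
```

With these toy lists, `decompose("non-cytotoxic")` returns `["non", "cyto", "toxic"]`, and a string with no known segmentation is returned unsplit.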
Concatenation
Generate all possible concatenations of the minimal components
Increases the chances of matching the components with entries of the dictionaries
{non, cyto, toxic} → {non, cyto, ∅}
{non, cytotoxic} → {non, cytotoxique}
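One way to enumerate the concatenations is to decide, for each gap between two neighbouring components, whether to keep a boundary or merge. The sketch below assumes that only adjacent components are merged and that their order is preserved; the function name is hypothetical.

```python
from itertools import product


def concatenations(components):
    """Return every way of merging adjacent components, keeping their order."""
    if not components:
        return []
    results = []
    # Each gap between neighbours is either a boundary (True) or a merge (False).
    for gaps in product([True, False], repeat=len(components) - 1):
        grouping = [components[0]]
        for component, boundary in zip(components[1:], gaps):
            if boundary:
                grouping.append(component)
            else:
                grouping[-1] += component
        results.append(grouping)
    return results
```

For n components this yields 2^(n-1) groupings, so `concatenations(["non", "cyto", "toxic"])` gives the four sets listed in the process overview.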
Translation with direct dictionary look-up
Bilingual dictionary for lexical items:
- toxic → toxique
Morpheme translation table for bound morphemes:
- allow bound to free morpheme translation equivalence
- -cyto- → -cyto-, cellule
{-cyto-, toxic} → {-cyto-, toxique}, {cellule, toxique}
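The look-up step amounts to a Cartesian product over the translations of each component. The sketch below uses two toy tables built from the examples on this slide; the table contents, the function name and the choice to discard a segmentation as soon as one component has no translation are assumptions.

```python
from itertools import product

# Toy resources built from the slide examples; real tables would be much larger.
BILINGUAL_DICT = {"toxic": ["toxique"], "non": ["non"]}
MORPHEME_TABLE = {"cyto": ["cyto", "cellule"]}


def translate_components(components):
    """Return all combinations of component translations, or [] if one is missing."""
    options = []
    for component in components:
        candidates = MORPHEME_TABLE.get(component, []) + BILINGUAL_DICT.get(component, [])
        if not candidates:
            return []  # one untranslatable component blocks this segmentation
        options.append(candidates)
    return [list(combo) for combo in product(*options)]
```

Here `translate_components(["cyto", "toxic"])` returns `[["cyto", "toxique"], ["cellule", "toxique"]]`, which mirrors the bound-to-free equivalence -cyto- → cellule.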
Translation with variation
Morphological lexicon
- toxic → toxique → toxicité 'toxicity'
Synonyms
- toxic → toxique → vénéneux 'poisonous'
{-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}
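Variation can be seen as a second expansion applied to each translated component. The following sketch assumes two small target-side resources (a morphological family table and a synonym table) whose entries are taken from the examples above; the names are hypothetical.

```python
from itertools import product

# Toy target-side resources; the entries mirror the slide examples.
MORPHOLOGICAL_FAMILY = {"toxique": ["toxicité"]}
SYNONYMS = {"toxique": ["vénéneux"]}


def variants(word):
    """Return the word followed by its morphological relatives and synonyms."""
    seen = [word]
    for related in MORPHOLOGICAL_FAMILY.get(word, []) + SYNONYMS.get(word, []):
        if related not in seen:
            seen.append(related)
    return seen


def expand(translation):
    """Expand a list of translated components into all variant combinations."""
    return [list(combo) for combo in product(*(variants(w) for w in translation))]
```

For instance, `expand(["cellule", "toxique"])` keeps the direct translation and adds `["cellule", "toxicité"]` and `["cellule", "vénéneux"]`.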
Reordering
No translation patterns or reordering rules
Permute the translated components:
{cellule, toxique} → {cellule, toxique}, {toxique, cellule}
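Since no reordering rules are used, the step reduces to generating permutations. A minimal sketch, with a hypothetical function name:

```python
from itertools import permutations


def reorderings(components):
    """Return every distinct ordering of the translated components."""
    unique = []
    for ordering in permutations(components):
        if list(ordering) not in unique:
            unique.append(list(ordering))
    return unique
```

The number of orderings grows as n!, which stays manageable only because terms have few components (2 orderings for two components, 6 for three).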
Concatenation
Recreate target words by generating all possible concatenations of the components:
{toxique, cellule} → {toxique cellule}, {toxiquecellule}
Selection
Match target words with the words of the target corpus
Allow at most 3 stop words between two words
{toxique cellule} → "toxique pour les cellules" 'toxic to the cells'
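The matching step can be approximated with a regular expression that tolerates a bounded number of stop words between consecutive target words. In this sketch the stop-word list is a small assumed sample, and the trailing `\w*` after each word is a crude stand-in for lemmatisation (it lets "cellule" match "cellules"); the slides do not say how inflection is handled.

```python
import re

# Small French stop-word list, assumed for the example.
STOP_WORDS = ["pour", "les", "le", "la", "de", "des", "du", "aux", "à"]


def candidate_pattern(words, max_gap=3, stop_words=STOP_WORDS):
    """Build a regex matching the words in order with up to max_gap stop words between them."""
    stop = "|".join(re.escape(w) for w in stop_words)
    gap = r"\s+(?:(?:%s)\s+){0,%d}" % (stop, max_gap)
    # Assumption: a trailing \w* approximates inflected forms of each word.
    body = gap.join(re.escape(w) + r"\w*" for w in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def find_in_corpus(words, text):
    """Return the matched surface forms of the candidate in the text."""
    return [m.group(0) for m in candidate_pattern(words).finditer(text)]
```

Applied to the sentence "Ce composé est toxique pour les cellules humaines.", `find_in_corpus(["toxique", "cellule"], ...)` returns `["toxique pour les cellules"]`; a candidate with no match in the target corpus would be discarded.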
Target term frequency
Number of occurrences of the target term divided by the total number of occurrences in the target texts
$$\mathrm{Freq}(t) = \frac{\mathrm{occ}(t)}{N}$$
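As a small illustration of this feature, the sketch below computes the relative frequency over a tokenised target corpus. It assumes that N is the total number of tokens and that the term is a single token; counting multi-word terms would need a phrase matcher instead.

```python
from collections import Counter


def relative_frequency(term, target_tokens):
    """Freq(t) = occ(t) / N, with N taken as the number of tokens in the target texts."""
    if not target_tokens:
        return 0.0
    return Counter(target_tokens)[term] / len(target_tokens)
```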
Context similarity measure
Corresponds to context-based approaches
Collect words co-occurring with source and target term in a window of 5 words
Normalize co-occurrences with log-likelihood ratio
Compare contexts with weighted Jaccard
$$\mathrm{Cont}(s, t) = \frac{\sum_{w \in s \cap t} \min(c(s,w), c(t,w))}{\sum_{w \in s \cup t} \max(c(s,w), c(t,w))}$$
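The weighted Jaccard index can be computed directly from two context vectors. The sketch below assumes that each vector is a dictionary mapping a context word to its non-negative association weight c(·, w), and that the source vector has already been mapped into the target-language vocabulary (for example with a bilingual dictionary); that mapping is not shown on the slide and is left out here.

```python
def weighted_jaccard(source_context, target_context):
    """Weighted Jaccard between two context vectors given as {word: weight} dicts."""
    shared = set(source_context) & set(target_context)
    union = set(source_context) | set(target_context)
    numerator = sum(min(source_context[w], target_context[w]) for w in shared)
    # A word missing from one vector contributes a weight of 0 on that side.
    denominator = sum(
        max(source_context.get(w, 0.0), target_context.get(w, 0.0)) for w in union
    )
    return numerator / denominator if denominator else 0.0
```

For example, with s = {cellule: 2, dose: 1} and t = {cellule: 1, effet: 3}, the numerator is min(2, 1) = 1 and the denominator is 2 + 1 + 3 = 6, so Cont(s, t) = 1/6.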
Part-of-speech translation probability
Probability that a source term with part-of-speech A translates to a target term with part-of-speech B
$\mathrm{Pos}(s, t) = P(\mathrm{pos}(t) \mid \mathrm{pos}(s)) = P(B \mid A)$
Acquired from POS-tagged parallel corpora [Tiedemann, 2009] with the word alignment software AnyMalign [Lardrilleux, 2008]
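One plausible way to obtain such probabilities is relative-frequency estimation over aligned word pairs. The sketch below assumes the alignment output has already been reduced to (source POS, target POS) pairs; the function name and the tag labels are hypothetical and this is not the output format of any particular aligner.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def pos_translation_probs(
    aligned_pos: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], float]:
    """Estimate P(B | A) for each observed (source POS A, target POS B) pair
    as count(A, B) / count(A)."""
    pair_counts = Counter(aligned_pos)
    source_totals: Counter = Counter()
    for (src, _tgt), n in pair_counts.items():
        source_totals[src] += n
    return {
        (src, tgt): n / source_totals[src]
        for (src, tgt), n in pair_counts.items()
    }
```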
Resources reliability score
Some translation resources might give more reliable translations than others
- e.g. bilingual dictionary > synonyms
- score = mean of the reliability of the resources used for translating the components
$\mathrm{Reso}(t = \{c_1, \dots, c_n\}) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{resource\_reliability}(c_i)$
Tuned on training data
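As an illustration, the score is the arithmetic mean of per-resource reliabilities. The resource names and the numeric values in `RELIABILITY` below are made up for the example; the slides only state that the values are tuned on training data.

```python
from typing import Dict, Sequence

# Hypothetical reliability values, for illustration only.
RELIABILITY: Dict[str, float] = {
    "dictionary": 1.0,
    "morphemes": 0.75,
    "cognates": 0.5,
    "synonyms": 0.5,
}


def resource_reliability_score(resources_used: Sequence[str]) -> float:
    """Reso(t): mean reliability of the resources used to translate
    each component c_1..c_n of the candidate translation."""
    if not resources_used:
        return 0.0
    return sum(RELIABILITY[r] for r in resources_used) / len(resources_used)
```

A two-component candidate whose components came from the dictionary and from the synonym list would score (1.0 + 0.5) / 2 = 0.75 under these made-up values.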
Combination
Linear combination of the four criteria: Frequency, Context, Part-of-speech translation probability and Resources reliability
$\mathrm{Combi}(t, s) = \mathrm{Freq}(t) + \mathrm{Cont}(s, t) + \mathrm{Pos}(s, t) + \mathrm{Reso}(t)$
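A small sketch of ranking candidate translations by the unweighted sum of the four criterion scores. The dictionary keys and the example scores are hypothetical, and the sketch assumes each criterion has already been computed for every candidate.

```python
from typing import Dict, List, Tuple


def combi(scores: Dict[str, float]) -> float:
    """Unweighted sum of the four criterion scores of one candidate."""
    return scores["freq"] + scores["cont"] + scores["pos"] + scores["reso"]


def rank_candidates(
    candidates: Dict[str, Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Sort candidate translations of one source term by decreasing Combi."""
    scored = ((cand, combi(s)) for cand, s in candidates.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the four scores live on different scales (a corpus frequency is typically much smaller than a probability), an unweighted sum lets the larger-valued criteria dominate; this may be one reason to also try learned rankers.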
Machine learning
Learning-to-rank algorithms used in IR for ranking documents
Tried 3 algorithms implemented in the RankLib software¹
- AdaRank [Li and Xu, 2007]
- Coordinate Ascent [Metzler and Croft, 2000]
- LambdaMART [Wu et al., 2010]
Features: Freq, Cont, Pos, Reso
¹ http://people.cs.umass.edu/~vdang/ranklib.html
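Learning-to-rank toolkits such as RankLib read training data in the LETOR/SVMlight-style text format, one candidate per line: `<label> qid:<id> 1:<value> 2:<value> ... # <comment>`. The sketch below shows how the four features could be serialized that way. The mapping of one source term to one `qid`, the feature order (Freq, Cont, Pos, Reso) and the helper name are assumptions made for the example, not details given in the slides.

```python
from typing import Sequence


def to_letor_line(label: int, qid: int, features: Sequence[float],
                  comment: str = "") -> str:
    """Format one candidate translation as a LETOR-style line.
    `label` is the manual relevance score and `qid` identifies the source
    term whose candidates are ranked together (both are assumptions here)."""
    feats = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    line = f"{label} qid:{qid} {feats}"
    return f"{line} # {comment}" if comment else line
```

Lines that share a `qid` are treated as one ranking problem, which matches the task of ordering the candidate translations of a single source term.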
Corpora
English → French, German
breast cancer
≈ 400k words per language
Lexicons
Morpheme translation table (hand-crafted)
General language dictionary (Xelda)
Synonyms (Xelda)
Domain-specific dictionary: cognates extracted from the corpus [Hauer and Kondrak, 2011]
Morphological families [Porter, 1980]
Training and evaluation datasets
EVALUATION: ≈ 100 source terms
- source terms in the UMLS Metathesaurus with translation(s) in the target texts
TRAINING: ≈ 600 source terms
- source terms for which a translation could be generated and whose translation(s) occur in the target texts
- generated translations were scored manually
⇒ evaluation and training datasets are disjoint
⇒ source terms are morphologically complex words with no translation in the dictionary
Results for translation generation
                                      EN → FR     EN → DE
# source terms                        126         90
# at least 1 translation              86 (68%)    56 (62%)

# at least 1 translation              86          56
1 trans. in UMLS                      68 (79%)    40 (71%)
1 trans. in UMLS or judged correct    81 (94%)    51 (91%)
Results for translation ranking
                 EN → FR    EN → DE    Average
Random           .83        .80        .815
Freq             .92        .84        .88
Cont             .90        .82        .86
Pos              .88        .91        .895
Reso             .92        .82        .87
Combination      .93        .89        .91
ML AdaRank       .90        .84        .87
ML CoordAsc      .93        .89        .91
ML LambdaMart    .86        .88        .87
Table: Top-1 translation in UMLS or judged correct
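The Top-1 figure can be read as the share of source terms whose best-ranked candidate is an accepted translation. The sketch below assumes that only source terms with at least one generated candidate are counted in the denominator; that choice, like the function and argument names, is an assumption rather than something stated on the slide.

```python
from typing import Dict, List, Set


def top1_precision(ranked: Dict[str, List[str]],
                   accepted: Dict[str, Set[str]]) -> float:
    """Fraction of source terms whose first-ranked candidate is accepted
    (found in the reference or judged correct). Source terms without any
    candidate are skipped."""
    evaluated = [s for s, cands in ranked.items() if cands]
    if not evaluated:
        return 0.0
    hits = sum(1 for s in evaluated if ranked[s][0] in accepted.get(s, set()))
    return hits / len(evaluated)
```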
Silence analysis
Missing translation in resources (≈30%)
Target term is not compositional (≈30%)
- breastfeeding → allaitement (FR), stillen (DE)
Lexical divergence (≈20%)
- radiosensitivity → Strahlentoleranz, sensitivity ≠ Toleranz
Additional elements (≈13%)
- postpartum → postpartalperiod
Error analysis
Problems in word reordering
- self-examination → untersuchung selbst ‘examination self’
Wrong or inappropriate translations
- in-patient → pas malade ‘not ill’
  in → “inside” → inside patient
  in → “inverse” → not a patient
Impact of fertile translations
                      EN → FR    EN → DE
exact translations    21%        10%
wrong translations    50%        80%
Table: % of fertile translations
German, a Germanic language: tendency to agglutination
- oestrogen-independent → Östrogen-unabhängige
French, a Romance language: creates phrases more easily
- oestrogen-independent → indépendant des œstrogènes
Future work
Improve quality of linguistic resources
- morphological derivation rules instead of stemming
- use of a thesaurus
Try translation patterns on top of permutations
Try learning morpheme translation equivalences from
- cognates
- bilingual dictionaries
- out-of-domain parallel data
Thank you for your attention.
estelle.delpech@univ-nantes.fr
beatrice.daille@univ-nantes.fr
emmanuel.morin@univ-nantes.fr
cl@lingua-et-machina.com
ADDITIONAL SLIDES
Exact translations
Non-fertile:
- pathophysiological → physiopathologique
- overactive → überaktiv
Fertile:
- cardiotoxicity → toxicité cardiaque ‘cardiac toxicity’
- mastectomy → ablation der brust ‘ablation of the breast’
Morphological variants
Non-fertile:
- dosimetry → dosimétrique ‘dosimetric’
- radiosensitivity → strahlenempfindlich ‘radiosensitive’
Fertile:
- milk-producing → production de lait ‘production of milk’
- selfexamination → selbst untersuchen ‘self examine’
Inexact but semantically related
Non-fertile:
- oncogene → oncogenèse ‘oncogenesis’
- breakthrough → durchbrechen ‘break’
Fertile:
- chemoradiotherapy → chemotherapie oder strahlen ‘chemotherapy or radiation’
- treatable → pouvoir le traiter ‘can treat it’
Wrong translations
Non-fertile:
- immunoscore → immunomarquer ‘immunostain’
- check-in → unkontrollieren ‘uncontrolled’
Fertile:
- bloodstream → fliessen mehr blut ‘more blood flow’
- risk-reducing → risque de réduire ‘risk of reducing’
References I
Baldwin, T. and Tanaka, T. (2004). Translation by machine of complex nominals. In Proceedings of the ACL 2004 Workshop on Multiword Expressions: Integrating Processing, pages 24–31, Barcelona, Spain.
Bo, L. and Gaussier, E. (2010). Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
Cartoni, B. (2009). Lexical morphology in machine translation: A feasibility study. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
Daille, B. and Morin, E. (2005). French-English terminology extraction from comparable corpora. In Proceedings, 2nd International Joint Conference on Natural Language Processing, volume 3651 of Lecture Notes in Computer Science, pages 707–718, Jeju Island, Korea. Springer.
Delpech, E. (2011). Evaluation of terminologies acquired from comparable corpora: an application perspective. In Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), volume 11 of NEALT Proceedings Series, pages 66–73, Riga, Latvia. Pedersen B.S., Nespore G., Skadina I.
Fung, P. (1997). Finding terminology translations from non-parallel corpora. Pages 192–202, Hong Kong.
Garera, N. and Yarowsky, D. (2008). Translating compounds by learning component gloss translation via multiple languages. In Proceedings of the 3rd International Joint Conference on Natural Language Processing, volume 1, pages 403–410, Hyderabad, India.
References II
Grefenstette, G. (1999). The world wide web as a resource for example-based machine translation tasks. ASLIB’99 Translating and the Computer, 21.
Harastani, R., Daille, B., and Morin, E. (2012). Neoclassical compound alignments from comparable corpora. In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, volume 2, pages 72–82, New Delhi, India.
Hauer, B. and Kondrak, G. (2011). Clustering semantically equivalent words into cognate sets in multilingual lists. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873, Chiang Mai, Thailand.
Keenan, E. L. and Faltz, L. M. (1985). Boolean semantics for natural language. D. Reidel, Dordrecht, Holland.
Lardrilleux, A. (2008). A truly multilingual, high coverage, accurate, yet simple, sub-sentential alignment method.
Li, H. and Xu, J. (2007). AdaRank: A boosting algorithm for information retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 391–398, Amsterdam, The Netherlands.
Metzler, D. and Croft, W. B. (2000). Linear feature-based models for information retrieval. Information Retrieval, 10(3):257–274.
References III
Morin, E. and Daille, B. (2009). Compositionality and lexical alignment of multi-word terms. In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moiron, Springer Netherlands edition.
Morin, E. and Daille, B. (2010). Compositionality and lexical alignment of multi-word terms. In Rayson, P., Piao, S., Sharoff, S., Evert, S., and B., V. M., editors, Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. Springer Netherlands.
Namer, F. and Baud, R. (2007). Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system. International Journal of Medical Informatics, 76(2-3):226–33.
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3):130–137.
Robitaille, X., Sasaki, X., Tonoike, M., Sato, S., and Utsuro, S. (2006). Compiling French-Japanese terminologies from the web. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 225–232, Trento, Italy.
Tiedemann, J. (2009). News from OPUS - a collection of multilingual parallel corpora with tools and interfaces.
Wu, Q., Burges, J. C., Svore, K., and Gao, J. (2010). Adapting boosting for information retrieval measures. Journal of Information Retrieval, 13(3):254–270.