DeepMiner - Advanced Leveraging: Integrating Translation Memories and Machine Translation

  • Published on
    21-May-2015

DESCRIPTION

Presentation at TEKOM, October 25th, 2012. DeepMiner: Integrating Translation Memories and Machine Translation

Transcript

<ul><li> 1. DeepMiner: Integrating Translation Memories and Machine Translation. TEKOM, October 25th, 2012. Presenter: Daniel Benito</li></ul>
<p>
2. Introduction
  • History
  • Limitations of Translation Memory
  • Beyond Segment-Level Reuse
  • Machine Translation
  • Fuzzy Match Repair
  • Advanced Leveraging
  • Combining TM and MT
  • Current Limitations
  • Perspectives
  • Conclusion

3. History
  • Past:
      • 1950s: Early Machine Translation (MT) experiments
      • 1960s: General awareness that Machine Translation (MT) was not going to replace human translators
      • 1970s: First proposals for Translator Workstations
      • 1990s: Translation Memory (TM) became viable
  • Present:
      • TM technology has barely advanced in the last ten years
      • MT has advanced to the point where its applications in the translation industry are incontrovertible

4. Limitations of Translation Memory
  • Segment-level translation reuse is only useful in limited cases
  • Even in highly repetitive texts, most of the repetitions happen at the sub-segment level:
      • Terms and phrases
      • Sentence structure
  • Most Translation Memory systems are limited to providing fuzzy matches but are unable to exploit sub-segment repetition

5. Beyond Segment-Level Reuse
  • We need to translate:
      EN: The black cat usually sleeps in the hallway.
  • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
  • What can we do to reduce the time spent editing fuzzy matches?
      • Ignore the fuzzy matches and use MT
      • Automatically repair the fuzzy matches

6. Machine Translation
  • We need to translate:
      EN: The black cat usually sleeps in the hallway.
  • Results returned by various MT systems:
      DE: Die schwarze Katze in der Regel schläft im Flur.
      DE: Die schwarze Katze schläft normalerweise im Flur.
  • Achieving consistency and using specific terminology (e.g. "Gang" instead of "Flur") will require some degree of training or post-editing

7. Machine Translation
  • General-purpose MT engines such as Google Translate or Microsoft Translator usually require extensive post-editing, but can be used for inspiration
  • Rule-based and statistical MT engines customized for specific domains offer much higher quality but require expensive tuning or retraining
  • It is usually more expensive to use MT than to manually edit a fuzzy match

8. Fuzzy Match Repair
  • Inspired by the "translation by analogy" concept from Example-Based Machine Translation (EBMT)
  • Attempts to maintain the quality and consistency of existing translations in the TM while increasing productivity

9. Fuzzy Match Repair
  • We need to translate:
      EN: The black cat usually sleeps in the hallway.
  • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
  • We can replace "graue" with "schwarze" and "Wohnzimmer" with "Gang" to produce an exact match.

10. Fuzzy Match Repair
  • Requires knowing the following translations:
      grey → graue
      black → schwarze
      living room → Wohnzimmer
      hallway → Gang
  • What do we do if those translations are not explicitly in our TMs or termbases?

11. Advanced Leveraging
  • Bilingual concordance search:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
      EN: Mary has bought a new pair of grey running shoes.
      DE: Maria hat ein neues Paar graue Laufschuhe gekauft.
      EN: This article is also available in grey.
      DE: Dieser Artikel ist auch in grau erhältlich.

12. Advanced Leveraging
  • Statistically infer translations from the TM
  • Compare all of the German translations and suggest one or more probable translations (e.g. "graue", "grau")
  • Requires:
      • Large TMs with many examples
      • Consistent translations in the TM

13. Combining TM and MT
  • We can use MT as an additional resource for finding the translations needed to repair fuzzy matches
  • MT systems often give better results for terms and short phrases than for long sentences
  • We approach this combination based on the following premises:
      • A client's own data is considered to be of higher quality and will always have priority over the Machine Translation results
      • A fuzzy match repaired with Machine Translation will usually be better than a normal fuzzy match, and better than an MT result for an entire segment

14. Combining TM and MT
  • We need to translate:
      EN: The black cat usually sleeps in the hallway.
  • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
  • Our termbase contains:
      EN: grey / DE: graue
      EN: black / DE: schwarze
      EN: hallway / DE: Gang

15. Combining TM and MT
  • We do not have the translation for "living room" in our TM or our termbase, so we can request it from the MT system:
      EN: living room
      DE: Wohnzimmer
  • The combination of material in our TM, termbase and MT system allows us to perform the appropriate replacements and obtain:
      EN: The black cat usually sleeps in the hallway.
      DE: Die schwarze Katze schläft gewöhnlich im Gang.

16. Current Limitations
  • We need to translate:
      EN: The white dog usually sleeps in the living room.
  • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
  • Our termbase contains:
      EN: grey cat / DE: graue Katze

17. Current Limitations
  • Asking the MT system for the missing translation, we get:
      EN: white dog
      DE: weißer Hund
  • The result of fixing the fuzzy match is:
      EN: The white dog usually sleeps in the living room.
      DE: Die weißer Hund schläft gewöhnlich im Wohnzimmer.
  • Some post-editing is still required

18. Current Limitations
  • We need to translate:
      EN: The grey cat often sleeps in the living room.
  • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
  • The translations we get from the MT system are:
      EN: usually / DE: normalerweise
      EN: often / DE: oft
  • We cannot repair the fuzzy match because we do not know how "usually" has been translated

19. Future Developments
  • Greater integration with the MT engines
  • Access to internal translation candidates:
      EN: usually
      DE: normalerweise, gewöhnlich, sonst, ...
  • Access to internal language models:
      DE: Die weißer Hund → never
      DE: Der weiße Hund → often
  • Automatic upload of new TM material to the MT engine so it can be used for retraining in the future

20. Conclusion
  • Traditional segment-level translation reuse has reached its full potential
  • ATRIL's Déjà Vu X2 already includes DeepMiner technology that improves productivity by cleverly combining all the approaches we described:
      • (Statistical) Machine Translation
      • Example-Based Machine Translation
      • Advanced Leveraging (sub-segment matching)

21. Questions?

22. Additional Topics

23. Predictive Typing
  • Find all sub-segment matches and offer them to the translator as he or she types
  • Suggestions are context-sensitive, so there are never too many results to choose from
  • Translations are constructed piece by piece from previous texts, guided by the translator

24. Advanced Predictive Typing
  • Advanced Leveraging techniques for statistically inferring sub-segment translations from the TM can be adapted to provide additional predictive typing suggestions
  • Translations from MT can be added to the predictive typing mechanism, to offer additional suggestions for translations of terms and phrases

25. MT integrations in Déjà Vu X2
  • Systran Enterprise Server
  • Google Translate
  • Microsoft Translator
  • PROMT Translation Server
  • iTranslate4.eu

26. Systran Enterprise Server

27. Google Translate

28. Microsoft Translator

29. PROMT Translation Server

30. iTranslate4.eu
</p>
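The fuzzy match repair walk-through in slides 9 to 18 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepMiner's actual implementation: the function names (`diff_phrases`, `lookup`, `repair`), the whitespace tokenization, and the dictionary standing in for an MT engine are all hypothetical. It shows the two premises from slide 13: the client's own data (here the termbase) is consulted first, and MT is only asked for what is missing.

```python
import difflib

def diff_phrases(source, tm_source):
    """Return (old_phrase, new_phrase) pairs where the TM source differs
    from the new source. Assumes simple whitespace tokenization."""
    old_tokens = tm_source.rstrip(".").split()
    new_tokens = source.rstrip(".").split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    return [
        (" ".join(old_tokens[i1:i2]), " ".join(new_tokens[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]

def lookup(phrase, termbase, mt):
    """Client data has priority; the MT callable is only a fallback."""
    if phrase in termbase:
        return termbase[phrase]
    return mt(phrase)

def repair(source, tm_source, tm_target, termbase, mt):
    """Try to turn a fuzzy match into a translation of `source`.
    Returns None when a needed translation is unknown or cannot be
    located in the TM target (the situation described on slide 18)."""
    repaired = tm_target
    for old, new in diff_phrases(source, tm_source):
        old_t = lookup(old, termbase, mt)
        new_t = lookup(new, termbase, mt)
        if old_t is None or new_t is None or old_t not in repaired:
            return None
        repaired = repaired.replace(old_t, new_t, 1)
    return repaired

# Data taken from slides 14, 15 and 18; the dict is a stand-in for an MT system.
termbase = {"grey": "graue", "black": "schwarze", "hallway": "Gang"}
fake_mt = {"living room": "Wohnzimmer", "usually": "normalerweise", "often": "oft"}.get

tm_source = "The grey cat usually sleeps in the living room."
tm_target = "Die graue Katze schläft gewöhnlich im Wohnzimmer."

repaired = repair("The black cat usually sleeps in the hallway.",
                  tm_source, tm_target, termbase, fake_mt)
print(repaired)

# Slide 18: MT says "normalerweise", but the TM used "gewöhnlich",
# so the word to replace cannot be found and the repair is abandoned.
unrepaired = repair("The grey cat often sleeps in the living room.",
                    tm_source, tm_target, termbase, fake_mt)
print(unrepaired)
```

Plain string replacement does nothing about agreement, which is why the "Die weißer Hund" case from slide 17 would pass through this sketch uncorrected and still need post-editing.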

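The statistical inference step from slide 12 could look roughly like the following. This is a deliberately naive co-occurrence count, not the method used in Déjà Vu X2: the function name `candidate_translations`, the single-word restriction, and the scoring (fraction of matching segments that contain a target word) are assumptions made for illustration. The TM entries are the three concordance hits from slide 11.

```python
from collections import Counter

def candidate_translations(term, tm, top=3):
    """Rank target-language words by the fraction of TM segments containing
    `term` in which they occur. Handles single-word terms only."""
    hits = [tgt for src, tgt in tm
            if term in src.lower().rstrip(".").split()]
    if not hits:
        return []
    counts = Counter()
    for tgt in hits:
        # Count each word once per segment.
        counts.update(set(tgt.rstrip(".").split()))
    return [(word, n / len(hits)) for word, n in counts.most_common(top)]

tm = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
]

candidates = candidate_translations("grey", tm)
print(candidates[0])
```

With only three segments, "graue" stands out because it appears in two of them, while "grau" is indistinguishable from every other word that occurs once. That is the practical meaning of the requirement for large TMs with many consistent examples.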