META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER (grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899).
A Summary of MT Research Activities
Technologies – Demands – Gaps – Roadmaps
Jan Hajič
Charles University in Prague, Faculty of Mathematics and Physics
Computer Science School, Institute of Formal and Applied Linguistics
Czech [email protected]
META-FORUM 2015: Technologies for the Multilingual Digital Single Market
Riga, Latvia, April 27, 2015
Contributors
• QT21 project: Josef van Genabith, DFKI/Univ. of Saarland
• EU-BRIDGE: Alex Waibel, KIT/CMU
• HimL: Barry Haddow, Univ. of Edinburgh
• QTLeap: António Branco, Univ. of Lisbon
• MosesCore: Philipp Koehn, Univ. of Edinburgh/JHU
• TraMOOC: Valia Kordoni, Humboldt Univ. Berlin
• MMT: Alessandro Cattelan, Translated SRL
• CRACKER: Georg Rehm, DFKI Berlin
http://www.meta-net.eu 2
Current MT projects
MT Research: new approaches and methods
Deep MT for multiple languages; Application in IT IR
Moses development coordination, shared tasks, MT Marathons
LT infrastructure – MT, spoken MT, ASR, LID, other services
MMT: open source architecture, big data, distributed SMT
MT for user-generated content
TraMOOC: translation of all MOOC elements incl. non-EU languages
MT in medical domain: public & professional documents
Coordination: Cracker LT Observatory
Demands
• Industry demands
  - Happy, satisfied customer (cf. previous presentation!)
  - Machine translation (and related language technologies):
    - High (adequate) quality, easily adaptable incl. for user content, integration to CAT
    - Robust (speech environment, text type), fast (real-time)
    - Confidence: knows its limits
    - Technology simple to adopt, large data available, access to infrastructure
• Societal demands
  - Easily accessible, ubiquitous, integrated in all services, usable in all situations
• Public services
  - High quality dissemination MT, spoken translation (office use)
  - Cross-border services (medical, legal, mobility, social services, governance)
Gaps in Technology
• Data
  - Not enough data (small languages / language pairs), specialized data
  - Access to data – legal issues, formats, quality of data, metadata
• Evaluation
  - Too much BLEU: fair, adequate evaluation for development, industry
  - Confidence learning and assessment, poor or no quality prediction
• Methods, algorithms, theory
  - Speech: recognition still very brittle vs. environment
    - dialog system modelling
  - Linguistic facts and findings – syntax, semantics, discourse, ...: integration?
  - World knowledge and context modelling
    - user, situation, location, emotions, style, ...
  - Learning: slow, offline, relies on annotated data, parallel data
Research Breakthroughs Needed
• Learning
  - Deep learning, unsupervised learning, small data learning; limits?
  - Context-aware learning, continuous learning, (very) noisy data learning
  - Situational, user-based adaptation, error recovery
• Linguistics: what and how
  - Morphology, syntax, semantics, discourse, grounding, entailment: limits?
• Speech
  - Advances in acoustic modelling, context use, user modelling
  - Integration with other technologies (incl. MT), conversational systems
• Data gathering
  - Non-traditional data acquisition: comparable & monolingual corpora for MT
  - Data identification, selection, cleaning
  - Novel data acquisition: conversations, speech/text linked to real world
To Do – Roadmapping
• Goal: breakthroughs → technology → applications
  - Europe takes a lead (who else?)
• Strategic Research and Innovation Agenda
  - Version 0.5 available; three layers:
    - Innovative technology solutions
      - For business, public sector
    - Language Technology Services, Platforms, Infrastructures
      - To lower cost for everybody, esp. SMEs and companies entering the market
    - Priority Research Themes
      - Learning, learning, learning
      - Sharing (universal) technology
      - Responds to:
        - Richness and diversity: many languages
        - Meaning and understanding: linguistics, semantics, logic
        - Multimodality and grounding: connecting language and the world