A Summary of MT Research Activities
Technologies – Demands – Gaps – Roadmaps

Jan Hajič
Charles University in Prague, Faculty of Mathematics and Physics
Computer Science School, Institute of Formal and Applied Linguistics
Czech Republic
[email protected]

META-FORUM 2015: Technologies for the Multilingual Digital Single Market
Riga, Latvia, April 27, 2015

META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER (grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899).




Contributors

• QT21 project: Josef van Genabith, DFKI/Univ. of Saarland
• EU-BRIDGE: Alex Waibel, KIT/CMU
• HimL: Barry Haddow, Univ. of Edinburgh
• QTLeap: António Branco, Univ. of Lisbon
• MosesCore: Philipp Koehn, Univ. of Edinburgh/JHU
• TraMOOC: Valia Kordoni, Humboldt Univ. Berlin
• MMT: Alessandro Cattelan, Translated SRL
• CRACKER: Georg Rehm, DFKI Berlin

http://www.meta-net.eu


Current MT projects


• MT Research: new approaches and methods
• Deep MT for multiple languages
• Application in IT IR
• Moses development coordination, shared tasks, MT Marathons
• LT infrastructure – MT, spoken MT, ASR, LID, other services
• MMT: open source architecture, big data, distributed SMT
• MT for user-generated content
• TraMOOC: translation of all MOOC elements incl. non-EU languages
• MT in medical domain: public & professional documents
• Coordination: CRACKER LT Observatory


Demands

• Industry demands
  ▪ Happy, satisfied customer (cf. previous presentation!)
  ▪ Machine translation (and related language technologies):
    - High (adequate) quality, easily adaptable incl. for user content, integration to CAT
    - Robust (speech environment, text type), fast (real-time)
    - Confidence - knows its limits
    - Technology simple to adopt, large data available, access to infrastructure

• Societal demands
  ▪ Easily accessible, ubiquitous, integrated in all services, usable in all situations

• Public services
  ▪ High quality dissemination MT, spoken translation (office use)
  ▪ Cross-border services (medical, legal, mobility, social services, governance)


Gaps in Technology

• Data
  ▪ Not enough data (small languages / language pairs), specialized data
  ▪ Access to data – legal issues, formats, quality of data, metadata

• Evaluation
  ▪ Too much BLEU: fair, adequate evaluation for development, industry
  ▪ Confidence learning and assessment, poor or no quality prediction

• Methods, algorithms, theory
  ▪ Speech: recognition still very brittle vs. environment
    - dialog system modelling
  ▪ Linguistic facts and findings – syntax, semantics, discourse, ...: integration?
  ▪ World knowledge and context modelling
    - user, situation, location, emotions, style, ...
  ▪ Learning: slow, offline, relies on annotated data, parallel data


Research Breakthroughs Needed

• Learning
  ▪ Deep learning, unsupervised learning, small data learning; limits?
  ▪ Context-aware learning, continuous learning, (very) noisy data learning
  ▪ Situational, user-based adaptation, error recovery

• Linguistics: what and how
  ▪ Morphology, syntax, semantics, discourse, grounding, entailment: limits?

• Speech
  ▪ Advances in acoustic modelling, context use, user modelling
  ▪ Integration with other technologies (incl. MT), conversational systems

• Data gathering
  ▪ Non-traditional data acquisition: comparable & monolingual corpora for MT
  ▪ Data identification, selection, cleaning
  ▪ Novel data acquisition: conversations, speech/text linked to real world


To Do – Roadmapping

• Goal: breakthroughs → technology → applications
  ▪ Europe takes a lead (who else?)

• Strategic Research and Innovation Agenda
  ▪ Version 0.5 available; three layers:
    - Innovative technology solutions
      ▪ For business, public sector
    - Language Technology Services, Platforms, Infrastructures
      ▪ To lower cost for everybody, esp. SMEs and companies entering the market
    - Priority Research Themes
      ▪ Learning, learning, learning
      ▪ Sharing (universal) technology
      ▪ Responds to:
        » Richness and diversity: many languages
        » Meaning and understanding: linguistics, semantics, logic
        » Multimodality and grounding: connecting language and the world


Q/A

Thank you.

[email protected]

http://www.meta-net.eu
http://www.facebook.com/META.Alliance
