Language Technology and Digital Public Services
Connecting Europe Facility – Automated Translation
Philippe Gelin
DG Connect
Directorate-General for Communications Networks, Content and Technology
The Connecting Europe Facility – funding programme 2014-2020
TRANSPORT: €26.25bn
ENERGY: €5.85bn
TELECOM:
• Broadband: €170 M
• Digital Service Infrastructures: €970 M *
CEF Regulation
The Connecting Europe Facility (CEF) is a regulation that defines how the Commission can finance support for the establishment of trans-European networks to reinforce an interconnected Europe.
* Minus €100 M (Juncker Package)
CEF Telecom Guidelines
The CEF Telecom guidelines cover the specific objectives and priorities as well as eligibility criteria for funding of broadband networks and Digital Service Infrastructures (DSIs).
CEF Work Programmes
The CEF Work Programmes translate the CEF Telecom Guidelines into general objectives and actions planned on a yearly basis.
CEF Automated Translation helps European and national public administrations exchange information across language barriers in the EU, by providing machine translation capabilities that will enable all Digital Service Infrastructures to be multilingual.
CEF Digital at a glance
Funding at EU level (Commission): Core Services
Funding for the Member States: Generic Services, typically 'deployment' projects at national level (up to 75% of eligible cost)
• IDENTIFY with eID
• SIGN with eSignature
• INVOICE with eInvoicing
• EXCHANGE with eDelivery
• TRANSLATE with eTranslation
Services and policy areas shown in the diagram: eJustice Portal; Justice, home affairs and citizens' rights; ODR; Open Data; Science and Technology; Business; BRIS; Employment and Social Rights; EESSI; etc.
CEF eTranslation
The same automatic translation for two complementary purposes
CEF eTranslation
• CEF Digital Service Infrastructures
• Pan-European digital public services
• MS public administrations
DGT eTranslation
• Translators of the EU Institutions
• Digital services of the EU Institutions
41 million pages in 2018
18 million pages in Q1 2019
CEF eTranslation - Capacity
Web Page use in 2019 so far
• 51 million pages.
Current daily record
• 3 Oct. – 989 000 pages.
[Pie chart: share of use by channel, Web Page versus Machine-to-machine; the values shown are 12% and 88%]
Language Resource Collection
[Chart: No. of unique language resources (LR) collected through ELRC-SHARE by month, March 2016 to March 2019; vertical axis from 0 to 1200]
• 1000 language resources
• Instrumental in building EN<>IS and EN<>NB engines
• Contributed to the development of the EN<>GA engine
• Main source for some of the domain-adapted engines
• Will be used for building domain-generic engines
• Will be used by various other Natural Language Processing tools (summarization, etc.)
• http://www.lr-coordination.eu/
Generic Services
2016-EU-IA-0108 : Cross-border eProcurement notifications
2016-EU-IA-0111 : European Language Resource Infrastructure (ELRI)
2016-EU-IA-0114 : Provision of Web-Scale Parallel Corpora for Official European Languages (PARACRAWL)
2016-EU-IA-0121 : CEF Automated Translation for the EU Council Presidency
2016-EU-IA-0122 : eTranslation TermBank
2016-EU-IA-0132 : IADAATPA (Intelligent Automated Domain Adapted Automated Translation for Public Administrations)
2017-EU-IA-0136 : Multilingual Resources for CEF.AT in the legal domain
2017-EU-IA-0149 : National European Central Translation Memory Data (NEC TM DATA)
2017-EU-IA-0151 : APE-QUEST
Automated Post-Editing (APE) & Quality Estimation (QE) for Electronic Exchange of Social Security Information (EESSI) and Online Dispute Resolution (ODR) Digital Service Infrastructures (DSIs) and related national services
2017-EU-IA-0169 : MICE
MT on standards and e-Business (BE) and e-Land register information (EST). Customization for public services. Open-source package with NLP and post-editing connectivity
2017-EU-IA-0178 : Broader Web-Scale Provision of Parallel Corpora for European Languages (PARACRAWL+)
Generic Services (2018): language resources projects
PRINCIPLE (2018-EU-IA-0050) EU contribution: €1,138,781; Schedule: September 2019 to August 2021; Coordinator: Dublin City University
Focus on the identification, collection and processing of language resources for Croatian, Icelandic, Irish and Norwegian (Bokmål and Nynorsk)
Paracrawl 3: Continued Web-Scale Provision of Parallel Corpora for European Languages (2018-EU-IA-0063) EU contribution: €889,649; Schedule: October 2019 to September 2021; Coordinator: University of Edinburgh
Improved extraction software capable of efficiently processing an even larger portion of the Web (more than 1 compressed petabyte), collecting larger corpora for under-resourced language pairs
EuroPat: Unleashing European Patent Translations (2018-EU-IA-0061) EU contribution: €695,890; Schedule: September 2019 to September 2021; Coordinator: University of Edinburgh
Mining parallel corpora from patents by aggregating, aligning, and converting patent data. Targeted language pairs are English to/from Croatian, Norwegian (Bokmål), German, Polish, Spanish, French and Icelandic.
CEF Data Marketplace (2018-EU-IA-0049) EU contribution: €916,937; Schedule: November 2019 to October 2021; Coordinator: TRANSLATED SRL
Manage and trade data for all languages and domains; enhance the availability of language data for under-resourced languages and domains
Generic Services (2018): integration projects
OCCAM: OCR, ClassificAtion & Machine Translation (2018-EU-IA-0052) EU contribution: €973,894; Schedule: October 2019 to September 2021; Coordinator: Brno University of Technology
Support the automated translation of scanned documents through image classification, translation memories, optical character recognition, and machine translation
NTEU: Neural Translation for the European Union (2018-EU-IA-0051) EU contribution: €1,649,042; Schedule: September 2019 to August 2021; Coordinator: Pangeanic
Neural engine farm, which will include all non-English and non-French language combinations for eTranslation
Translation Automation Services for EU Council Presidency (2018-EU-IA-0079) EU contribution: €827,734; Schedule: February 2019 to June 2020; Coordinator: Tilde
Focus on the multilingual challenges of the Romanian (2019), Finnish (2019) and Croatian (2020) EU Council Presidencies. Custom MT systems for the EU Presidency domain and the DSI domains at the focal point of the EU Council (such as cybersecurity or eJustice).