Language Technology and Digital Public Services
Connecting Europe Facility – Automated Translation

Philippe Gelin
DG Connect, Directorate-General for Communications Networks, Content and Technology

2019-10-15




The Connecting Europe Facility – funding programme 2014-2020

TRANSPORT: €26.25bn

ENERGY: €5.85bn

TELECOM
• Broadband: €170 M
• Digital Service Infrastructures: €970 M *

* - 100 M Juncker Package

CEF Regulation: The Connecting Europe Facility (CEF) is a regulation that defines how the Commission can finance support for the establishment of trans-European networks to reinforce an interconnected Europe.

CEF Telecom Guidelines: The CEF Telecom guidelines cover the specific objectives and priorities as well as eligibility criteria for funding of broadband networks and Digital Service Infrastructures (DSIs).

CEF Work Programmes: Translate the CEF Telecom Guidelines into general objectives and actions planned on a yearly basis.

CEF Automated Translation helps European and national public administrations exchange information across language barriers in the EU, by providing machine translation capabilities that will enable all Digital Service Infrastructures to be multilingual.


CEF Digital at a glance

Core Services: funding at EU level (Commission)

Generic Services: funding for the Member States. Typically 'deployment' projects at national level (up to 75% of eligible cost)

• IDENTIFY with eID
• SIGN with eSignature
• INVOICE with eInvoicing
• EXCHANGE with eDelivery
• TRANSLATE with eTranslation

Digital Service Infrastructures: eJustice Portal, ODR, Open Data, BRIS, EESSI, etc.

Domains: Justice, home affairs and citizens' rights; Science and Technology; Business; Employment and Social Rights


CEF eTranslation: the same automatic translation for two complementary purposes

CEF eTranslation
• CEF Digital Service Infrastructures
• Pan-European digital public services
• MS public administrations

DGT eTranslation
• Translators of the EU Institutions
• Digital services of the EU Institutions


CEF eTranslation - Capacity

• 41 million pages in 2018
• 18 million pages in Q1 2019

Web Page use in 2019 so far
• 51 million pages.

Current daily record
• 3 Oct. – 989 000 pages.

[Chart legend: Web Page, Machine-to-machine; shares shown: 12% and 88%]


Language Resource Collection

[Chart: No. of unique LR collected through ELRC-SHARE by month, March 2016 to March 2019; vertical axis from 0 to 1200]

• 1000 language resources

• Instrumental in building EN<>IS and EN<>NB engines

• Contributed to the development of the EN<>GA engine

• Main source for some of the domain-adapted engines

• Will be used for building domain generic engines

• Will be used by various other Natural Language Processing tools (summarization etc.)

• http://www.lr-coordination.eu/


Generic Services

2016-EU-IA-0108 : Cross-border eProcurement notifications

2016-EU-IA-0111 : European Language Resource Infrastructure (ELRI)

2016-EU-IA-0114 : Provision of Web-Scale Parallel Corpora for Official European Languages (PARACRAWL)

2016-EU-IA-0121 : CEF Automated Translation for the EU Council Presidency

2016-EU-IA-0122 : eTranslation TermBank

2016-EU-IA-0132 : IADAATPA (Intelligent Automated Domain Adapted Automated Translation for Public Administrations)

2017-EU-IA-0136 : Multilingual Resources for CEF.AT in the legal domain

2017-EU-IA-0149 : National European Central Translation Memory Data (NEC TM DATA)

2017-EU-IA-0151 : APE-QUEST

Automated Post-Editing (APE) & Quality Estimation (QE) for Electronic Exchange of Social Security Information (EESSI) and Online Dispute Resolution (ODR) Digital Service Infrastructures (DSIs) and related national services

2017-EU-IA-0169 : MICE

MT on standards and e-Business (BE) and e-Land register information (EST). Customization for public services. Open Source package with NLP & Post-Editing connectivity

2017-EU-IA-0178 : Broader Web-Scale Provision of Parallel Corpora for European Languages (PARACRAWL+)



Generic Services (2018): language resources projects

PRINCIPLE (2018-EU-IA-0050) EU contribution: €1,138,781; Schedule: September 2019 to August 2021; Coordinator: Dublin City University

Focus on the identification, collection and processing of language resources for Croatian, Icelandic, Irish and Norwegian (Bokmål and Nynorsk).

Paracrawl 3: Continued Web-Scale Provision of Parallel Corpora for European Languages (2018-EU-IA-0063) EU contribution: €889,649; Schedule: October 2019 to September 2021; Coordinator: University of Edinburgh

Improved extraction software capable of efficiently processing an even larger portion of the Web (more than 1 compressed petabyte), collecting larger corpora for under-resourced language pairs.

EuroPat: Unleashing European Patent Translations (2018-EU-IA-0061) EU contribution: €695,890; Schedule: September 2019 to September 2021; Coordinator: University of Edinburgh

Mining parallel corpora from patents by aggregating, aligning, and converting patent data. Targeted language pairs are English to/from Croatian, Norwegian (Bokmål), German, Polish, Spanish, French and Icelandic.

CEF Data Marketplace (2018-EU-IA-0049) EU contribution: €916,937; Schedule: November 2019 to October 2021; Coordinator: TRANSLATED SRL

Manage and trade data for all languages and domains; enhance the availability of language data for under-resourced languages and domains.



Generic Services (2018): integration projects

OCCAM: OCR, ClassificAtion & Machine Translation (2018-EU-IA-0052) EU contribution: €973,894; Schedule: October 2019 to September 2021; Coordinator: Brno University of Technology

Support the automated translation of scanned documents through image classification, translation memories, optical character recognition, and machine translation.

NTEU: Neural Translation for the European Union (2018-EU-IA-0051) EU contribution: €1,649,042; Schedule: September 2019 to August 2021; Coordinator: Pangeanic

Neural engine farm, which will include all non-English and non-French language combinations for eTranslation.

Translation Automation Services for EU Council Presidency (2018-EU-IA-0079) EU contribution: €827,734; Schedule: February 2019 to June 2020; Coordinator: Tilde

Focus on the multilingual challenges of the EU Council Presidencies: Romanian (2019), Finnish (2019) and Croatian (2020). Custom MT systems for the EU Presidency domain and the DSI domains at the focal point of the EU Council (such as cybersecurity or eJustice).
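To give a sense of scale, the EU contributions quoted on the two "Generic Services (2018)" slides can be totalled. The short Python sketch below only sums the figures listed above. The variable names and the grouping into two dictionaries are illustrative choices, not part of the presentation.

```python
# EU contributions in euros, copied from the two "Generic Services (2018)" slides.
# The dictionary names and short project labels are illustrative only.
language_resources = {
    "PRINCIPLE": 1_138_781,
    "Paracrawl 3": 889_649,
    "EuroPat": 695_890,
    "CEF Data Marketplace": 916_937,
}
integration = {
    "OCCAM": 973_894,
    "NTEU": 1_649_042,
    "EU Council Presidency": 827_734,
}


def total(projects: dict[str, int]) -> int:
    """Sum the EU contributions of a group of projects."""
    return sum(projects.values())


lr_total = total(language_resources)
integration_total = total(integration)
grand_total = lr_total + integration_total

print(f"Language resources projects: €{lr_total:,}")
print(f"Integration projects:        €{integration_total:,}")
print(f"All 2018 projects listed:    €{grand_total:,}")
```

On these figures, the four language resources projects and the three integration projects each account for roughly half of the total listed for 2018.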
