META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER (grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899). Multilingualism for Digital Europe. Georg Rehm, General Secretary META-NET, Coordinator CRACKER, DFKI, Germany [email protected] Ringvorlesung Digitale Lebenswelten – Universität Hildesheim, 15th November 2016

Multilingualism for Digital Europe


Page 1: Multilingualism for Digital Europe

META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER (grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899).

Multilingualism for Digital Europe

Georg Rehm
General Secretary META-NET, Coordinator CRACKER

DFKI, [email protected]

Ringvorlesung Digitale Lebenswelten – Universität Hildesheim, 15th November 2016

Page 2: Multilingualism for Digital Europe

Outline

- A Multilingual Europe Initiative: META-NET
  - LT Support – META-NET White Paper Series
  - LT Strategy – META-NET SRA
- Continuing the Initiative – Recent Developments
  - The Digital Single Market and Multilingualism
  - Cracking the Language Barrier
  - META-FORUM 2015/2016 – MDSM SRIA V0.5/V0.9
- Goals and Next Steps

http://www.meta-net.eu 2

Page 3: Multilingualism for Digital Europe

META-NET and META: Brief History


Page 4: Multilingualism for Digital Europe

Multilingual Europe in 2010


- Challenge: Providing each language community with the most advanced technologies for communication and information so that maintaining their mother tongue does not turn into a disadvantage.

- While research has made considerable progress in recent years, the pace of progress is not fast enough to meet the challenge within the next 10-20 years.

- All stakeholders – researchers, LT industries, policy makers, language communities, funding programmes – should team up in a strategic alliance for a major dedicated push.

Page 5: Multilingualism for Digital Europe

- 60 research centres in 34 countries (founded in 2010)
  Chair of Executive Board: Jan Hajic (CUNI)
  Dep.: J. van Genabith (DFKI), A. Vasiljevs (Tilde)
  General Secretary: Georg Rehm (DFKI)

- Multilingual Europe Technology Alliance: 826 members in 67 countries

(published in 2013) (31 volumes; published in 2012)

T4ME (META-NET), CESAR, METANET4U, META-NORD

Page 6: Multilingualism for Digital Europe

META-NET White Paper Series


Page 7: Multilingualism for Digital Europe

Basque, Bulgarian*, Catalan, Croatian*, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Welsh

* Official EU language

http://www.meta-net.eu/whitepapers

Page 8: Multilingualism for Digital Europe

Cross-Lingual Comparison

- Four areas: 1. Machine Translation, 2. Text Analytics, 3. Speech Processing/Synthesis, 4. Language Resources

- Ranking: from excellent LT support to weak/no LT support.

- Cross-lingual comparison discussed and finalised at a network meeting with representatives of all languages (Oct., 2011).


Page 9: Multilingualism for Digital Europe

MT
  Excellent: (none)
  Good: English
  Moderate: French, Spanish
  Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
  Weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
  Excellent: (none)
  Good: English
  Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
  Weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analytics
  Excellent: (none)
  Good: English
  Moderate: Dutch, French, German, Italian, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
  Weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
  Excellent: (none)
  Good: English
  Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
  Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
  Weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh

Page 10: Multilingualism for Digital Europe

[Figure: Results of the META-NET White Paper Study (2012). Level of support (Weak/none, Fragmentary, Moderate, Good, Excellent) for each of the languages covered, from Welsh to English. Languages with names in red have little or no MT support.]

Page 11: Multilingualism for Digital Europe

Observations and Results


- When it comes to technology support, there are massive differences between Europe’s languages and technology areas.

- Support for English is ahead of any other language.

- But: even support for English is far from being perfect.

- Several languages get the weakest score in all four areas (e.g., Icelandic, Latvian, Lithuanian, Maltese)!

Page 12: Multilingualism for Digital Europe

Digital Language Extinction!

- “At Least 21 European Languages in Danger of Digital Extinction!”

- Press release on European Day of Languages (Sept. 26, 2012).

- Huge global interest in the topic and our key findings!

- 600+ mentions in the press.

- News from 40+ countries in 35+ different languages.

- 20+ television reports and 30+ broadcast interviews (radio, TV) with META-NET representatives.

- Two Parliamentary Questions in the EP on the “digital extinction of languages” topic.

- These results led to a STOA Workshop in the EP (Dec. 3, 2013).


Page 13: Multilingualism for Digital Europe


Poor language technology threatens Danish on the web

Words. Researchers are working to improve Danish translations on the internet.

By Jens Ejsing // [email protected]

The Danish language is having a hard time in the digital world.

That is the conclusion of Danish language researchers and experts in connection with the new international META-NET study, which takes a closer look at how a large number of smaller European languages such as Danish are faring in the digital world.

The researchers, from the University of Copenhagen and the Danish Language Council (Dansk Sprognævn) among others, conclude that Danish may have an even harder time in the digital world in future, because Google Translate, GPS devices, smartphone apps and other language technology programs are not sufficiently able to handle the many nuances of the Danish language.

Bolette Sandford Pedersen, professor of language technology at the University of Copenhagen, believes there is a need for a kind of digital Danish language bank filled with data, so that translations, among other things, become as precise and good as possible. With help from the language bank, researchers can, according to the professor, help companies improve programs that have to handle linguistic knowledge for, among other things, machine translation, speech recognition and information retrieval.

That would make erroneous translations rarer, such as when "hæld olie på panden" (pour oil into the pan) becomes "pour oil on the forehead" in English with Google Translate. Translations that are, at worst, so imprecise that Danes end up opting out of their own language in the digital world.

Language help for companies

She acknowledges, however, that "the technology for automatic translation is in many ways fantastic".

"It is just not good enough when it comes to Danish," she says:

"It is as if we are, to some extent, leaving it in the hands of Google or other companies to decide whether Danish is to be handled well enough or not. But the Danish market is not big for them. The question is therefore whether we should not, to a greater extent, do something ourselves to ensure that the necessary data is available, so that we get good translations and other good language technology. That could, for example, be by making an effort to set up a language bank with a large amount of enriched material about Danish."

"If we constantly find that translations are riddled with errors, we dare not trust them," she says, stressing that "erroneous translations can lead to major misunderstandings".

According to the director of the Danish Language Council, Sabine Kirchmeier-Andersen, poor language technology can have consequences for many Danes who are not so good at English.

"If we have ambitions to use the Danish language in the technological universe of the future, an effort must be made now to retain expertise and expand the knowledge we have," she says:

"Otherwise we risk that only people who speak fluent English will benefit from the new generations of web, telecom and robot technology that are on the way."


Facts: Languages in Europe

- There are around 80 languages in the EU. For 21 of them, including Danish, there are major language technology gaps in areas such as machine translation, speech recognition and information retrieval.

- According to an EU study, an increasing number of European internet users buy goods or services on the web where the language used is not their own. This applies to more than half of users.

- More than one in three use a foreign language to write e-mail or posts on the web.

Edited by Joanna Vallentin. Layout: Jacob Friis / National / 06. Berlingske / Section 1 / Saturday 22.09.2012


Eleftheros Typos, Thursday 27 September 2012 (Life)

Many European languages are considered technologically… outdated

Greek is in danger of digital extinction

By Eleni Vergou [email protected]

In the digital age they do not… speak Greek, nor several other European languages, according to a pan-European report signed by more than 200 experts. The study was published by the scientific network META-NET on the occasion of yesterday's European Day of Languages.

For the purposes of their research, linguists from 34 countries of the Old Continent rated the available language services and created a "White Paper" for each European language. In their study, the experts looked for, among other things, four basic electronic tools, namely the existence of automatic translation, the possibility of voice interaction and of digital text analysis, while the availability of language resources was investigated at the same time.

In a first phase they examined the websites that allow users to translate online, such as, for example, the IT giant's service Google Translate. At the same time, the "communication" of Greek-speaking users with their… devices was also examined, for example the ability to "speak" to a GPS in one's mother tongue. The researchers concluded that such devices exist, but that they are not as widespread as the English-language ones.

The "gold" medal goes, as is only logical, to the English language. English-speaking users have the best possible technological support, which favours the further spread of the language. Icelandic, Latvian, Lithuanian and Maltese are most at risk of "technological exclusion", while Greek, Bulgarian, Hungarian and Polish are slightly better off, having, as the study states, "fragmentary" technological support.

Support for users of Dutch, French, German, Italian and Spanish is characterised as "moderate". The heads of the scientific team, Hans Uszkoreit and Georg Rehm, state: "There are dramatic differences in language technology support between the various European languages. The gap between 'small' and 'big' languages keeps widening. We must ensure that the smaller and less digitally resourced languages are equipped with the necessary basic technologies. Otherwise, these languages are doomed to digital extinction."

Indeed, the experts stress that without decisive action these languages will struggle to… survive in the digital world of the 21st century. Ms Maria Gavrilidou, a member of the scientific team from the Institute for Language and Speech Processing of the Athena Research Centre, tells "E.T.": "This study does not say that the Greek language will not live on or that it is in danger of extinction." The expert explains that as long as there are people who speak, write and communicate in a language, it will continue to exist. It is important, however, that all users are able to "speak" to machines, such as their GPS devices, in Greek and have computer language tools at their disposal.

Among these "tools" are the spelling and syntax checkers that are used daily by hundreds of Greek users and are based on language technology. Nevertheless, she stresses that the digital spread of a language is important: "It is not in the hands of the average user. The respective governments, the European Union and the private sector must fund the development of this technology for all languages," she says, and continues: "Users, however, must demand that these tools also exist in their own language and not be satisfied with English."

Sidebar: The language of alienation… (Greeklish)

Pull quote: English-speaking users have the best possible technological support, a fact that favours the further spread of the language.

Most young people in our country now communicate in Greeklish via messages or email. Although language tools that allow the use of the Greek character set have existed in recent years, teenagers and young adults do not seem to have "embraced" these technologies. Professor of Linguistics Georgios Babiniotis tells "E.T.": "Greeklish is a problem for the Greek language, especially for young people, for a purely linguistic reason. By using Greeklish they become alienated from the form of the word or, as we say, the etymological image that is expressed by the spelling of the word and is linked both to the meaning of the word and to its origin." The danger facing young people is alienation from the written form of the language. This "familiarity", however, also helps in understanding the meaning and the origin of the word. "This alienation is not without significance," says the expert, who explains that the process of writing helps to imprint the word and to connect it with other cognate words. "When this form of communication is used, they are destroyed, they weaken. It is not fatal, but it will do damage," says Mr Babiniotis, who advises users to choose the Greek character set.

Photo caption: Georgios Babiniotis.

Date 30 September 2012 Page 16

Copyright material. This may only be copied under the terms of a Newspaper Licensing Agency agreement (www.nla.co.uk) or with written publisher permission. For external republishing rights see www.nla-republishing.com

Sunday 30 September 2012, page 49

They lost... my language

Most European languages are at risk of digital extinction

Pull quote: The smaller and less digitally resourced languages must be equipped with the necessary basic technologies.

The 26th of September has been established by the Council of Europe as the European Day of Languages, but, according to a new European scientific report, 21 of the 30 languages of Europe, Greek among them, face the danger of digital extinction. The study sounds the alarm, as it found that digital support for most European languages is inadequate or entirely non-existent for users.

Swallowed by the common languages

The report, in the form of a series of White Papers (entitled "Languages in the European Information Society") from the scientific network META-NET, which brings together 60 research centres in 34 countries, points out that languages spoken by a relatively small number of people are at risk because they do not have the technological support that widely used languages have. White Papers have been prepared for the following European languages: English, Basque, Bulgarian, Galician, French, German, Danish, Greek, Estonian, Irish, Icelandic, Spanish, Italian, Catalan, Croatian, Latvian, Lithuanian, Maltese, Norwegian (Bokmål and Nynorsk), Dutch, Hungarian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Swedish, Czech and Finnish. Each White Paper is written in the language it covers and is translated into English.

Four major risks

According to the new study, Icelandic, Latvian, Lithuanian and Maltese face the greatest risk of extinction in a European technological society that increasingly promotes the use of particular languages, English above all. But other languages, such as Greek, Bulgarian, Hungarian and Polish, are also at risk in the modern digital world.

The META-NET study, to which more than 200 experts contributed, assesses the risk for each language on the basis of four main criteria at the technological/digital level: the existence of automatic translation for the language, the possibility of voice interaction, the possibility of digital text analysis, and the availability of the relevant digital language resources.

The strong ones

The language with the best score on the criteria is of course English, which enjoys comparatively the best technological support (though not the best possible), a fact that facilitates its further spread.

Next, with satisfactory or moderate technological/digital support, come Dutch, French, German, Italian and Spanish. Greek, as well as Basque, Catalan, Polish, Hungarian and others, are ranked among the languages with only "fragmentary" support, which is precisely why they are considered languages at high risk of extinction.

Dramatic differences

According to the study's editors, Hans Uszkoreit and Georg Rehm, "there are dramatic differences in language technology support between the various European languages and technology areas. The gap between 'small' and 'big' languages keeps widening. We must ensure that the smaller and less digitally resourced languages are equipped with the necessary basic technologies, otherwise these languages are doomed to digital extinction".

The hope for these languages is considered to lie in the improvement and wider use of language technology software, which enables the spoken and written processing of the various languages.

Examples of these capabilities are electronic spelling and syntax checkers, the interactive personal "assistants" of smartphones (e.g. Siri on the iPhone), automatic translation systems, the electronic dialogue systems of call centres, search engines, the synthetic voice in car navigation systems, and others.

The basic problem

What matters, according to the report, is that all these capabilities are also offered to users in their mother tongue that is at risk of extinction. Without decisive action, the report makes the ominous prediction that these languages will struggle to survive in the digital world of the 21st century.

One problem is that the software of these language technology systems relies on statistical methods that require huge quantities of written or spoken data, yet so much data is hard to obtain for languages spoken by relatively few people.

Moreover, even for widely used languages such as English, the relevant language technology still has weaknesses, which are evident, for example, in the highly inadequate and error-ridden automatic translations. The report proposes that a coordinated, large-scale effort be undertaken in Europe in order gradually to create or improve the necessary technologies and to help the languages that are digitally marginalised.


Page 14: Multilingualism for Digital Europe

Update of the Study (2014)

- Study comprised 31 volumes/languages.

- Many languages missing! Need for extension – at least of the comparison.

- We invited three language community bodies to participate in the update:
  - European Federation of National Institutions for Language (EFNIL)
  - Network to Promote Linguistic Diversity (NPLD)
  - Experts Committee of the European Language Charter (Council of Europe)


CCURL 2014 – Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era

An Update and Extension of the META-NET Study “Europe’s Languages in the Digital Age”

Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Váradi, Sabine Kirchmeier-Andersen, Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, Sigve Gramstad

META-NET, DFKI GmbH, Berlin, Germany

META-NET, Bar-Ilan University, Tel Aviv, Israel

META-NET, Arax Ltd., Luxembourg

META-NET, Tübitak Bilgem, Gebze, Turkey

EFNIL, META-NET, Hungarian Academy of Sciences, Budapest, Hungary

EFNIL, META-NET, Danish Language Council, Copenhagen, Denmark

EFNIL, Institut für Deutsche Sprache, Mannheim, Germany

NPLD, Network to Promote Ling. Diversity, Cardiff, Wales

Council of Europe, Com. of Experts, University of Hamburg, Hamburg, Germany

Council of Europe, Com. of Experts, Bergen, Norway

Abstract
This paper extends and updates the cross-language comparison of LT support for […] European languages as published in the META-NET Language White Paper Series. The updated comparison confirms the original results and paints an alarming picture: it demonstrates that there are even more dramatic differences in LT support between the European languages.

Keywords: LR National/International Projects, Infrastructural/Policy Issues, Multilinguality, Machine Translation

Introduction and Overview

The multilingual setup of our European society imposes societal challenges on political, economic and social integration and inclusion, especially in the creation of the single digital market and unified information space targeted by the Digital Agenda (EC, […]). Language technology is the missing piece of the puzzle, it is the key enabler and solution to boosting growth and strengthening Europe’s competitiveness. Recognising Europe’s exceptional demand and opportunities, […] leading research centres in […] European countries joined forces in META-NET, a Network of Excellence dedicated to the technological foundations of a multilingual European information society. META-NET was partially supported through four projects funded by the EC (T4ME, CESAR, METANET4U and META-NORD). META-NET is forging the Multilingual Europe Technology Alliance (META) with more than […] organisations and experts representing multiple stakeholders and signed collaboration agreements with more than […] other projects and initiatives. META-NET’s goal is monolingual, crosslingual and multilingual technology support for all European languages (Rehm and Uszkoreit, […]). We recommend focusing on three priority research themes connected to application scenarios that will provide European R&D with the ability to compete with other markets and achieve benefits for European society and citizens as well as opportunities for our economy and future growth.

This paper extends and updates one important result of the work carried out within the META-VISION pillar of the initiative, the cross-language comparison of LT support for […] European languages as published in the META-NET Language White Paper Series (Rehm and Uszkoreit, […]).

The Language White Paper Series

Answering the question on the current state of a whole R&D field is difficult and complex. For LT, nobody had collected these indicators and provided comparable reports for a substantial number of European languages yet. To arrive at a first comprehensive answer, META-NET prepared the Language White Paper Series “Europe’s Languages in the Digital Age” (Rehm and Uszkoreit, […]) that describes the current state of LT support for […] European languages (including all […] official EU languages). This undertaking had been in preparation with more than […] experts since mid […] and was published in the summer of […]. The study included a comparison of the support all languages receive in four areas: MT, speech, text analytics, language resources. The differences in technology support between the various languages and areas are dramatic and alarming. In the four areas, English is ahead of the other languages but even support for English is far from being perfect. While there are good quality software and resources available for a few larger languages and application areas, others, usually smaller languages, have substantial gaps. Many languages lack basic technologies for text

Page 15: Multilingualism for Digital Europe

MT
  Excellent: (none)
  Good: English
  Moderate: French, Spanish
  Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
  Weak or no support: Albanian, Asturian, Basque, Bosnian, Breton, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Frisian, Friulian, Galician, Greek, Hebrew, Icelandic, Irish, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Maltese, Norwegian, Occitan, Portuguese, Romany, Scots, Serbian, Slovak, Slovene, Swedish, Turkish, Vlax Romani, Welsh, Yiddish

Speech
  Excellent: (none)
  Good: English
  Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish, Turkish
  Weak or no support: Albanian, Asturian, Bosnian, Breton, Croatian, Frisian, Friulian, Hebrew, Icelandic, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Maltese, Occitan, Romanian, Romany, Scots, Vlax Romani, Welsh, Yiddish

Text Analytics
  Excellent: (none)
  Good: English
  Moderate: Dutch, French, German, Hebrew, Italian, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
  Weak or no support: Albanian, Asturian, Bosnian, Breton, Croatian, Estonian, Frisian, Friulian, Icelandic, Irish, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Maltese, Occitan, Romany, Scots, Serbian, Turkish, Vlax Romani, Welsh, Yiddish

Resources
  Excellent: (none)
  Good: English
  Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
  Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Hebrew, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
  Weak/no support: Albanian, Asturian, Bosnian, Breton, Frisian, Friulian, Icelandic, Irish, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Maltese, Occitan, Romany, Scots, Turkish, Vlax Romani, Welsh, Yiddish

Page 16: Multilingualism for Digital Europe

[Figure: language technology support (excellent, good, moderate, fragmentary, weak/no support) plotted against millions of native speakers worldwide (scale 0 to 400) for each language. Extension of the META-NET White Paper Study (2013/2014).]

Page 17: Multilingualism for Digital Europe

META-NET Strategic Research Agenda (SRA)

Page 18: Multilingualism for Digital Europe

Three Ingredients

Appropriate Programme: Vision & Agenda

Appropriate Actors: Research & Commercialisation

Appropriate Support: Funding

Page 19: Multilingualism for Digital Europe

Planning Process

Vision Paper
Vision Group Translation and Localisation Report
Vision Group Interactive Systems Report
Vision Group Media and Information Services Report
Priority Themes Paper
Expert meeting minutes (×3)
Strategic Research Agenda

Timeline: 2010, 2011, 2012

Page 20: Multilingualism for Digital Europe

Planning Process: Documents

www.meta-net.eu [email protected] T: +49 30 23895 1833

The Future European Multilingual Information Society

Vision Paper for a Strategic Research Agenda

“People can’t share knowledge if they don’t speak a common language.” Davenport, Thomas H, and Laurence Prusak, Working Knowledge: How Organizations Manage What They Know, Harvard Business School, Boston, 1997, p. 98.

Join the discussion at www.meta-net.eu/forum

LT 2020 Vision and Priority Themes for Language Technology Research in Europe until the Year 2020 Towards the META-NET Strategic Research Agenda

The development of this paper has been funded by the Seventh Framework Programme and the ICT Policy Support Programme of the European Commission under contracts T4ME (Grant Agreement 249119), CESAR (Grant Agreement 271022), METANET4U (Grant Agreement 270893) and META-NORD (Grant Agreement 270899).

Do you have comments, ideas or suggestions with regard to the content of this document? Please send them to [email protected] or discuss them online: http://www.meta-net.eu/sra.

This document is part of the Network of Excellence “Multilingual Europe Technology Alliance (META-NET)”, co-funded by the 7th Framework Programme of the European Commission through the T4ME grant agreement no.: 249119.

A Network of Excellence forging the Multilingual Europe Technology Alliance

Vision Document

Vision Group Translation and Localisation: Results of first two meetings

Editors: Aljoscha Burchardt, Georg Rehm

Dissemination Level: Public

Date: 3 December 2010


Vision Document

Vision Group Media and Information Services: Results of first two meetings

Editors: Maria Koutsombogera, Stelios Piperidis

Dissemination Level: Public

Date: 10 November 2010


Vision Document

Vision Group Interactive Systems: Results of first two meetings

Editors: Joseph Mariani, Bernardo Magnini

Dissemination Level: Public

Date: 28 December 2010

Page 21: Multilingualism for Digital Europe

Strategic Research Agenda

q Addresses the problems we identified when preparing the white papers.

q Can put Europe ahead of its competitors in this technology area.

q 200 contributors; >2 years. 54% industry; 46% research; 4% (inter)national institutions.

q Presented and discussed at 90+ conferences and major workshops.

q Published in early 2013.

q http://www.meta-net.eu/sra


Page 22: Multilingualism for Digital Europe

Priority Research Themes

q Three priority research themes:

§ Translingual Cloud

§ Social Intelligence and e-Participation

§ Socially-Aware Interactive Assistants

q Two additional themes:

§ European Service Platform for Language Technologies

§ Core Technologies for Language Analysis and Production

Page 23: Multilingualism for Digital Europe

Providers of operational and research technologies and services: Research Centres, European Institutions, Other companies (SMEs, startups etc.), National Language Institutions, Language Technology Providers, Language Service Providers, Universities

Beneficiaries/users of the platform: European Institutions, Research Centres, Public Administrations, Enterprises, LT User Industries, Universities, European Citizens

Interfaces (web, speech, mobile etc.)

Priority Research Theme 1: Translingual Cloud

Priority Research Theme 2: Social Intelligence & e-Participation

Priority Research Theme 3: Socially Aware Interactive Assistants

European Service Platform for Language Technologies (Cloud or Sky Computing Platform)

Multilingual technologies, Text analytics, Text generation, Language checking, Sentiment analysis, Named entity recognition, Summarisation, Knowledge access and management, Information and relation extraction

Language Processing, Language Understanding, Knowledge, Emotion/Sentiment

Features: Data protection, Tools, Data Sets, Resources, Components, Metadata, Standards, Interfaces, APIs, Catalogues, Quality Assurance, Data Import/Export, Input/Output, Storage, Performance, Availability, Scalability

Page 24: Multilingualism for Digital Europe

[Figure: two graphics, each labelled with 30 European languages: Basque, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Spanish, Swedish.]

Concrete result of these activities: one call for proposals on machine translation in the Horizon 2020 Work Programme 2015–17.

Page 25: Multilingualism for Digital Europe

CRACKER


Page 26: Multilingualism for Digital Europe

1 DFKI, Germany, Georg Rehm
2 CUNI, Czech Republic, Jan Hajic
3 ELDA, France, Khalid Choukri
4 FBK, Italy, Marcello Federico
5 ATHENA RC, Greece, Stelios Piperidis
6 UEDIN, UK, Philipp Koehn
7 USFD, UK, Lucia Specia

Coordination and Support Action, H2020-ICT17, 2015–2017, 36 months – http://www.cracker-project.eu

Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research

THREE PRIORITY AREAS FOR ACHIEVING THE MULTILINGUAL DIGITAL SINGLE MARKET

1 Multilingual access to all digital goods and services across Europe

Geo-blocking:

due to nationality, location, or residence

customers

Language-blocking:

languages they do not speak

however, current online translation is insufficient

trying to conduct

common languages

Geo-blocking and language-blocking are barriers to access

Both geo-blocking and language-blocking are daily problems for tens of millions of EU citizens.

Customers are six times more likely to buy from sites in their native language.

Most EU languages address less than 3% of the market, fundamentally limiting SMEs operating in countries where those languages are spoken.

Lack of language technology support (automatic translation, tools to assist human translators, and multilingual support in

European businesses.

Language can be expensive for SMEs

Online businesses face around €5,000 in up-front costs for each new language they translate their websites into, plus similar

and marketing costs.

Even when sites are translated, the vast majority of SMEs cannot respond to support requests or customer feedback in other languages. Such responsiveness is needed to achieve customer satisfaction and build brand loyalty.

English is not the answer

52% of EU customers do not purchase

Adding even a few languages to an SME’s website beyond English can have a major impact on revenue. Large organizations today

to increase market share.

[Chart: likelihood of purchasing, site in buyer’s native language vs. site in foreign language; 6x more likely to purchase]


Communities
• META-NET incl. META-SHARE and META
• MT evaluation initiatives – WMT, IWSLT, MT Marathons
• MT and other LT industry
• Language resources – META-SHARE, ELRA
• HT/MT evaluation tools – translate5
• Translation industry, translation profession
• MT user communities

Strategic Agenda for the Multilingual Digital Single Market
• Version 0.5 presented at META-FORUM 2015 (Riga)
• Version 0.9 presented at META-FORUM 2016 (Lisbon)

Strategic Research and Innovation Agenda

Language as a Data Type and Key Challenge for Big Data

Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content

SRIA Editorial Team

Version 0.9 – July 2016

Page 27: Multilingualism for Digital Europe

Selected Activities

[Timeline 2015–2017 (M1, M12, M24, M36): kick-off meeting for all ICT-17 projects; translate5; WMT 2016, WMT 2017; IWSLT 2015, IWSLT 2016, IWSLT 2017; QT Marathon 2015, QT Marathon 2016; Roadmap for European MT Research; Survey on the State of HQMT in Industry and LSPs; SRIA (initial version), SRIA (update), SRIA (final); version 1, version 2.]

• Production of resources (e.g., for WMT 2016 and 2017, IWSLT 2015–2017)
• Tools (quality control, evaluations)
• Strategies and roadmaps (SRIA, Roadmap for European MT Research)
• Exchange and sharing facility for resources (META-SHARE)

Recent or Upcoming Events

• LREC Workshop on MT Eval. (May 25)
• META-FORUM 2016 (July 4/5, Lisbon)
• WMT 2016 (Aug. 11/12, Berlin)
• IWSLT 2016 (Dec. 8/9, Seattle)

• Federation of organisations and projects working on technologies for multilingual Europe.
• 10 organisations; 24 projects.
• Areas of collaboration: data management and repositories, tools, shared tasks, evaluations.
• Goal: provide one umbrella organisation for the whole community.

http://www.cracking-the-language-barrier.eu

Page 28: Multilingualism for Digital Europe

q META-FORUM 2016 – July 04/05, Lisbon, Portugal: Beyond Multilingual Europe

q META-FORUM 2015 – April 27, Riga, Latvia: Technologies for the Multilingual Digital Single Market

q META-FORUM 2013 – Sept. 19/20, Berlin, Germany: Connecting Europe for New Horizons

q META-FORUM 2012 – June 20/21, Brussels, Belgium: A Strategy for Multilingual Europe

q META-FORUM 2011 – June 27/28, Budapest, Hungary: Solutions for Multilingual Europe

q META-FORUM 2010 – Nov. 17/18, Brussels, Belgium: Challenges for Multilingual Europe

Page 29: Multilingualism for Digital Europe

The Multilingual Digital Single Market


Page 30: Multilingualism for Digital Europe

q Top priority in the European Union.

q Expected to add €400 billion to European GDP and create hundreds of thousands of new jobs.

q Unfortunately, the language topic is not included in the EC’s Digital Single Market strategy (published in May 2015).

Page 31: Multilingualism for Digital Europe
Page 32: Multilingualism for Digital Europe
Page 33: Multilingualism for Digital Europe


Page 34: Multilingualism for Digital Europe

Facts and Figures


Page 35: Multilingualism for Digital Europe

Facts and Figures


Page 36: Multilingualism for Digital Europe

The MDSM Fact Sheet

Why Europe needs a Multilingual Digital Single Market

Current eCommerce growth within Europe is about half that of the US, due partially to a lack of language coverage from European SMEs.

Less than 5% of European SMEs currently sell cross-language.

No single language accounts for more than 20% of the potential Multilingual Digital Single Market.

Most account for less than 3% of the DSM.

Without a solution, the European Digital Single Market will remain fragmented.

Europe’s 24 official languages present a tremendous opportunity for European business

Removing language barriers within Europe would open access to 73% (with >€25 trillion in annual revenue!) of the world’s digitally accessible market to European enterprise.

Europe today is not a single market: it is separated into 20+ small language markets.

www.meta-net.eu

[Chart: online population by language market: English (565 million), Chinese (510 million), World Spanish (165 million), Japanese (100 million), World Portuguese (83 million), Russian (60 million). Further labels: Europe today (many small markets); Language Technology; The Multilingual Digital Single Market. Source: Internet World Stats (Miniwatts Marketing Group).]


[Chart: language technology support (good, moderate, fragmentary, weak/no support) against millions of native speakers worldwide (scale 0 to 400) for Spanish, English, Portuguese, German, French, Italian, Polish, Romanian, Dutch, Greek, Hungarian, Czech, Swedish, Bulgarian, Danish, Croatian, Slovak, Finnish, Lithuanian, Slovene, Latvian, Estonian, Maltese and Irish, marking the Language Technology Danger Zone (≈150 million EU citizens).]

140 million EU citizens are in the Language Technology Danger Zone, where language technology is inadequate to support the DSM.

Current online automatic translation provided by US tech giants does not solve

less than 30% of automatically translated content is truly useful for online commerce.

Only three European languages

2 Boosting commerce through multilingual technologies

3 Connecting citizens to European digital public services

Without Language Technology, the European Commission has no way to respond effectively to citizen participation.

Current language technology is inadequate for over half of the EU official languages to help the European Commission solve its citizen engagement problem.

Translation opens 20 times its cost in revenue opportunity. However, translation remains too expensive for many European SMEs, blocking this opportunity and limiting economic growth in Europe. Lowering these costs is a strategic opportunity

[Charts: translation costs vs. increase in revenue; online automatic translation quality (good, bad, ugly)]

Most local governmental services are monolingual only. This poses a problem for tourists, expatriates, and linguistic minorities. Language technology can provide the

Multilingual eParticipation can help build the European Identity

with one another in their respective native languages with sophisticated machine translation working behind the scenes. Only when EU citizens can interact in their own languages will they truly develop a sense of European identity and community.

Over half of EU citizens are language blocked from interacting with the European Commission’s web resources for citizen participation.

290 million EU citizens excluded

Speakers of other languages are language blocked from full participation

Speakers of English, French, German can participate fully

Strategic Agenda for the Multilingual Digital Single Market: http://rigasummit2015.eu.

META, the Multilingual Europe Technology Alliance, has more than 750 members (http://www.meta-net.eu).

LT-Innovate, the European Association of the Language Technology Industry, has 180 corporate members throughout Europe (http://lt-innovate.eu).

Technology support has improved for some languages since this study was completed.

Technology Solutions

Investment in the following solutions will help achieve the Multilingual Digital Single Market

Unified Customer Experience

care, customer relationship, discussion fora,

Multimodal User Experience for Connected Devices

interfaces

household appliances, and consumer

Voice of the Customer

market research

Content Curation and Production

Digital Translation Centre

customers, citizens

The forthcoming Strategic Agenda for the Multilingual Digital Single Market will provide additional details on these and other solutions for the needs of the Multilingual Digital Single Market.

Download this fact sheet from http://cracker-project.eu. For more information contact Dr. Georg Rehm (DFKI) at [email protected].

http://cracker-project.eu/wp-content/uploads/2015/11/mDSM-Fact-Sheet.pdf

Page 37: Multilingualism for Digital Europe

META-FORUM 2015 AND MDSM SRIA V0.5


Page 38: Multilingualism for Digital Europe

Open Letter to the EC

q On Friday, March 20, 2015, we published an open letter to the EC on http://multilingualeurope.eu.

q On Monday, March 23, 2015, we informed President Juncker and all Commissioners about the campaign and the 1300+ signatures.

q By now more than 3600 signatures!

Who signed?

q 5 Members of the European Parliament

q 150+ high-level representatives from industry (CxO level)

q 1200+ professors

q 400+ project or research managers

q 20+ entrepreneurs and founders

q hundreds of language and language technology professionals, officials, researchers, administrators and representatives from related stakeholder groups

Page 39: Multilingualism for Digital Europe

META-FORUM 2015

q April 27 in Riga, Latvia

q Riga Summit 2015 on the Multilingual Digital Single Market

q Two important components:

§ MDSM SRIA Version 0.5

§ Further community fusing

q http://www.meta-forum.eu

Page 40: Multilingualism for Digital Europe

Joint EFNIL and NPLD Panel

q Joint EFNIL and NPLD panel at META-FORUM 2015.

q Joint position paper.

Initially presented at META-FORUM 2015 and the Riga Summit 2015 on the Multilingual Digital Single Market, April 27, 2015

www.rigasummit2015.eu

Joint NPLD/EFNIL Position Paper on the Multilingual Digital Single Market

“Languages are not only a means of communication. They also have embedded in them people’s values, aspirations and hopes.” (European Roadmap for Linguistic Diversity 2015, NPLD)

“Many European languages run the risk of becoming victims of the digital age as they are under-represented and under-resourced online. Huge regional market opportunities remain untapped because of language barriers.” (Multilingual Europe: A challenge for language tech. MultiLingual. April/May 2011, page 51/52)

Page 41: Multilingualism for Digital Europe

Vision Paper
Vision Group Translation and Localisation Report
Vision Group Interactive Systems Report
Vision Group Media and Information Services Report
Priority Themes Paper
Expert meeting minutes (×3)
META-NET Strategic Research Agenda for Multilingual Europe 2020

Timeline: 2010, 2011, 2012, 2013, 2014, 2015


Strategic Research and Innovation Agenda

roadmaps, agendas and any other input from other initiatives


Strategic Agenda for the Multilingual Digital Single Market

Technologies for Overcoming Language Barriers towards a truly integrated European Online Market

DRAFT

Version 0.5 – April 22, 2015

Page 42: Multilingualism for Digital Europe

Strategic Agenda for MDSM

q Presented at META-FORUM 2015 and Riga Summit for the first time.

q Version 0.5 – work in progress

q Builds upon many strategy papers and roadmaps prepared by several European projects, incl. the META-NET SRA (2013).

q Input and feedback collected at the Riga Summit 2015 to be used for upcoming versions.


Page 43: Multilingualism for Digital Europe

A Strategy for the MDSM

• Strategic R&I Agenda for the Multilingual Digital Single Market

• Core: Technology Solutions

• Data economy is an inherent component – LT for effective multilingual data value chains.

Page 44: Multilingualism for Digital Europe

Strategic Agenda for the Multilingual Digital Single Market – Version 0.5 – April, 2015

Contents

Executive Summary
1 The Digital Single Market is a Multilingual Challenge
  1.1 Overcoming Language Barriers with Technologies
  1.2 Language Technologies Made for Europe – in Europe
  1.3 Online Use of Languages
  1.4 Multilingual Big Data Text Analytics for the European Data Economy
  1.5 EC and Language Technology – Past and Present
  1.6 The Economic Power of Language Technology and Services
2 A Strategic Programme for the Multilingual Digital Single Market
  2.1 Layer 1: Innovative Technology Solutions
  2.2 Layer 2: Language Technology Services, Platforms, Infrastructures
  2.3 Layer 3: Priority Research Themes
  2.4 Related Areas, Applications, and Societal Challenges
  2.5 Summary
3 Layer 1: Innovative Technology Solutions for the Multilingual Digital Single Market
  3.1 Technology Solutions for Businesses
    3.1.1 Unified Customer Experience and Cross-Cultural CRM (E-Commerce)
    3.1.2 Digital Translation Centre
    3.1.3 Content Curation and Content Production
    3.1.4 Virtual and Real Translingual Spaces
    3.1.5 Voice of the Customer
    3.1.6 Business Intelligence using Big Data
    3.1.7 Multimodal User Experience for Connected Devices
    3.1.8 Smart Multilingual Assistants
  3.2 Technology Solutions for Public Services
    3.2.1 Voice of the Citizen – Social Intelligence on Big Data
    3.2.2 Online Dispute Resolution
    3.2.3 E-Participation
    3.2.4 E-Government
    3.2.5 E-Health
    3.2.6 E-Learning
4 Layer 2: Language Technology Services, Platforms, Infrastructures
5 Layer 3: Priority Research Themes
6 Horizontal Framework Aspects
  6.1 Language Policies and Public Procurement
  6.2 Standards and Interoperability
  6.3 Open Source
  6.4 Copyright and Data Protection
7 Conclusions
  7.1 Expected Economic Impact
  7.2 Relevance to the EC’s Digital Single Market Strategy
  7.3 Potential Funding Sources
  7.4 Next Steps
Appendix A. Input Documents
Appendix B. Digital Language Extinction in Europe

Page 45: Multilingualism for Digital Europe

• Letter from Andrus Ansip (June 2015)

• “We invite the European language technology community to further develop the ideas presented in the draft Strategic Agenda for the multilingual Digital Single Market”

Page 46: Multilingualism for Digital Europe

Cracking the Language Barrier


Page 47: Multilingualism for Digital Europe

Riga Declaration

• 12 organisations present at META-FORUM 2015 and the Riga Summit 2015 drafted and signed the “Declaration of Common Interests”.

• CRACKER: community building, mostly among projects.

• We combined these into the Cracking the Language Barrier federation.

• Important goal: counteracting community fragmentation.

DECLARATION OF COMMON INTERESTS

We, the undersigned, declare here, at the Riga Summit on the Multilingual Digital Single Market, encouraged by the letter Vice President Andrus Ansip sent to its participants, that we stand united in our goal and interest to:

- support multilingualism in Europe by employing language technology in business, society and governance, to create a truly Multilingual Digital Single Market,

- exchange and share information in our efforts to promote our goals and interests at local, national and European levels,

- raise awareness in society at large using channels available to our associations, alliances and societies.

In the near future, we foresee the establishment of a Memorandum of Understanding among our organisations towards a “Coalition for a Multilingual Europe”, to better serve our members address the language barrier challenges towards establishing a truly integrated Multilingual Digital Single Market.

Riga, 29. April 2015

Signed by (in alphabetical order):

BDVA Laure Le Bars

CITIA Steve Renals

CLARIN Steven Krauwer

EFNIL Sabine Kirchmeier-Andersen, Tamás Váradi

ELEN Davyth Hicks, Claudia Soria

ELRA Nicoletta Calzolari, Khalid Choukri

GALA Laura Brandon, Robert E. Etches, Sergey Gladkov

LT Innovate Jochen Hummel, Philippe Wacker

META-NET Jan Hajic, Josef van Genabith, Georg Rehm, Andrejs Vasiljevs

NPLD Meirion Prys Jones

TAUS Jaap van der Meer

W3C Richard Ishida, Felix Sasaki

For any questions, please contact [email protected].

Page 48: Multilingualism for Digital Europe

http://www.cracker-project.eu • http://www.meta-net.eu

• A federation of European projects and organisations working on technologies for a multilingual Europe.

• Multi-lateral Memorandum of Understanding; 10 organisations and 24 projects on board already (including FP7 and H2020-ICT15).

• Getting new members on a regular basis.

• Selected areas of collaboration: data management and repositories, tools, shared tasks, evaluations, events.

• Goal: provide one umbrella organisation for the whole community.

Page 49: Multilingualism for Digital Europe

Project Members

Organisation Members

Page 50: Multilingualism for Digital Europe


• Website: information about the initiative, all projects and organisations

• Downloadable documents

• List of events

• LREC 2016 MT Eval Workshop

• Several new members will join the initiative soon

http://www.cracking-the-language-barrier.eu

Page 51: Multilingualism for Digital Europe

META-FORUM 2016 AND MDSM SRIA V0.9


Page 52: Multilingualism for Digital Europe

Andrus Ansip’s Blog Post

• Posted on 27 May 2016.

• First public acknowledgment by the EC that the language topic is of very high relevance for the Digital Single Market.

• “Overcoming language barriers is vital for building the DSM, which is by definition multilingual. It is now time to reduce and remove the language barriers that are holding back its advance, and turn them into competitive advantages.”

Page 53: Multilingualism for Digital Europe

Reorganisation of DG CONNECT (01/07/2016)

DG CONNECT: Communications Networks, Content & Technology

Director-General: R. Viola (60240)
  Assistants: O. Bringer (92067), P. Stuckmann (21097)
Deputy Director-General in charge of Directorates A, C, E & H: G. Kent (acting) (91945)
  Assistant: E. Mitjana (81149)
Deputy Director-General in charge of Directorates B, D, F, G & I: C. Bury (60499)
  Assistant: P. Lamotte (98892)

Directorate A: Digital Industry; K. Rouhana (68057)
  A.1: Robotics & Artificial Intelligence; J. Heikkilä (35325)
  A.2: Technologies & Systems for Digitising Industry; M. Lemke (91575)
  A.3: Competitive Electronics Industry; W. Van Puymbroeck (68138)
  A.4: Photonics; C. Maloney (69082)
  A.5: Administration & Finance *; A. Fiala (64787)

Directorate B: Electronic Communications Networks & Services; A. Whelan (50941)
  B.1: Electronic Communications Policy; V. Terävä (92381)
  B.2: Implementation of the Regulatory Framework; W-D. Grussmann (58559)
  B.3: Markets; R. Krüger (61555)
  B.4: Radio Spectrum Policy; A. Geiss (59466)
  B.5: Investment in High-Capacity Networks; A. Krzyżanowska (87246)

Directorate C: Digital Excellence & Science Infrastructure; Th. Skordas (acting) (68908)
  C.1: eInfrastructure & Science Cloud; A. Burgueño Arjona (92471)
  C.2: High Performance Computing & Quantum Technology; G. Kalbe (32866)
  C.3: Future & Emerging Technologies (FET); V. Peca (57843)
  C.4: Flagships; Th. Skordas (68908)

Directorate D: Policy Strategy & Outreach; L. Corugedo Steneberg (96383)
  D.1: Research Strategy & Programme Coordination; M. Fjalland (50021)
  D.2: Policy Implementation & Planning; E. Forti (65172)
  D.3: Policy Outreach & International Affairs; A. Angelova-Krasteva (91145)
  D.4: Communication; D. Ringrose (93913)

Directorate E: Future Networks; M. Campolargo (63479)
  E.1: Future Connectivity Systems; B. Barani (acting) (69616)
  E.2: Cloud & Software; P. O’Donohue (91280)
  E.3: Next-Generation Internet; J. Villasante (63521)
  E.4: Internet of Things; M. Rohen (63674)

Directorate F: Digital Single Market; G. de Graaf (68466)
  F.1: Digital Policy Development & Coordination; M. Bailey (acting) (69176)
  F.2: E-Commerce & Platforms; P. Agarwal (acting) (87153)
  F.3: Start-ups & Innovation; P. Zilgalvis (50935)
  F.4: Digital Economy & Skills; L. Sioli (51262)

Directorate G: Data; J. Hernández-Ros (acting) (34533)
  G.1: Data Policy & Innovation; M. Nagy-Rothengass (31680)
  G.2: Data Applications & Creativity; J. Hernández-Ros (34533)
  G.3: Learning, Multilingualism & Accessibility; M. Marsella (acting) (32750)
  G.4: Administration & Finance; G. Kalbe (acting) (32866)

Directorate H: Digital Society, Trust & Cybersecurity; P. Timmers (90245)
  H.1: Cybersecurity & Digital Privacy; J. Boratynski (69452)
  H.2: Smart Mobility & Living; E. Hartog (90084)
  H.3: E-Health, Well-Being & Ageing; M. González-Sancho (52918)
  H.4: E-Government & Trust; A. Servida (58186)
  H.5: Administration & Finance **; G. Van Caenegem (acting) (61895)

Directorate I: Media Policy; G. Abbamonte (93573)
  I.1: Audiovisual & Media Services Policy; L. Boix Alonso (90009)
  I.2: Copyright; M. Martin-Prat (65157)
  I.3: Audiovisual Industry & Media Programme; L. Recalde Langarica (91281)
  I.4: Media Convergence & Social Media; J. Cotta (66407)

Directorate R: Resources & Support; G. Kent (91945)
  R.1: Human Resources & Competences; I. Mariën-Dusak (92376)
  R.2: Budget & Finance; M-C. Laffineur (68515)
  R.3: Knowledge Management & Support Systems; F. Accordino (98272)
  R.4: Compliance & Planning; K. Engelbosch (54693)
  R.5: Programme Operations & Common Services; I. Malekos (52902)

Mirror units:
  Mirror-Unit REA.A.5: Fostering Novel Ideas: FET-Open; T. Hallantie (68167)
  Mirror-Unit EACEA.B.2: Creative Europe: MEDIA; H. Trettenbrein (84955)
  Mirror-Unit REA.C.4: Expert Contracting & Payments; A. Oram (97805)

Advisers:
  Principal Adviser: F. Lupescu (68538)
  Principal Adviser: M. Richards (62443)
  Adviser for Legal & Legislative Issues: Ž. Bahovec (88284)
  Adviser for cross-cutting Policy/Research Issues: G. Santucci (68963)
  Adviser for International Relations linked to Future Networks: P. Blixt (68048)
  Adviser for Societal Issues: N. Dewandre (94925)
  Adviser for Organisational Transition (Finance): Vacant
  Adviser for Societal Challenges: Vacant
  Adviser for Innovation Systems: B. Salmelin (69564)

Reporting lines are:
- R. Viola for Directorate R;
- G. Kent (acting) for Directorates A, C, E, H;
- C. Bury for Directorates B, D, F, G, I.

Legend: Luxembourg; To be transferred to Luxembourg.

* Shared Administration & Finance Unit for Directorates A, B, C, D & F.

** Shared Administration & Finance Unit for Directorates E, H & I.

Unit G.1 “Data Policy & Innovation”

• Support the data economy in the Digital Single Market
• Policy initiatives addressing new and emerging issues.
• Advance the Commission open data policy by ensuring the correct implementation of the PSI Directive and the Pan-European Open Data Portal
• Promote the emergence of an ecosystem comprising all the players of the data value chain.
• Steers together with industry the SRIA.
• Addresses key framework conditions of the data economy
• Fund research and innovation in data technologies and applications inter alia by driving the big data PPP.

Unit G.3 “Learning, Multilingualism & Accessibility”

• Make the DSM more accessible, secure and inclusive.
• Support policy, research, innovation and deployment of learning technologies
• Support key enabling digital language technologies and services to allow all European consumers and businesses to fully benefit from the Digital Single Market.
• Responsible for Web Accessibility Directive
• Promote a better Internet for children by protecting and empowering children online, and improving the quality of content available to them.

Page 54: Multilingualism for Digital Europe

Communities & Stakeholders

... and many more research centres, companies, EU projects etc.

Page 55: Multilingualism for Digital Europe
Page 56: Multilingualism for Digital Europe

MDSM SRIA

• Version 0.5 unveiled at META-FORUM 2015
• Version 0.9 unveiled at META-FORUM 2016
• Version 1.0 foreseen for Nov./Dec. 2016
• Prepared and presented by Cracking the Language Barrier federation (editorial team: 13 colleagues)
• SRIA addresses how the LT community is going to act united in order to make the DSM multilingual
• Document available on http://www.cracker-project.eu and also on http://www.cracking-the-language-barrier.eu

Strategic Agenda for the Multilingual Digital Single Market

Technologies for Overcoming Language Barriers towards a truly integrated European Online Market

DRAFT

Version 0.5 – April 22, 2015

Page 57: Multilingualism for Digital Europe

MLV Programme

• Multilingual Value Programme*
  - Three-year programme
  - Requires modest investment

• “Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content”

• Three components address the main needs of the Multilingual DSM (MDSM) and how to put them into practice:
  1. Multilingual Application Areas
  2. Multilingual Services
  3. Research

Strategic Research and Innovation Agenda

Language as a Data Type and Key Challenge for Big Data

Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content

SRIA Editorial Team

Version 0.9 – July 2016

* SRIA V0.9 and MLV Programme were devised before the reorganisation of DG CONNECT.

Page 58: Multilingualism for Digital Europe

MDSM: Goals and Needs

• Crosslingual communication for SMEs, public institutions, citizens
• Crosslingual SME presales communication and aftersales services
• Multilingual (big) data, language and knowledge value chains
• Multilingual websites, product catalogues, product descriptions
• Multilingual knowledge bases and knowledge graphs (and services)
• Multilingual conversational interfaces for connected devices (IoT)
• Crosslingual business intelligence (e.g., based on UGC)
• Crosslingual social media analytics for EU-wide societal issues
• Multilingual text and report generation (knowledge/data to text)
• All services must be domain-adaptable (no one size fits all)
• Translation Centre (Cloud) – HQ automated translation for all

Page 59: Multilingualism for Digital Europe

MLV Programme

Diagram: Multilingual Digital Single Market (from the Strategic Research and Innovation Agenda, Version 0.9, July 2016)

Multilingual Applications: Automated Translation; E-Commerce; Content, Media, Verticals; Translation, Language, Knowledge, Data

Multilingual Services: Language Processing, Analysis and Production – Language Resources

Research: Crosslingual Big Data Language Analytics; Meaning, Semantics, Knowledge; High-Quality Machine Translation

Other labels in the diagram: Knowledge and Data Repositories; Multimodal Interaction; Conversational Technologies; Citizens, Public, Business; SMEs, CEF DSIs, IT Integrators, Research; H2020 RIAs; H2020 CSAs, IAs, RIAs; H2020 CSAs, RAs, national funding; provide innovative applications; fills gaps; interoperable and standardised; collaboration with member states

Page 60: Multilingualism for Digital Europe

Application Areas (Selection)

• Multilingual E-commerce
  - Customer-facing vs. back-office facing (after-market, after-sales)
  - Crosslingual search, CRM, helpdesks, processes, workflows
  - Semantic, crosslingual product descriptions and catalogues
  - Online dispute resolution

• Multilingual Content, Media, Verticals
  - Content analytics, curation, generation (incl. authoring support)
  - Multimodal communication (conversational, written, IoT)
  - Vertical domains: health, government, mobility, energy, legal.

• Translation, Language, Knowledge, Data
  - Translation Cloud – written/spoken, automatic/human
  - Crosslingual public and social intelligence, business intelligence
  - HQ resources, under-resourced languages, domain-specific LRs

Page 61: Multilingualism for Digital Europe

Setup – Timeframe – Costs

• Close collaboration with EC, EP and all other stakeholders (including SMEs, research centres, universities, NGOs etc.).

• Mix of funding sources:
  - Horizon 2020 (WP 2018-2020) for EU projects (RA, RIA, CSA)
  - National/regional funding sources for work on monolingual LTs and LRs and also to support and grow SMEs in this area
  - Include, strengthen and broaden role of CEF AT (public services)

• Estimated costs for basic MLV implementation: ca. 175-200M€
  - Includes set of mission-critical services and applications
  - Timeframe: 2018, 2019, 2020

Page 62: Multilingualism for Digital Europe

Conclusions and Next Steps

Page 63: Multilingualism for Digital Europe

Conclusions

• There is a lot of traction for the multilingualism/language topic.

• The EU should develop a Multilingual Strategy (incl. technology).

• Strategy must take into account several stakeholders: citizens, business/innovation, DSM, research (multiple communities).

• Most components in place: Communities, SRIAs, STOA Study etc.

• We need the political will to establish language policy change to support multilingualism (at both Member State and EU level).

• Some Member States are ahead (DK, IE, EE, ES, LT, LV, NL, SL).

• Coordinate, intensify the push and keep up the pressure from Member States, EP, EC, research community, businesses etc.

• Goal: a shared programme (EU/MSs) as a concerted action.

Page 64: Multilingualism for Digital Europe

Next Steps

• Several tightly interconnected goals:
  - Multilingual Technologies for Europe
  - Technologies for the Multilingual Digital Single Market
  - Multilingual Strategy of the European Union
  - The Human Language Project

1. Discuss and further shape MLV Programme V0.9 with EC
2. Extend the Cracking the Language Barrier federation
3. LT brainstorming meeting at EC, Unit G.3 (Dec. 2016)
4. EP STOA Workshop on Language Technologies (Jan. 2017)
5. MDSM SRIA V1.0 to be finalised (Q1 2017)

Page 65: Multilingualism for Digital Europe

Thank you.

[email protected]

http://www.meta-net.eu

http://www.facebook.com/META.Alliance

Page 66: Multilingualism for Digital Europe
Page 67: Multilingualism for Digital Europe

Language Technology Topics

• Multilingual Europe – Technologies for all European languages
• Machine Translation, Text Analytics, Semantic Web etc.
• Healthcare, societal challenges (ageing population, refugees etc.)
• IoT, Smart Assistants and Conversational Interaction Technologies
• E-Learning – Language Technology for E-Learning
• Smart Homes, Cities, Manufacturing
• Smart Virtual Assistants
• Social Media Analytics
• E-Participation
• Games
• etc.

Page 68: Multilingualism for Digital Europe

Digital Language Extinction

• Many smaller languages are experiencing problems digitally:
  - Loss of function – other languages take over entire functional areas such as texting, email, search, e-commerce etc.
  - Loss of prestige – if it’s not on the web, the language doesn’t exist
  - Loss of competence – can you raise a digital native in your language?

• András Kornai’s classification – corresponds to the amount of digital communication in that language:
  1. digitally thriving languages (comfort zone languages)
  2. vital languages
  3. heritage languages
  4. still/moribund/dead languages

• Implications for the European/global multilingual web?

potentially facing digital extinction …

Page 69: Multilingualism for Digital Europe

• Pan-European infrastructure, bringing together providers and consumers of language data, tools and services.

• LRs are documented, uploaded, stored, catalogued, downloaded, shared – to improve visibility, documentation, identification, availability, interoperability.

• Caters for datasets, tools, services for LT research and development (both academic and commercial); META-SHARE includes repository software, a metadata model, licensing kit, statistics.

• 29 distributed repositories maintained by 37 organisations in 25 countries.

• 2,600+ resources (corpora: 49%, lexical: 38%, tools/services: 12%), covering ca. 100 languages.

• 7,000+ downloads in total; ca. 70% of all LRs have been downloaded.

Page 70: Multilingualism for Digital Europe
Page 71: Multilingualism for Digital Europe

Preparation of the SRA

• Strategic Research Agendas of other initiatives were screened.

• Many suggestions as input from Vision Group members.

• We discussed procedures, input and structure of the SRA in four meetings of the META Technology Council.
  - Brussels, Belgium, November 16, 2010
  - Venice, Italy, May 25, 2011
  - Berlin, Germany, September 30, 2011
  - Brussels, Belgium, June 19, 2012

• Additional input in talks, meetings, workshops, discussions, etc.
  - Example: Three HLT Expert Meetings organised by the EC (end of 2011)

• Almost 200 experts contributed to the SRA (54% from industry; 46% from research; 4% from national/international institutions).

Page 72: Multilingualism for Digital Europe

• Published in early 2013.

• First strategic research agenda for our field.

• Complex process of collecting and shaping technology visions.

• Hundreds of researchers participated.

• Broad topics around multilingual Europe in general.

Page 73: Multilingualism for Digital Europe

PT1: Translingual Cloud

• Europe has a big need for translations of publishable quality.

• Focus on high-quality translation.

• New research paradigms
  - Inclusion of professional translators into the research process
  - Inclusion of technologists into research on human translation processes

• Different technological approaches
  - Stronger emphasis on the properties of individual languages
  - A central role for semantics

• Methods for specific genres & domains

Page 74: Multilingualism for Digital Europe

Priority Research Theme 1: Translingual Cloud

Any device; single access point; multiple target formats

Target groups: European citizen, language professional, organisations, companies, European institutions, software applications

Services and Technologies:
• Automatic translation and interpretation
• Language checking
• Post-editing
• Workbenches for creative translations
• Novel translation and authoring workflows
• Quality assurance
• Computer-supported human translation
• Multilingual content production and text authoring
• Trusted service centre (privacy, confidentiality, security of source data)

Applications:
• Crosslingual communication, translation and search
• Real-time subtitling, voice-over generation and translating speech from live events
• Mobile interactive interpretation
• Multilingual content production (media, web, technical, legal documents)
• Showcases: translingual spaces for ambient translation

Other labels in the diagram:
• Written (twitter, blog, article, newspaper, text with/without metadata etc.) or spoken input (spontaneous spoken language, video/audio, multiple speakers)
• Modular combination of analysis, transfer and generation models
• From very fast but lower quality to slower but very high quality (including instant quality upgrades)
• Exploiting strong monolingual analysis and generation methods and resources
• Domain, task and genre specialisation models
• Extending translation with semantic data and linked open data

Page 75: Multilingualism for Digital Europe

PT2: Social Intelligence

• Better decisions by monitoring social media
• Inclusion of citizens into collective decision processes
• Opinion formation, consensus building, decision making
• Evolution of new solutions
• New forms of democracy: e-democracy, massive participation, transparency
• Dialogues and debates across language boundaries and across parties, political alliances, social classes
• Better than binary voting
• Documented transparent decision processes

Page 76: Multilingualism for Digital Europe

Priority Research Theme 2: Social Intelligence and e-Participation

Target groups: European citizen, European institutions, discussion participants, companies

• From shallow to deep, from coarse-grained to detailed processing techniques
• Making language technologies interoperable with knowledge representation and the semantic web
• “Semantification” of the web: tight integration with the Semantic Web and Linked Open Data
• Mapping large, heterogeneous, unstructured volumes of online content to structured, actionable representations
• Unleashing social intelligence by detecting and monitoring opinions, demands, needs and problems
• Make use of the wisdom of the crowds
• Improved efficiency and quality of decision processes
• Understanding influence diffusion across social media

Services and Technologies:
• especially social media, comments, blogs, forums
• decision-relevant information
• support
• sentiment analysis and opinion mining (including the temporal dimension)
• cues
• from arbitrary online content
• visualising discussions and opinion statements

Applications:
• collective deliberation and e-participation
• -wide deliberation on pressing issues
• and processes; modeling evolution of opinions
• analysis technologies

Page 77: Multilingualism for Digital Europe

Priority Research Theme 3: Socially-Aware Interactive Assistants

- Interacting naturally with and in groups
- Learning and forgetting information
- Adaptable to the user’s needs and preferences and the environment
- Include human-computer, human-artificial agent and computer-mediated human-human communication
- Proactive, self-aware, user-adaptable
- Interacts naturally with humans, in any language and modality
- Can be personalised to individual communication abilities including special needs
- Can learn incrementally from all interactions and other sources of information

[Diagram with the panels "Services and Technologies" and "Applications"; only fragments of the labels survive: "recognition"; "and synthesis, providing expressive voices"; "understanding"; "incremental conversational speech"; "models of human communication"; "inter-dependencies"; "priority themes"; "dialogue systems"; "environment"; "modalities (visual, tactile, haptic) verbal/non-verbal behaviour, social context"; "ments, any"; "vocabulary"; "recovery, self-assessment"; "Multilingual capabilities"]

Page 78: Multilingualism for Digital Europe
Page 79: Multilingualism for Digital Europe

Strategic Agenda for the Multilingual Digital Single Market – Version 0.5 – April 2015

Contents

Executive Summary
1 The Digital Single Market is a Multilingual Challenge
  1.1 Overcoming Language Barriers with Technologies
  1.2 Language Technologies Made for Europe – in Europe
  1.3 Online Use of Languages
  1.4 Multilingual Big Data Text Analytics for the European Data Economy
  1.5 EC and Language Technology – Past and Present
  1.6 The Economic Power of Language Technology and Services
2 A Strategic Programme for the Multilingual Digital Single Market
  2.1 Layer 1: Innovative Technology Solutions
  2.2 Layer 2: Language Technology Services, Platforms, Infrastructures
  2.3 Layer 3: Priority Research Themes
  2.4 Related Areas, Applications, and Societal Challenges
  2.5 Summary
3 Layer 1: Innovative Technology Solutions for the Multilingual Digital Single Market
  3.1 Technology Solutions for Businesses
    3.1.1 Unified Customer Experience and Cross-Cultural CRM (E-Commerce)
    3.1.2 Digital Translation Centre
    3.1.3 Content Curation and Content Production
    3.1.4 Virtual and Real Translingual Spaces
    3.1.5 Voice of the Customer
    3.1.6 Business Intelligence using Big Data
    3.1.7 Multimodal User Experience for Connected Devices
    3.1.8 Smart Multilingual Assistants
  3.2 Technology Solutions for Public Services
    3.2.1 Voice of the Citizen – Social Intelligence on Big Data
    3.2.2 Online Dispute Resolution
    3.2.3 E-Participation
    3.2.4 E-Government
    3.2.5 E-Health
    3.2.6 E-Learning
4 Layer 2: Language Technology Services, Platforms, Infrastructures
5 Layer 3: Priority Research Themes
6 Horizontal Framework Aspects
  6.1 Language Policies and Public Procurement
  6.2 Standards and Interoperability
  6.3 Open Source
  6.4 Copyright and Data Protection
7 Conclusions
  7.1 Expected Economic Impact
  7.2 Relevance to the EC’s Digital Single Market Strategy
  7.3 Potential Funding Sources
  7.4 Next Steps
Appendix A. Input Documents
Appendix B. Digital Language Extinction in Europe

Page 80: Multilingualism for Digital Europe

Multilingual Europe Stakeholders

- European Parliament
  - Upcoming STOA Study and Workshop (Jan. 2017)
- European Commission
  - DG CONNECT: Horizon 2020 WP 2018-2020 (G1)
  - DG CONNECT: New Unit “Learning, Multilingualism, Inclusion” (G3)
  - DG Translation: Connecting Europe Facility, AT
- Language Communities: EFNIL and NPLD
  - Joint position paper META-FORUM 2015, 2016
- EU Member States and Non-Member States
  - National and regional funding agencies (ES, NL etc.)
- Research Communities, especially Big Data community (BDVA SRIA V3.0), Web community and many others (Robotics, IoT etc.)
- Standardisation – W3C and others

Page 81: Multilingualism for Digital Europe

Multilingual Success Stories

- Moses SMT toolkit as well as research and technology ecosystem
- CEF AT for public online services – good and timely development
- eBay: MT to Russian – 50% increase in sales
- Hugo.lv for Latvian public services – better than Google Translate
- Hundreds of European startups in Language Technology and AI
- Conversational interfaces (Siri, Echo, Cortana): the next big thing
- IBM Watson – a billion dollar LT business
- Great Neural MT results reported by European researchers (QT21)
- Very rapid development – many opportunities for European R&D&I
