
Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020


Georg Rehm. Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020. Invited keynote talk, 3rd International Workshop on Systems and Frameworks for Computational Morphology (SFCM 2013), Berlin, Germany, September 06, 2013.


Page 1: Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020

Co-funded by the 7th Framework Programme and the ICT Policy Support Programme of the European Commission through the contracts T4ME, CESAR, METANET4U, META-NORD (grant agreements no. 249119, 271022, 270893, 270899).

The State of Computational Morphology for Europe’s Languages and the META-NET Strategic Research Agenda

Georg Rehm

Network Manager META-NET DFKI, Berlin, Germany

[email protected]

3rd Int. Workshop on Systems and Frameworks for Computational Morphology (SFCM 2013)

Berlin, Germany – September 06, 2013

Outline

• Introduction
• Language White Paper Series: Europe’s Languages in the Digital Age
• The State of Computational Morphology for Europe’s Languages
• The META-NET Strategic Research Agenda for Multilingual Europe
• Conclusions

Multilingual Europe

• Where were we back in 2010?
• Challenge: Providing each language community with the most advanced technologies for communication and information so that maintaining their mother tongue does not turn into a disadvantage.
• While research has made considerable progress in recent years, the pace of progress is not fast enough to meet the challenge within the next 10-20 years.
• All stakeholders – researchers, LT user and provider industries, language communities, funding programmes, policy makers – should team up in a strategic alliance for a major dedicated push.

Objectives

META-NET is a network of excellence dedicated to fostering the technological foundations of the European multilingual information society.

Four EU-Funded Projects

• Initial project: T4ME (FP7; 13 partners, 10 countries)
• Three ICT-PSP consortia since Feb. 2011: CESAR, METANET4U, META-NORD
• All four projects ended on January 31, 2013.
• All EU member states and several non-member states covered.
• META-NET in Sept. 2013: 60 members in 34 countries.

http://www.meta-net.eu/members

Language White Paper Series: Europe’s Languages in the Digital Age

Language White Paper Series

• “Europe’s Languages in the Digital Age”.
• Reports on the state of our languages in the digital age and the level of support through language technology.
• Series covers 30 languages.
• Key communication instruments to address decision makers and journalists.
• Inform about societal and technological problems and challenges as well as economic opportunities.
• >2 years in the making.
• >200 national experts as contributors.
• >8,000 copies printed and distributed to politicians and journalists.

Language White Paper Series

• Structure:
  - Part 1: Executive Summary
  - Part 2: Languages at Risk: A Challenge for Language Technology
  - Part 3: The [X] Language in the European Information Society
  - Part 4: LT support for [X]
  - Part 5: About META-NET; References, etc.
• Language White Paper Series (published by Springer):
  - Ca. 8,000 printed copies distributed by META-NET.
  - Printed copies can be purchased through the usual channels.
  - Ebooks available via SpringerLink (fee) and META-NET website (free).
  - http://www.meta-net.eu/whitepapers

30 Languages Covered

Basque, Bulgarian*, Catalan, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Croatian

* = Official EU language

Next up: Welsh

Cross-Lingual Comparison

• In four application areas, each language is assigned to one of five clusters, ranging from excellent LT support to weak/no support:
  1. Machine Translation
  2. Speech Processing
  3. Text Analytics
  4. Language Resources
• Results finalised at a meeting in Berlin with representatives of all 30 languages (October 21/22, 2011).

MT
  excellent: (none)
  good: English
  moderate: French, Spanish
  fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
  weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish

Speech
  excellent: (none)
  good: English
  moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
  fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
  weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian

Text Analysis
  excellent: (none)
  good: English
  moderate: Dutch, French, German, Italian, Spanish
  fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
  weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian

Resources
  excellent: (none)
  good: English
  moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
  fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
  weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese

Europe’s Languages and LT

Language clusters, ordered from “good support through Language Technology” to “weak or no support”:

1. English
2. Dutch, French, German, Italian, Spanish
3. Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish
4. Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
5. Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian

Not enough R&I on European languages

• LT research on European languages, except for English, is too weak and too slow
• Many languages are badly covered

[Bar chart; vertical axis from 0 to 450. Languages on the horizontal axis, in the order shown: English, Chinese, German (Standard), French, Spanish, Japanese, Arabic, Dutch, Portuguese, Czech, Danish, Swedish, Hindi, Korean, Turkish, Italian, Russian, Finnish, Hebrew, Hungarian, Slovene, Urdu, Romanian, Zulu, Bulgarian, Catalan-Valencian-Balear, Greek, Thai, Welsh, Estonian, Basque, German (Swiss), Inuktitut, Indonesian, Ineseño, Latin, Marathi, Malay, Pushto, Serbian, Syriac, Tamil, Ugaritic, Ukrainian, Uspanteko, Vietnamese]

Languages treated in the 2010 editions of Journal of Computational Linguistics and Conferences of ACL, EMNLP and COLING. Many European languages with no reference at all: Slovak, Maltese, Lithuanian, Irish, Albanian, Croatian, Galician etc.

Key Observations

• When it comes to Language Technology support, there are massive differences between Europe’s languages and technology areas.
• LT support for English is ahead of any other language.
• Even support for English is far from being perfect.
• The gap between English and the other languages keeps widening!
• Several languages – Icelandic, Latvian, Lithuanian, Maltese – receive the weakest score in all four areas!
• At least 21 European languages in danger of digital extinction! (Languages put into the “weak or no support” category at least once.)
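The two headline figures above (21 languages in the weakest cluster at least once, four languages in the weakest cluster in all areas) can be rechecked from the per-area clusters on the cross-lingual comparison slide. The following is a minimal sketch, not part of the talk: the language sets are transcribed from that slide as read from a garbled layout, and the names `WEAK_OR_NO_SUPPORT`, `weak_at_least_once` and `weak_in_all_areas` are hypothetical helpers introduced only for illustration.

```python
# "weak or no support" cluster per application area, transcribed from the
# cross-lingual comparison slide (assignment of blocks to areas is an
# assumption based on the slide's rotated labels).
WEAK_OR_NO_SUPPORT = {
    "Machine Translation": {
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish",
    },
    "Speech Processing": {
        "Croatian", "Icelandic", "Latvian", "Lithuanian", "Maltese", "Romanian",
    },
    "Text Analytics": {
        "Croatian", "Estonian", "Icelandic", "Irish", "Latvian", "Lithuanian",
        "Maltese", "Serbian",
    },
    "Language Resources": {
        "Icelandic", "Irish", "Latvian", "Lithuanian", "Maltese",
    },
}


def weak_at_least_once(clusters):
    """Languages that appear in the weakest cluster in one or more areas."""
    return set().union(*clusters.values())


def weak_in_all_areas(clusters):
    """Languages that appear in the weakest cluster in every area."""
    return set.intersection(*clusters.values())


at_least_once = weak_at_least_once(WEAK_OR_NO_SUPPORT)
in_all_areas = weak_in_all_areas(WEAK_OR_NO_SUPPORT)

print(len(at_least_once))    # 21
print(sorted(in_all_areas))  # ['Icelandic', 'Latvian', 'Lithuanian', 'Maltese']
```

Under this transcription, Romanian is the only language outside the machine translation list that enters the union, which is how the count reaches 21.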

White Paper Box Sets (100 copies)

White Paper Website

Press Campaign

• Headline of press release: At Least 21 European Languages in Danger of Digital Extinction.
• Sent out to journalists, politicians and other stakeholder groups on the European Day of Languages (Sept. 26, 2012).
• Overwhelmed by the huge interest in the topic and our key findings!
• 600+ mentions in the press.
• 50+ broadcast interviews with META-NET representatives (ca. 30 radio interviews, ca. 25 television reports).
• News came in from 40+ countries in 35+ different languages.
• Whole of Europe covered.
• Two Parliamentary Questions in the European Parliament on the “digital extinction of languages” topic.

Coverage by Country

Spain 15.90%; Bulgaria 10.80%; International 7.90%; Latvia 5.30%; Netherlands 4.80%; Greece 4.60%; Romania 4.40%; Serbia 4.40%; Italy 4.20%; Germany 3.50%; Russia 3.50%; Estonia 2.90%; France 2.60%; Slovenia 2.40%; Iceland 2.20%; Malta 2%; USA 1.50%; Denmark 1.30%; Latin America 1.30%; Lithuania 1.30%; Ireland 1.30%; UK 1.10%; Belgium 0.90%; Finland 0.70%; Sweden 0.70%; Poland 0.70%; Norway 0.40%; Mexico 0.40%; Brazil 0.40%; Slovakia 0.40%; Basque Country 0.40%; Portugal 0.40%; Austria 0.20%; New Zealand 0.20%; Hungary 0.20%; Bosnia and Herzegovina 0.20%; Costa Rica 0.20%; Cyprus 0.20%; Canada 0.20%; Australia 0.20%

Response: Examples

• Austria: Der Standard.
• Denmark: Politiken, Berlingske Tidende.
• Finland: Tiede.
• Germany: Heise Newsticker, Süddeutsche Zeitung.
• Greece: in.gr, Πρώτο Θέμα, Prosilipsis.
• Hungary: Origo.
• Iceland: Fréttablaðið, Morgunblaðið.
• Italy: Wired.
• Norway: Computerworld.
• Slovenia: Delo, Dnevnik, Demokracija.
• Serbia: Politika.
• Spain: El Mundo.
• UK: Huffington Post.
• USA: Mashable, NBC News, Reddit.

Press Campaign: Highlights


Poor language technology threatens Danish on the net

Words. Researchers are working to improve Danish translations on the internet.

By Jens Ejsing // [email protected]

The Danish language is having a hard time in the digital world.

That is the finding of Danish language researchers and experts in connection with the new international study META-NET, which takes a closer look at how a large number of smaller European languages such as Danish fare in the digital world.

The researchers, from the University of Copenhagen and the Danish Language Council (Dansk Sprognævn) among others, conclude that Danish may have an even harder time in the digital world in future, because Google Translate, GPS devices, smartphone applications and other language technology programs are not sufficiently able to handle the many nuances of the Danish language.

Bolette Sandford Pedersen, professor of language technology at the University of Copenhagen, believes there is a need for a kind of digital Danish language bank filled with data, so that translations, among other things, become as precise and good as possible. With help from the language bank, researchers can, according to the professor, help companies improve programs that have to handle linguistic knowledge for machine translation, speech recognition and information retrieval, among other things.

That would make erroneous translations rarer, such as when “hæld olie på panden” (“pour oil into the pan”) becomes “pour oil on the forehead” in English with Google Translate. Translations that are, at worst, so imprecise that Danes end up opting out of their own language in the digital world.

Language help for companies

She acknowledges, however, that “the technology for automatic translation is in many ways fantastic”.

“It just isn’t good enough when it comes to Danish,” she says:

“It is as if we are, to some extent, leaving it in the hands of Google or other companies to decide whether Danish is treated well enough or not. But the Danish market is not large for them. The question is therefore whether we should not do more ourselves to ensure that the necessary data is available, so that we get good translations and other good language technology. That could be, for example, by making an effort to set up a language bank with a large amount of enriched material about Danish.”

“If we constantly find that translations are riddled with errors, we do not dare to trust them,” she says, stressing that “erroneous translations can lead to major misunderstandings”.

According to the director of the Danish Language Council, Sabine Kirchmeier-Andersen, poor language technology can have consequences for many Danes who are not so good at English.

“If we have ambitions to use the Danish language in the technological universe of the future, an effort must be made now to retain expertise and build on the knowledge we have,” she says:

“Otherwise we risk that only people who speak fluent English will benefit from the new generations of web, telecommunications and robot technology that are on the way.”


Facts: Languages in Europe

• There are around 80 languages in the EU. For 21 of them, including Danish, there are major gaps in language technology when it comes to machine translation, speech recognition and information retrieval, among other things.
• According to an EU survey, a growing number of European internet users buy goods or services online where the language used is not their own. This applies to more than half of users.
• More than one in three uses a foreign language to write emails or posts on the net.

Source: Berlingske, section 1, National, Saturday 22.09.2012. Edited by Joanna Vallentin; layout: Jacob Friis.

Eleftheros Typos (ΕΛΕΥΘΕΡΟΣ ΤΥΠΟΣ), Life section, Thursday 27 September 2012

MANY EUROPEAN LANGUAGES ARE CONSIDERED TECHNOLOGICALLY... OUTDATED

Greek at risk of digital extinction

By Eleni Vergou (ΕΛΕΝΗ ΒΕΡΓΟΥ) [email protected]

In the digital age they do not... speak Greek, nor several other European languages, according to a pan-European report signed by more than 200 experts. The study was published by the scientific network META-NET on the occasion of yesterday’s European Day of Languages.

For their research, linguists from 34 countries of the Old Continent rated the available language services and produced a “White Paper” for each European language. In their study, the experts looked at four basic electronic tools, among other things: the existence of automatic translation, the possibility of voice interaction and of digital text analysis, while the availability of language resources was also examined.

In a first phase they examined websites that let users translate online, such as Google Translate, the service of the IT giant. At the same time, the “communication” of Greek-speaking users with their... devices was examined, for example the possibility of “talking” to a GPS in one’s mother tongue. The researchers concluded that such devices exist but are not as widespread as the English-language ones. The “gold” medal goes, as is only logical, to English. English-speaking users have the best possible technological support, which favours the further spread of the language. Icelandic, Latvian, Lithuanian and Maltese are most at risk of “technological exclusion”, while Greek, Bulgarian, Hungarian and Polish are slightly better off, having, as the study puts it, “fragmentary” technological support.

Support for users of Dutch, French, German, Italian and Spanish is rated “moderate”. The heads of the scientific team, Hans Uszkoreit and Georg Rehm, state: “There are dramatic differences in language technology support between the various European languages. The gap between ‘small’ and ‘big’ languages keeps widening. We must ensure that the smaller and less resource-rich languages are equipped with the necessary basic technologies. Otherwise, these languages are doomed to digital extinction.”

The experts stress that without decisive action these languages will struggle to... survive in the digital world of the 21st century. Maria Gavrilidou, a member of the scientific team from the Institute for Language and Speech Processing, Athena Research Centre, tells “E.T.”: “This study does not say that the Greek language will not live or that it is in danger of disappearing.” The expert explains that as long as there are people who speak, write and communicate in a language, it will continue to exist. It is important, however, that all users are able to “talk” to machines, such as their GPS devices, in Greek and have computer language tools at their disposal.

Among these “tools” are spelling and grammar checkers, which are used daily by hundreds of Greek users and are based on language technology. Nevertheless, she stresses that the digital spread of a language is important. “It is not in the hands of the average user. Governments, the European Union and the private sector must fund the development of this technology for all languages,” she says, and continues: “Users, however, must demand that these tools also exist in their language and not be satisfied with English.”

Pull quote: English-speaking users have the best possible technological support, a fact that favours the further spread of the language.

Sidebar: GREEKLISH. The language of alienation...

Most young people in our country now communicate in Greeklish via messages or email. Although language tools that allow the use of the Greek character set have existed in recent years, teenagers and young adults do not seem to have “embraced” these technologies. Professor of Linguistics Georgios Babiniotis tells “E.T.”: “Greeklish is a problem for the Greek language, especially for young people, for a purely linguistic reason. By using Greeklish they become alienated from the form of the word or, as we say, the etymological image that is conveyed by the spelling of the word and is linked both to the word’s meaning and to its origin.” The danger facing young people is alienation from the written form of the language. This “familiarity”, however, also helps in understanding the meaning and the origin of a word. “This alienation is not without significance,” says the expert, who explains that the process of writing helps a word to be imprinted and to be linked with other cognate words. “When this form of communication is used, they are destroyed, they wither. It is not fatal, but it will do damage,” says Mr Babiniotis, who advises users to choose the Greek character set.

Photo caption: Georgios Babiniotis.


Greek newspaper clipping, World (ΚΟΣΜΟΣ) section, page 49, Sunday 30 September 2012

They lost my language...

Most European languages are at risk of digital extinction

26 September has been established by the Council of Europe as the European Day of Languages, but, according to a new European scientific report, 21 of Europe’s 30 languages, Greek among them, face the danger of digital extinction. The study sounds the alarm, having found that digital support for most European languages is inadequate or entirely non-existent for users.

Swallowed up by the widely used languages

The report, in the form of a series of White Papers (entitled “Languages in the European Information Society”) from the scientific network META-NET, which brings together 60 research centres in 34 countries, points out that languages spoken by a relatively small number of people are at risk because they lack the technological support enjoyed by widely used languages. White Papers have been prepared for the following European languages: English, Basque, Bulgarian, Galician, French, German, Danish, Greek, Estonian, Irish, Icelandic, Spanish, Italian, Catalan, Croatian, Latvian, Lithuanian, Maltese, Norwegian (Bokmål and Nynorsk), Dutch, Hungarian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Swedish, Czech and Finnish. Each White Paper is written in the language it covers and is translated into English.

Four major dangers

According to the new study, Icelandic, Latvian, Lithuanian and Maltese face the greatest risk of extinction in a European technological society that increasingly promotes the use of particular languages, above all English. But other languages, such as Greek, Bulgarian, Hungarian and Polish, are also at risk in the modern digital world. The META-NET study, to which more than 200 experts contributed, assesses the risk for each language on the basis of four key technological/digital criteria: the existence of automatic translation for the language, the possibility of voice interaction, the possibility of digital text analysis, and the availability of the relevant digital language resources.

The strong ones

The language with the best scores on these criteria is of course English, which enjoys comparatively the best technological support (though not the best possible), a fact that facilitates its further spread.

Next, with satisfactory or moderate technological/digital support, come Dutch, French, German, Italian and Spanish. Greek, as well as Basque, Catalan, Polish, Hungarian and others, are ranked among the languages with only “fragmentary” support, which is exactly why they are considered languages at high risk of extinction.

Dramatic differences

According to the editors of the study, Hans Uszkoreit and Georg Rehm, “there are dramatic differences in language technology support between the various European languages and technology areas. The gap between ‘small’ and ‘big’ languages keeps widening. We must ensure that the smaller and less resource-rich languages are equipped with the necessary basic technologies, otherwise these languages are doomed to digital extinction”. The hope for these languages is seen in the improvement and wider use of language technology software, which enables the spoken and written processing of the various languages. Examples of these capabilities are electronic spelling and grammar checkers, the interactive personal “assistants” of smartphones (e.g. Siri on the iPhone), automatic translation systems, the electronic dialogue systems of call centres, search engines, and the synthetic voice in car navigation systems, among others.

The basic problem

What matters, according to the report, is that all these capabilities are also offered to users in their mother tongue, which is at risk of extinction. Without decisive action, the report makes the ominous prediction that these languages will struggle to survive in the digital world of the 21st century. One problem is that the software of these language technology systems relies on statistical methods that require huge quantities of written or spoken data, yet so much data is hard to obtain for languages spoken by relatively few people. Moreover, even for widely used languages such as English, the relevant language technology still has weaknesses, evident for example in highly inadequate, error-ridden automatic translations. The report proposes that a coordinated, large-scale effort be undertaken in Europe, in order gradually to create or improve the necessary technologies and to help the languages that are digitally sidelined.

Pull quote: The smaller and less resource-rich languages must be equipped with the necessary basic technologies.



Website: Visitors Overview


[Chart: visitors overview, annotated with “began sending out press release”, “European Day of Languages” and “unusually high traffic”]


Website: Visitors’ Cities


City with the most visits: Brussels!


Computational Morphology for Europe’s Languages

The State of Computational Morphology for Europe’s Languages


Computational Morphology?

q  So, what is the state of Computational Morphology support? Do we have precise, good, reliable tools for all European languages?

q  Answering this question is a non-trivial, difficult and complex task.

q  However, we can provide a rough approximation.

q  In META-NET we had a look at 30 languages (Basque, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Spanish, Swedish).

q  We gathered data on several aspects that were used to prepare a cross-language comparison, along with statistics, discussions, comparisons, experts’ opinions, etc.


Coarse-Grained View

q  We investigated four main areas: Machine Translation; Speech; Text Analytics; Language Resources.

q  Computational Morphology is covered by Text Analytics.

q  Text Analytics comprises, among others,

§  the quality and coverage of existing text analytics technologies (morphology, syntax, semantics),

§  coverage of linguistic phenomena and domains,

§  amount and variety of available applications,

§  quality and coverage of existing lexical resources and grammars.


Coarse-Grained View


Text Analytics: level of support

excellent: (no language)

good: English

moderate: Dutch, French, German, Italian, Spanish

fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish

weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian
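The distribution over the five levels can be tallied in a few lines. This is only a minimal sketch: the language lists are copied from the slide, while the names `SUPPORT` and `share_with_low_support` are illustrative and not part of the META-NET material.

```python
# Text Analytics support levels, copied from the slide.
SUPPORT = {
    "excellent": [],
    "good": ["English"],
    "moderate": ["Dutch", "French", "German", "Italian", "Spanish"],
    "fragmentary": [
        "Basque", "Bulgarian", "Catalan", "Czech", "Danish", "Finnish",
        "Galician", "Greek", "Hungarian", "Norwegian", "Polish",
        "Portuguese", "Romanian", "Slovak", "Slovene", "Swedish",
    ],
    "weak or no support": [
        "Croatian", "Estonian", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Serbian",
    ],
}


def share_with_low_support(support):
    """Return (low, total): languages rated fragmentary or weak/no, and all languages."""
    total = sum(len(languages) for languages in support.values())
    low = len(support["fragmentary"]) + len(support["weak or no support"])
    return low, total


low, total = share_with_low_support(SUPPORT)
print(f"{low} of {total} languages have fragmentary or weak/no support")
```
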


Key Observations


q  When it comes to Language Technology support, there are massive differences between Europe’s languages and technology areas.

q  LT support for English is ahead of any other language.

q  Even support for English is far from being perfect.

q  The gap between English and the other languages keeps widening!

q  Several languages – Icelandic, Latvian, Lithuanian, Maltese – receive the weakest score in all four areas!


Simplified Methodology


q  Distributed data collection process in the respective countries.

q  30 tables provide data for all languages (tools, resources, gaps etc.).

q  Reduce numbers to one final score per language and area.

q  Calibration of tables across languages in smaller groups.

q  Final scores for each area and language were derived from two central features (quality, coverage), resulting in one big table:

Columns, in order: Basque, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Spanish, Swedish

Language Technology (Tools, Technologies, Applications)

Tokenization, Morphology (tokenization, POS tagging, morphological analysis/generation): 5 5 5 5 0 5 3,1 4,1 5 4 4 4,1 5 4 4,1 4,1 4,1 3,1 4,1 3 3,1 4,1 5 4,1 5 5 3,1 4,1 5 4,1

Parsing (shallow or deep syntactic analysis): 4 4 3 2 5 3,1 2,1 4,1 3,1 3,1 4 4,1 3 2,1 4 4 2 3,1 2,1 1,1 0 3,1 4 3,1 4 3,2 0 3,1 4 4,1

Sentence Semantics (WSD, argument structure, semantic roles): 3,1 2,1 2 1,2 3,1 1,1 2,1 3,1 2 2 1,1 2,1 1,1 2 1,2 1,1 0 4 0 1,1 0 3,1 1,3 3,1 4 0 0 2,2 2,1 2

Text Semantics (coreference resolution, context, pragmatics, inference): 1 2 1,1 0 3 1 2 1,1 2 1 2,1 2,1 2,1 2 0,2 0 0 3 0 1 0 3 1,2 1,2 4,1 0 0 0 2 2,1

Advanced Discourse Processing (text structure, coherence, rhetorical structure/RST, argumentative zoning, argumentation,: 1 0 2 0 3 1 0 2 0 0 2 0 2,1 1 0 0 0 2 0 1 0 3 1 2 3,1 0 0 0 1 1

Information Retrieval (text indexing, multimedia IR, crosslingual IR): 4 2 1,2 2,3 0 3 3 4,1 3 3 4,1 2 3 3,1 1,1 0 3,1 4,1 0 1,2 0 4 2 0 5 3 2,1 0 2 3,1

Information Extraction (named entity recognition, event/relation extraction, opinion/sentiment recognition, text: 3 3 1,1 3,1 4,1 3 2,1 3,1 2 2 3,1 1,2 3 3 6 1 0 4,1 3 3 0 4 2 3,1 4,1 2 1 2,1 1,1 4

Language Generation (sentence generation, report generation, text generation): 0 2 1,2 0,4 4 0 2,1 2 0 2,2 2 0 2 1,1 0 0 3 0 1,2 0 0 3,1 1 0 0 0 0 0 2 2,1

Summarization, Question Answering, advanced Information Access Technologies: 2 2 0 0,1 3 2,1 2,1 2 2 2 3 1,1 2 1,1 0 0 0 3 0 0,1 0 3,1 2 2,2 4,1 0,1 1 1,1 2,1 1

Machine Translation: 3,1 2 3,1 1,2 0 1,2 2,2 2,1 2,1 3 3,1 4,1 2,1 1 5 2 2,1 3,1 4 3 2,1 2,2 3 2,1 3,1 0,1 2 3,1 4,1 2,2

Speech Recognition: 1 3 3 3 2,1 1,2 3,1 4 4 3 4 5 4 3,1 2,2 1,1 3,1 4,1 0 1,1 1 1,1 3,1 2,2 2,1 1 2 2,1 3,1 3,1

Speech Synthesis: 2,4 3 4 3,1 4 2,1 4 4,1 4 4 4 5 4,1 4,1 4 2,1 3,1 4 3,1 3 4 2,1 5,1 4 2 4 3 3,2 4 3

Dialogue Management (dialogue capabilities and user modelling): 0 0 2,2 1 3,1 1 2,1 3,1 3 1,1 3 1 3,1 1,2 0 0 0 3 0 0 0 1,1 1 3 0 0 0 2,1 2 3

Language Resources (Resources, Data, Knowledge Bases)

Reference Corpora: 2,3 4,1 3,1 3,1 5 3,1 2,2 4,1 4 3,1 3,1 5 3,1 3 6 3,1 3,2 3 4,1 4 3 3 4 4,1 1,1 2,2 4,1 4,1 3,1 3,1

Syntax-Corpora (treebanks, dependency banks): 2,2 2,1 3 3,1 3,3 1,3 2,2 4,2 2,1 3,2 3 2 3 3,1 5,1 2,2 1,2 3 1 1 0 3,1 4 4 4,1 0 2 3,2 2 3

Semantics-Corpora: 1 4,1 1 0 3,1 1,2 1,2 3 2 0 1,1 1 1,1 2,1 1,5 0 0 4 1 0 0 2,1 2,2 3,1 2,1 0 0 1,4 2 1

Discourse-Corpora: 0 2 2 0 2,1 1,3 0 3 2,1 2,1 2 0 2 0 0 0 0 2,2 0 0 0 1,1 1,1 2 2,1 0 1,1 0 3 1

Parallel Corpora, Translation Memories: 0 2,2 2,1 3 3,1 2,1 2,1 4 2,1 3 3,1 5 2 2 6 1,1 3,2 3,1 3,1 3,1 2,1 4,1 4 2,1 4,1 2,1 2 2,2 3,1 3,2

Speech-Corpora (raw speech data, labelled/annotated speech data, speech dialogue data): 2,2 2,1 3,1 3 2,2 1,2 4,1 5,1 3,1 2,1 3,1 4,1 2,1 2,1 2,2 2 2,2 2,1 1 2 2,1 3,2 3 4 2,2 4 2 3,1 2,1 3

Multimedia and multimodal data: 5 1 2 3,1 2,2 1,2 1,3 1,1 1 2,1 1,2 2,2 1,2 2,1 1 1 1,1 3,1 0 1 0 4,1 1 0 0 1,1 2,1 0 2 1

Language Models: 2 2 2,1 0 4 3 2,1 5 3 2 3 4,1 3 2,1 3,1 3 0 0 3,1 3,1 3 1 1 0 4 2,1 1,2 2,2 2 4

Lexicons, Terminologies: 5,1 3,1 3,1 3,1 3,1 4 3,1 4,1 5 4 3,1 4,1 3,1 3 6 3 4 4,1 5 3,1 2,1 5 4 4,1 4,1 4 3,1 2,2 3 4,1

Grammars: 3,1 3 2 0 2,1 1,3 2,1 3 4 4 3 2 3 1 5,1 3 3 3 3,1 0 0 3,2 4 2,3 2,1 0,1 2,1 2,1 3 3

Thesauri, WordNets: 4 4,1 2,2 3,1 3,1 3 2,1 4,1 3,1 3,1 1,1 4 2,1 1,1 3,3 3 3,1 3,1 2,1 1 0 0 4 2,2 4 2,1 1,1 3 3 4,1

Ontological Resources for World Knowledge (e.g. upper models, Linked Data): 2 3 2,1 0 2,1 1,1 0 4 0 2,1 1,1 1 2,1 2 1 0 0 3,1 1 1,1 0 0 2,2 2 2 0,1 0 0 2 1


Simplified LR/LT Table (German)


Scale: 0 = very low, 6 = very high


Coarse-Grained View


Text Analytics: level of support

excellent: (no language)

good: English (4.50)

moderate: Dutch (3.94), French (3.71), German (3.36), Italian (3.50), Spanish (3.77)

fragmentary: Basque (3.36), Bulgarian (2.80), Catalan (3.21), Czech (3.29), Danish (3.00), Finnish (3.64), Galician (3.43), Greek (2.71), Hungarian (3.79), Norwegian (4.36), Polish (4.07), Portuguese (3.64), Romanian (3.87), Slovak (2.43), Slovene (3.57), Swedish (4.57)

weak or no support: Croatian (2.43), Estonian (3.14), Icelandic (3.50), Irish (3.71), Latvian (3.14), Lithuanian (1.79), Maltese (0.80), Serbian (1.64)

In parentheses: average scores of the grammatical analysis feature. Several additional categories and features informed and influenced the overall ranking of a language in one of the five categories. Neither the individual scores nor the average scores have been calibrated with regard to the scores assigned to the LT support of other languages. These scores alone cannot be used for a cross-language comparison; nevertheless, the average scores show how the authoring teams themselves perceive the state of the grammatical analysis category for their respective language.
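Why the uncalibrated averages cannot be compared across languages becomes visible when they are set against the overall levels. The following is a minimal sketch using a handful of values copied from the slide; the function name `inversions` and the selection of languages are illustrative only.

```python
# Support levels ordered from best to worst, as used on the slide.
LEVELS = ["excellent", "good", "moderate", "fragmentary", "weak/no"]

# (language, self-assessed average, overall Text Analytics level), copied from the slide.
SCORES = [
    ("English", 4.50, "good"),
    ("German", 3.36, "moderate"),
    ("Spanish", 3.77, "moderate"),
    ("Swedish", 4.57, "fragmentary"),
    ("Norwegian", 4.36, "fragmentary"),
    ("Irish", 3.71, "weak/no"),
    ("Maltese", 0.80, "weak/no"),
]


def inversions(scores):
    """Return pairs (a, b) where a sits in a better level than b
    but reports a lower self-assessed average."""
    rank = {level: position for position, level in enumerate(LEVELS)}
    pairs = []
    for name_a, avg_a, level_a in scores:
        for name_b, avg_b, level_b in scores:
            if rank[level_a] < rank[level_b] and avg_a < avg_b:
                pairs.append((name_a, name_b))
    return pairs


for better, worse in inversions(SCORES):
    print(f"{better} is ranked above {worse} but reports a lower average")
```

Even in this small sample several such inversions appear, for example English (good, 4.50) against Swedish (fragmentary, 4.57).
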


“Grammatical Analysis” Feature

Language    Quantity  Availability  Quality  Coverage  Maturity  Sustainability  Adaptability  Average  Level of support (Text Analytics)

Basque      4         2.5           4        4         4         2.5             2.5           3.36     fragmentary
Bulgarian   2.4       2             3.6      3.6       2.8       2.4             2.8           2.80     fragmentary
Catalan     3         2.5           4        4         4         2.5             2.5           3.21     fragmentary
Croatian    2         1.5           3.5      3         2         1               4             2.43     weak/no
Czech       4         2             4        4         3         2               4             3.29     fragmentary
Danish      3         2             4        4         3         2               3             3.00     fragmentary
Dutch       3.6       5.4           4.8      3.6       4.8       3.6             1.8           3.94     moderate
English     5         5             5.5      4.5       4.5       3               4             4.50     good
Estonian    2.5       3.5           3.2      2.8       4         2.5             3.5           3.14     weak/no
Finnish     3.5       3.5           3.5      4         4         3.5             3.5           3.64     fragmentary
French      4         4             4        4         4         3               3             3.71     moderate
Galician    3         5             4        4         3         2               3             3.43     fragmentary
German      4         2.5           4        4         4         2.5             2.5           3.36     moderate
Greek       2         1.5           3.5      3         3         3               3             2.71     fragmentary
Hungarian   4.5       2             4        4.5       4         3               4.5           3.79     fragmentary
Icelandic   2         5.5           4        3         3.5       3.5             3             3.50     weak/no
Irish       4         4             3        3         4         4               4             3.71     weak/no
Italian     3.5       3             4        5         4         3               2             3.50     moderate
Latvian     2.5       2             3        3.5       4         3               4             3.14     weak/no
Lithuanian  2         1.5           2.5      2         1.5       1               2             1.79     weak/no
Maltese     0.8       0.8           0.8      0.8       0.8       0.8             0.8           0.80     weak/no
Norwegian   4         4.5           4        4         4.5       4.5             5             4.36     fragmentary
Polish      4         4.5           4.5      4.5       4         4               3             4.07     fragmentary
Portuguese  3         3             4        4         4.5       2.5             4.5           3.64     fragmentary
Romanian    4         3.5           4        3.6       4.5       3.5             4             3.87     fragmentary
Serbian     1         1             2.5      2         2         1.5             1.5           1.64     weak/no
Slovak      2         2             3        2         2         3               3             2.43     fragmentary
Slovene     2.5       4             4.5      3.5       3         3               4.5           3.57     fragmentary
Spanish     3.5       3             5.4      4.5       3.5       3               3.5           3.77     moderate
Swedish     4.5       3.5           5        4         5         5               5             4.57     fragmentary
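The “Average” column is consistent with an unweighted mean of the seven criteria. A minimal sketch that reproduces it for four rows copied from the table (the names `CRITERIA`, `ROWS` and `average` are illustrative, and the unweighted mean is an assumption that happens to match the printed values):

```python
# The seven criteria of the "grammatical analysis" feature, in table order.
CRITERIA = (
    "quantity", "availability", "quality", "coverage",
    "maturity", "sustainability", "adaptability",
)

# A few rows copied from the table above.
ROWS = {
    "Basque": (4, 2.5, 4, 4, 4, 2.5, 2.5),
    "English": (5, 5, 5.5, 4.5, 4.5, 3, 4),
    "Lithuanian": (2, 1.5, 2.5, 2, 1.5, 1, 2),
    "Swedish": (4.5, 3.5, 5, 4, 5, 5, 5),
}


def average(scores):
    """Unweighted mean of the criteria scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for language, scores in ROWS.items():
    assert len(scores) == len(CRITERIA)
    print(f"{language}: {average(scores):.2f}")
```
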


Across Categories

q  The four area rankings of the 30 languages on the five-point scale (from “excellent support” to “weak/no support”) take many different features and factors into account.

q  The “grammatical analysis” data are only one single piece of the puzzle – the piece that is closest to Computational Morphology.

q  Let’s have a look at the individual White Papers and the languages as they are ranked – from “good support” to “weak/no support”.

q  The following ranking is in terms of Text Analytics; the excerpts taken from the White Papers refer to morphological tools.


Good Support

q  The only language that is considered to have “good support” in terms of Text Analytics is English.

q  In comparison to certain other languages and language families, the morphology of English is usually considered rather simple and straightforward.

q  Many robust and precise off-the-shelf technologies exist.

q  This is most probably the main reason why the authors of the white paper on English do not discuss morphology components at all, nor any related issues or challenges.


Moderate Support

q  Same trend in this category concerning morphological tools.

q  Authors mainly discuss other research and technology gaps, mentioning the existence of, for example, “medium- to high-quality software for basic text analysis, such as tools for morphological analysis and syntactic parsing” (German).

q  Some authors mention morphology on a more superficial level (Italian, Spanish) or not at all (Dutch).

q  The authors of the white paper on French emphasise that large programmes were set up (1994–2000; 2003–2005) to build a set of basic technologies for French, from spoken and written language resources to spoken and written language processing systems.


Fragmentary Support 1/4

q  16 languages only have fragmentary support in Text Analytics.

q  The respective authoring teams report the existence of one or two morphological tools per language.

q  Clear tendency: these tools have limited functionality and a long history including an unclear copyright situation (Hungarian).

q  Neither freely nor immediately available (Danish, Romanian).

q  However, these tools are usually employed in the large office suites (MS Office, Open Office), localisation frameworks or national search engines (Norwegian, Czech, Slovak).


Fragmentary Support 2/4

q  Key factors contributing to the small number of morphological components: rich morphological systems; high degree of inflection; lack of morphological distinction for certain nominal cases.

q  These linguistic properties make morphological processing, as well as all approaches based primarily on statistics, a challenge (Basque, Polish, Slovene and other languages).

q  Special characters and encoding systems are mentioned for languages with alphabets that go beyond plain ASCII: processing words when diacritics are missing (web, email) is a challenge. Experts demand more robust error detection algorithms (Czech).

q  Important observation (Basque, Greek): algorithms and approaches developed for English cannot be directly transferred to other languages.
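The diacritics problem mentioned for Czech can be illustrated with a small sketch. It assumes a toy lexicon of five Czech word forms and uses Unicode decomposition from the Python standard library; the function names are illustrative, and real systems would need far larger lexicons plus context to choose among candidates.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(lexicon):
    """Map each diacritics-free form to all lexicon forms it may stand for."""
    index = defaultdict(set)
    for form in lexicon:
        index[strip_diacritics(form).lower()].add(form)
    return index


def restore_candidates(word, index):
    """Return the lexicon forms matching a word typed without diacritics."""
    return sorted(index.get(strip_diacritics(word).lower(), {word}))


# Toy lexicon (illustrative only): "byt" (flat) and "být" (to be) collide.
LEXICON = ["být", "byt", "příliš", "žluťoučký", "kůň"]
INDEX = build_index(LEXICON)

print(restore_candidates("prilis", INDEX))
print(restore_candidates("byt", INDEX))
```

The second lookup returns two candidates, which is exactly where more robust, context-aware error detection is needed. Note that letters without a canonical decomposition, such as Polish “ł” or Croatian “đ”, are not affected by this approach.
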


Fragmentary Support 3/4

q  Languages spoken in smaller countries usually receive less attention and research funding than larger languages, which typically also have a larger base of researchers building actual technologies and perhaps even breaking new ground (Greek).

q  Hungarian: a lack of synchronisation between parallel efforts to build morphological processors led to substantial friction losses. This is why several morphological parsers for Hungarian exist but they use conflicting and incompatible formalisms.

q  Some authors discuss related technologies such as, for example, e-learning tools and systems for second language learners that employ complex morphological components (Czech).


Fragmentary Support 4/4

q  Portugal set up a project in 2005 to enable the development of a set of linguistic resources and components to support the processing of Portuguese. Outcome: large corpus and tools for tokenisation, morphosyntactic tagging, inflection analysis, and lemmatisation.

q  Slovakia set up a project to provide processing of Slovak for linguistic research purposes within the National Research and Development Programme. Outcome: tools and data sets that include processors and morphologically annotated corpora.

q  French (1994–2000) had a clear head start over Portuguese and Slovak, in addition to a longer, more established research tradition in this area, which is why it was ranked higher.

q  The Slovak experts conclude that, while certain morphological tools do exist, “those must be further developed and supported.”


Weak or No Support 1/2

q  This category concerns eight languages.

q  A small or very small number of morphological tools or components exist (Irish) and are used, even in well-known applications, but they are neither freely available nor accessible for research purposes.

q  Tools are based on very simple approaches that rely on word lists (Lithuanian, Estonian, Croatian).

q  Several of these tools have been in development since the 1980s and are under the control of companies. Researchers often use ispell or aspell (open source) as a technological fallback solution.

q  The complex morphology of languages is mentioned in almost all cases along with the statement that morphology processing must be further developed (Icelandic, Estonian, Croatian, Maltese, Serbian).
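What a purely word-list-based approach looks like, and why it falls short for richly inflected languages, can be sketched as follows. The three Croatian entries and the tag strings are toy assumptions for illustration; they are not taken from any of the tools discussed in the White Papers.

```python
# Toy full-form lexicon: surface form -> list of (lemma, tags).
# Entries and tag names are illustrative assumptions.
FULL_FORMS = {
    "kuća": [("kuća", "N;NOM;SG")],
    "kuće": [("kuća", "N;GEN;SG"), ("kuća", "N;NOM;PL")],
    "kući": [("kuća", "N;DAT;SG"), ("kuća", "N;LOC;SG")],
}


def analyse(word):
    """Look the word up in the list; unknown forms get no analysis at all."""
    return FULL_FORMS.get(word.lower(), [])


print(analyse("kuće"))    # ambiguous: two readings from the list
print(analyse("kućama"))  # a regular plural form, but not listed: no analysis
```

Every inflected form has to be listed explicitly, so coverage grows only as fast as the list does; a rule-based or data-driven analyser would instead generalise over the paradigm.
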


Weak or No Support 2/2

q  Authors demand more development for basic morphological tools.

q  Perceived as very important: to design and model approaches to the specific linguistic properties of a language without trying to adapt an approach developed for English (Serbian, Estonian).

q  One such step is to set up specific language technology programmes, as has been done, among others, in France, Slovakia and Portugal.

q  In 2000, the Icelandic government set up a national programme with the aim of supporting institutions and companies in creating resources for Icelandic. Outcome: several projects, huge impact on the national field. Among its results are a full-form morphological database of Modern Icelandic inflections, a balanced morphosyntactically tagged corpus, a training model for data-driven POS taggers, and an improved spell checker.


Summary

q  Solid computational morphology tools only exist for a handful of European languages – i.e., those with many speakers (and funding).

q  The smaller the language, the fewer tools exist.

q  “Fragmentary” or “weak/no support”: 24 of the 30 languages – very few tools; very limited functionality; availability is a problem.

q  In terms of the full NLP stack, computational morphology cannot be taken for granted; it is by no means a “solved problem”.

q  More original research off the beaten track (i.e., beyond English) needed.

q  More coordination, synergies and research transfer between the languages needed.

q  France, Iceland, Portugal, Slovakia show that large, dedicated funding programmes are needed to support the development of basic LRs/LTs.


Strategic Research Agenda

The META-NET Strategic Research Agenda for Multilingual Europe


Three Ingredients

Appropriate Programme: Vision & Agenda

Appropriate Actors: Research & Commercialisation

Appropriate Support: Funding


Three Vision Groups


q  Translation and Localisation (technical documentation, official bulletins, GUI localisation, games, services etc.)

§  Target stakeholders: large users of translation services, (machine) translation, software companies, game companies, localisation industry

q  Media and Information Services (audiovisual sector, news, digital libraries, portals, search engines etc.)

§  Target stakeholders: media industries, search engine providers, archives

q  Interactive Systems (mobile assistance, dialogue translation, call centres, etc.)

§  Target stakeholders: mobile software and service providers, telecom industry, call centres


Vision Group Meetings

q  Vision Group Translation and Localisation

§  July 23, 2010: Berlin, Germany

§  September 28, 2010: Brussels, Belgium

§  April 7/8, 2011: Prague, Czech Republic

q  Vision Group Media and Information Services

§  September 10, 2010: Paris, France

§  October 15, 2010: Barcelona, Spain

§  April 1, 2011: Vienna, Austria

q  Vision Group Interactive Systems

§  September 10, 2010: Paris, France

§  October 5, 2010: Prague, Czech Republic

§  March 28, 2011: Rotterdam, The Netherlands


Planning Process

[Diagram, timeline 2010 / 2011 / 2012: three sets of expert meeting minutes; Vision Group reports (Translation and Localisation, Interactive Systems, Media and Information Services); Vision Paper; Priority Themes Paper; Strategic Research Agenda]


Planning Process: Documents

[Diagram, timeline 2010 / 2011 / 2012: three sets of expert meeting minutes; Vision Group reports (Translation and Localisation, Interactive Systems, Media and Information Services); Vision Paper; Priority Themes Paper; Strategic Research Agenda]

www.meta-net.eu [email protected] T: +49 30 23895 1833

The Future European Multilingual Information Society: Vision Paper for a Strategic Research Agenda

“People can’t share knowledge if they don’t speak a common language.” Davenport, Thomas H., and Laurence Prusak, Working Knowledge: How Organizations Manage What They Know, Harvard Business School, Boston, 1997, p. 98.

Join the discussion at www.meta-et.eu/forum

LT 2020: Vision and Priority Themes for Language Technology Research in Europe until the Year 2020. Towards the META-NET Strategic Research Agenda

The development of this paper has been funded by the Seventh Framework Programme and the ICT Policy Support Programme of the European Commission under contracts T4ME (Grant Agreement 249119), CESAR (Grant Agreement 271022), METANET4U (Grant Agreement 270893) and META-NORD (Grant Agreement 270899).

Do you have comments, ideas or suggestions with regard to the content of this document? Please send them to [email protected] or discuss them online: http://www.meta-net.eu/sra.

Vision Documents. Each is part of the Network of Excellence “Multilingual Europe Technology Alliance (META-NET)” (“A Network of Excellence forging the Multilingual Europe Technology Alliance”), co-funded by the 7th Framework Programme of the European Commission through the T4ME grant agreement no. 249119. Dissemination Level: Public.

q  Vision Group Translation and Localisation: Results of first two meetings. Editors: Aljoscha Burchardt, Georg Rehm. Date: 3 December 2010.

q  Vision Group Media and Information Services: Results of first two meetings. Editors: Maria Koutsombogera, Stelios Piperidis. Date: 10 November 2010.

q  Vision Group Interactive Systems: Results of first two meetings. Editors: Joseph Mariani, Bernardo Magnini. Date: 28 December 2010.


Preparation of the SRA

q  Strategic Research Agendas of other initiatives were screened.

q  Many suggestions as input from Vision Group members.

q  We discussed procedures, input and structure of the SRA in four meetings of the META Technology Council.

§  Brussels, Belgium, November 16, 2010

§  Venice, Italy, May 25, 2011

§  Berlin, Germany, September 30, 2011

§  Brussels, Belgium, June 19, 2012

q  Additional input in talks, meetings, workshops, discussions, etc.

§  Example: Three HLT Expert Meetings organised by the EC (end of 2011)

q  Almost 200 experts contributed to the SRA (54% from industry; 46% from research; 4% from national/international institutions).


Strategic Research Agenda


q  Addresses the problems we identified when preparing the white papers.

q  Three priority research themes and application/innovation scenarios.

q  Can put Europe ahead of its competitors in this technology area.

q  >190 contributors; >2 years.

q  Presented and discussed at 83 conferences and major workshops.

q  Final version ready on Dec. 1, 2012.

q  http://www.meta-net.eu/sra


SRA: Contents – Brief Glimpse


q  Set the stage and describe the European situation, the needs and the LT research and industry.

q  Discuss the state of IT, predictions and mega-trends.

q  Our technology vision for 2020.

q  Select and specify priority themes.

q  Suggest a model for speeding up innovation.

q  Outline proposals for the organisation of research and innovation.


Priority Themes: 3 + 2

q  We decided on priority themes that (a) support technology progress, (b) lead to solutions that European society needs and (c) lead to solutions from which European industry will benefit as users or as providers.

§  Translingual Cloud

§  Social Intelligence and e-Participation

§  Socially-Aware Interactive Assistants

q  Two additional themes:

§  European Service Platform for Language Technologies

§  Core Technologies for Language Analysis and Production


PT1: Translingual Cloud

q  Europe has a big need for translations of publishable quality.

q  Focus on high-quality translation.

q  New research paradigms

§  Inclusion of professional translators into the research process

§  Inclusion of technologists into research on human translation processes

q  Different technological approaches

§  Stronger emphasis on the properties of individual languages

§  A central role for semantics

q  Methods for specific genres & domains


Priority Research Theme 1: Translingual Cloud

Any device

Target groups: European citizen, language professional, organisations, companies, European institutions, software applications

Multiple target formats

Single access point

Services and Technologies:

§  Automatic translation and interpretation

§  Language checking

§  Post-editing

§  Workbenches for creative translations

§  Novel translation and authoring workflows

§  Quality assurance

§  Computer-supported human translation

§  Multilingual content production and text authoring

§  Trusted service centre (privacy, confidentiality, security of source data)

Applications:

§  Crosslingual communication, translation and search

§  Real-time subtitling, voice-over generation and translating speech from live events

§  Mobile interactive interpretation

§  Multilingual content production (media, web, technical, legal documents)

§  Showcases: translingual spaces for ambient translation

Written (twitter, blog, article, newspaper, text with/without metadata etc.) or spoken input (spontaneous spoken language, video/audio, multiple speakers)

Modular combination of analysis, transfer and generation models

From very fast but lower quality to slower but very high quality (including instant quality upgrades)

Exploiting strong monolingual analysis and generation methods and resources

Multiple target formats

Domain, task and genre specialisation models

Extending translation with semantic data and linked open data


PT2: Social Intelligence

q  Better decisions by monitoring social media

q  Inclusion of citizens into collective decision processes

q  Opinion formation, consensus building, decision making

q  Evolution of new solutions

q  New forms of democracy: e-democracy, massive participation, transparency

q  Dialogues and debates across language boundaries and across parties, political alliances, social classes

q  Better than binary voting

q  Documented transparent decision processes


Priority Research Theme 2: Social Intelligence and e-Participation

From shallow to deep, from coarse-grained to detailed processing techniques

Making language technologies interoperable with knowledge representation and the semantic web

“Semantification” of the web: tight integration with the Semantic Web and Linked Open Data

Mapping large, heterogeneous, unstructured volumes of online content to structured, actionable representations

Unleashing social intelligence by detecting and monitoring opinions, demands, needs and problems

Target groups: European citizen, European institutions, discussion participants, companies

Make use of the wisdom of the crowds

Improved efficiency and quality of decision processes

Understanding influence diffusion across social media

especially social media, comments, blogs, forums

decision-relevant information

support

sentiment analysis and opinion mining including the temporal dimension)

cues

from arbitrary online content

visualising discussions and opinion statements

Services and Technologies:

collective deliberation and e-participation

-wide deliberation on pressing issues

and processes; modeling evolution of opinions

analysis technologies

Applications:


Priority Research Theme 3: Socially-Aware Interactive Assistants

Interacting naturally with and in groups

Learning and forgetting information

Adaptable to the user’s needs and preferences and the environment

Include human-computer, human-artificial agent and computer-mediated human-human communication

Proactive, self-aware, user-adaptable

Interacts naturally with humans, in any language and modality

Can be personalised to individual communication abilities including special needs

Can learn incrementally from all interactions and other sources of information

recognition

and synthesis, providing expressive voices

understanding

incremental conversational speech

models of human communication

inter-dependencies

priority themes

Services and Technologies:

Applications:

dialogue systems

environment

modalities (visual, tactile, haptic) verbal/non-verbal behaviour, social context

ments, any

vocabulary

recovery, self-assessment

Multilingual capabilities

Page 64: Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020

Providers of operational and research technologies and services: Research Centres, European Institutions, Other companies (SMEs, startups etc.), National Language Institutions, Language Technology Providers, Language Service Providers, Universities

Beneficiaries/users of the platform: European Institutions, Research Centres, Public Administrations, Enterprises, LT User Industries, Universities, European Citizens

Interfaces (web, speech, mobile etc.)

Priority Research Theme 1: Translingual Cloud

Priority Research Theme 2: Social Intelligence & e-Participation

Priority Research Theme 3: Socially Aware Interactive Assistants

European Service Platform for Language Technologies (Cloud or Sky Computing Platform)

Multilingual technologies, Text analytics, Text generation, Language checking, Sentiment analysis, Named entity recognition, Summarisation, Knowledge access and management, Information and relation extraction

Language Processing, Language Understanding, Knowledge, Emotion/Sentiment

Features: Data protection, Tools, Data Sets, Resources, Components, Metadata, Standards, Interfaces, APIs, Catalogues, Quality Assurance, Data Import/Export, Input/Output, Storage, Performance, Availability, Scalability

Page 65: Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020

Core Resources & Technologies

Languages shown: Icelandic, French, Catalan, Italian, Maltese, Greek, Bulgarian, Romanian, Serbian, Croatian, Slovene, Hungarian, Slovak, Czech, German, Danish, Lithuanian, Latvian, Estonian, Finnish, Swedish, Norwegian, Basque, Spanish, Portuguese, Galician, English, Irish, Polish, Dutch

Page 66: Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020

Conclusions and Next Steps META-NET

Page 67: Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020

Conclusions and Next Steps

• The White Paper Series clearly shows that Computational Morphology cannot and must not be considered a “solved problem”.

• Quite the contrary: several good technologies exist only for a small number of languages; many languages lack adequate support.

• The research community needs to team up to discuss synergies and to boost research and technology transfer between its languages.

• The goal should be adequate, precise, robust, scalable and freely available morphology components for all European languages.

• New challenges and opportunities: real-time processing, web-scale processing of and training on documents using big data technologies such as Hadoop, interoperability and standardisation of data formats, morphology as a service etc.

• The sophisticated applications foreseen in our META-NET SRA are critically dependent on reliable and precise basic processing components, including computational morphology!

Page 68: Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020

Conclusions and Next Steps

• Europe is extremely interested in and passionate about its languages.

• Our Strategic Research Agenda for LT research and innovation can put Europe ahead of its competitors in this technology area.

• It provides useful and attractive solutions to European society, at the same time creating huge business opportunities for European industry.

• Now is the time to move forward with a continent-wide, systematic push and to invest in strategic research. A modest investment is required.

• We are very confident that we can help build applications that break down language barriers in Europe and beyond.

• This push will generate countless opportunities.

• This year is important: H2020 and CEF can provide sufficient resources to make our visions for Europe’s citizens and economy a reality.

• META-FORUM 2013, September 19/20, Berlin, Germany.

Page 69: Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020

META-FORUM 2013 — Connecting Europe for New Horizons is an international conference on powerful language technologies for the multilingual information society, the data value chain and the information market place. The two special themes of this year's edition of the conference are Big Data Text Analytics and Multilingual Web Services for Multilingual Europe.

Highlights

Keynote lectures by Daniel Marcu (Chief Science Officer, SDL) and Wolfgang Wahlster (CEO, German Research Center for Artificial Intelligence, DFKI)

Horizon 2020 and Connecting Europe Facility (CEF): Current State of Play

Dynamic Discussions on: Technologies for the Multilingual Web; MT for Professionals; Services for Multilingual Europe; Needs of Europe's Languages; Connecting Towards New Horizons; Quality Translation and Innovation

New Stakeholders: GALA (Globalization and Localization Association); NPLD (Network to Promote Linguistic Diversity); Council of Europe Committee of Experts on the Charter of Regional and Minority Languages

Towards a European Language Technology Platform

Panel discussions

Awards Ceremony: META Prize and META Seal of Recognition

META Exhibition (industry and research exhibition – software demos and posters)

http://www.meta-forum.eu

META-FORUM 2013 will be held jointly by META-NET and the German Federal Ministry of Economics and Technology, co-organised with MultilingualWeb-LT, QTLaunchPad and LT Berlin.

Figure (timeline, 2010 to 2013, towards Horizon 2020): Vision Group Translation and Localisation; Vision Group Interactive Systems; Vision Group Media and Information Services; Strategic Research Agenda; META-NET Website; Language White Paper Series

Page 70: Computational Morphology and the META-NET Strategic Research Agenda for Multilingual Europe 2020

Thank you!

META-FORUM 2013 September 19/20, Berlin http://www.meta-forum.eu

http://www.meta-net.eu http://www.facebook.com/META.Alliance

Q/A

Acknowledgements: This work would not have been possible without the dedication and commitment of our colleagues Aljoscha Burchardt, Kathrin Eichler, Tina Klüwer, Arle Lommel, Felix Sasaki and Hans Uszkoreit (all DFKI), the 60 member organisations of the META-NET network of excellence, the ca. 70 members of the Vision Groups, the ca. 30 members of the META Technology Council, the more than 200 authors of and contributors to the META-NET Language White Paper Series and the ca. 200 representatives from industry and research who contributed to the META-NET Strategic Research Agenda.