WP3: Linguistics – Text
Sjef Barbiers
Partners: INT, MI, RU, RUG, UU, VU
Today's demos
• Integrating Diachronous Conceptual Lexicons through Linked Open Data – VU
• GrETEL, PaQU: Searching Treebanks – RUG/UU/Leuven
• MIMORE in Nederlab: Morphosyntactic dialect research – Meertens
Integrating Diachronous Conceptual Lexicons through Linked Open Data
Isa Maks (1), Marieke van Erp (1), Piek Vossen (1), Rinke Hoekstra, Nicoline van der Sijs
1: Fac. of Humanities, Vrije Universiteit Amsterdam (WP3)
2: Computer Science Department, Vrije Universiteit Amsterdam (WP4)
3: Meertens Instituut Amsterdam (WP2)
Integration and enrichment of several existing historical conceptual lexicons, matching the ontologies, using linked open data principles.
Enables:
• tracing changes in word meanings and concepts over time
• query expansion
• natural language processing of historical Dutch texts
Modelling the lexicons as linked open data
[Figure: data model diagram. Recoverable node labels: ontolex:LexicalEntry, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalConcept, lemon:SenseDefinition, lemon-cltl:Usage, lemon-cltl:SpatioTemporalScope, skos:Concept, skos:concept, dbo:Place, dbo:Thing, penn:Tag, lexinfo:Register, xsd:string, xsd:date. Recoverable edge labels: rdfs:label, olia:hasTag, ontolex:canonicalForm, ontolex:otherForm, ontolex:sense, ontolex:isSenseOf, ontolex:definition, ontolex:usage, ontolex:reference, lemon-cltl:scope, lemon-cltl:periodStart, lemon-cltl:periodEnd, lemon-cltl:geographicArea, skos:prefLabel, skos:related, dct:subject, lexinfo:register, "is a".]
Prefixes:
ontolex: http://www.w3.org/ns/lemon/ontolex#
lexinfo: http://www.lexinfo.net/ontology/2.0/lexinfo#
penn: http://purl.org/olia/penn.owl#
olia: http://purl.org/olia/olia.owl#
xsd: http://www.w3.org/2001/XMLSchema#
skos: http://www.w3.org/2004/02/skos#
dct: http://purl.org/dc/terms/
dbo: http://dbpedia.org/ontology/
lemon-cltl: additional modeling (in progress)
Questions the model addresses:
• Ontology or classification: Which concept? Is it a plant, an occupation, an emotion, etc.?
• Words: Which words can express these concepts? Part of speech, form variants, spelling variants?
• When: In which period are these words used?
• Where: In which part of the Netherlands or Belgium is this word used?
• Provenance: Which source provided the information?
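As a rough illustration of how such a model can answer the "when" question, one lexicon entry can be written as subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. The `ex:` identifiers, the dates, the place, and the way the `lemon-cltl:` properties are chained together are all assumptions made for illustration; they are not taken from the project's data.

```python
from datetime import date

# Hypothetical triples for a single entry. Every "ex:" identifier, date and
# place below is invented for this sketch and is not project data.
TRIPLES = [
    ("ex:heikeuter", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:heikeuter", "rdfs:label", "heikeuter"),
    ("ex:heikeuter", "ontolex:sense", "ex:heikeuter_sense1"),
    ("ex:heikeuter_sense1", "ontolex:reference", "ex:concept_small_farmer"),
    ("ex:heikeuter_sense1", "ontolex:usage", "ex:usage1"),
    ("ex:usage1", "lemon-cltl:scope", "ex:scope1"),
    ("ex:scope1", "lemon-cltl:periodStart", date(1850, 1, 1)),
    ("ex:scope1", "lemon-cltl:periodEnd", date(1950, 12, 31)),
    ("ex:scope1", "lemon-cltl:geographicArea", "ex:Limburg"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def in_use(entry, year, triples=TRIPLES):
    """True if some sense of `entry` has a usage scope that covers `year`."""
    for sense in objects(entry, "ontolex:sense", triples):
        for usage in objects(sense, "ontolex:usage", triples):
            for scope in objects(usage, "lemon-cltl:scope", triples):
                starts = objects(scope, "lemon-cltl:periodStart", triples)
                ends = objects(scope, "lemon-cltl:periodEnd", triples)
                if starts and ends and starts[0].year <= year <= ends[0].year:
                    return True
    return False
```

With these toy triples, `in_use("ex:heikeuter", 1918)` holds and `in_use("ex:heikeuter", 1700)` does not. In a real setting the same traversal would more likely be expressed as a SPARQL query over the published data.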
Resources
1600  Embodied Emotions  https://www.esciencecenter.nl/project/from-sentiment-mining-to-mining-embodied-emotions  emotions
1650  Meijers            Meijers Woordenschat (1669)  all domains
1800  HISCO              http://historyofwork.iisg.nl/  occupation
1850  Brouwers           Brouwers Thesaurus (1987)  all domains
1885  Pland              https://www.meertens.knaw.nl/pland/  plants
1950  ODWN               http://www.cltl.nl/results/demos/open-source-dutch-wordnet/  all domains
Other resources will be added in the future.
Query expansion: finding occupations in historic texts
'small farmers'
En van de schamelheid zijner plaggen had er de heikeuter nog eerst den langen weg te gaan tot de burgers van Venlo, eer hij de winst van zijn arbeid ingeruild zag tegen 't noodige voor een schraal bestaan. (Felix Rutten, 1918, Ons mooie Limburg, DBNL)
HISCO [occupation-65111-small farming]: kleinboer, kleinlandbouwer, keuterboer, …
Brouwers [concept?]: keuterboer, heikeuter, landbouwer, …
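The expansion step in this example can be sketched in a few lines of Python. The two dictionaries below hold only the words shown above. The key used for the Brouwers concept is a placeholder because the source leaves that concept open, and the function names are hypothetical.

```python
import re

# Toy lexicons: concept identifier -> words, limited to the example above.
# "concept?" is a placeholder key; the source does not name the concept.
HISCO = {"occupation-65111-small farming": ["kleinboer", "kleinlandbouwer", "keuterboer"]}
BROUWERS = {"concept?": ["keuterboer", "heikeuter", "landbouwer"]}


def expand(term, lexicons):
    """Collect every word reachable from `term` through shared concepts."""
    seen = {term}
    queue = [term]
    while queue:
        word = queue.pop()
        for lexicon in lexicons:
            for words in lexicon.values():
                if word in words:
                    for other in words:
                        if other not in seen:
                            seen.add(other)
                            queue.append(other)
    return sorted(seen)


def find_matches(text, terms):
    """Return the tokens of `text` that occur in the expanded term list."""
    wanted = set(terms)
    return [tok for tok in re.findall(r"\w+", text.lower()) if tok in wanted]


terms = expand("kleinboer", [HISCO, BROUWERS])
excerpt = "had er de heikeuter nog eerst den langen weg te gaan"
hits = find_matches(excerpt, terms)
```

Starting from the HISCO word kleinboer, the shared word keuterboer links the two lexicons, so the expanded list also contains heikeuter and the excerpt is found. The same link brings in the broader word landbouwer as well, which suggests that a real system would need to control how far the expansion follows such links.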
GrETEL, PaQU
Gertjan van Noord (RUG) – Jan Odijk (UU)
• Web applications: search in treebanks
  – Treebank = text in which each sentence has a syntactic parse
• With interfaces designed for linguists
• Enables syntactic research
• Applications are language-independent but need language-specific components
  – PaQu, GrETEL: Dutch only
  – PolyGrETEL: multiple languages
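To make the idea of searching a treebank concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML layout, the attribute names (`rel`, `pos`, `word`, `cat`) and the two sentences are assumptions invented for this example. It does not show the actual data format or query language of GrETEL or PaQu.

```python
import xml.etree.ElementTree as ET

# A toy treebank: each sentence carries a (very small) syntactic parse.
# Element names, attribute names and sentences are hypothetical.
TREEBANK = """
<treebank>
  <sentence id="s1">
    <node cat="smain">
      <node rel="su" pos="pron" word="ze"/>
      <node rel="hd" pos="verb" word="komen"/>
    </node>
  </sentence>
  <sentence id="s2">
    <node cat="smain">
      <node rel="su" pos="pron" word="hun"/>
      <node rel="hd" pos="verb" word="komen"/>
    </node>
  </sentence>
</treebank>
"""


def sentences_with_subject(xml_text, word):
    """Return ids of sentences containing a subject node with this word form."""
    root = ET.fromstring(xml_text)
    query = f".//node[@rel='su'][@word='{word}']"
    return [s.get("id") for s in root.iter("sentence") if s.find(query) is not None]
```

The point of the sketch is that the query refers to syntactic structure (a node in the subject relation) rather than to a plain string, so an object use of the same word form would not match.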
Development Plan & Status
                    GrETEL     PaQU
Base                CLARIN-NL  CLARIN-NL
Own Corpus          CLARIAH    CLARIN-NL
Metadata            CLARIAH    CLARIAH
Analysis Component  CLARIAH    CLARIN-NL+CLARIAH
More formats        CLARIAH    CLARIAH
Interface           CLARIAH    CLARIN-NL
More Corpora        CLARIAH    CLARIAH
GREEN = done, ORANGE = partial, RED = TODO
Research Done
• PhD on verb clustering in Dutch: Augustinus (2015)
• acquisition of the words zeer, heel, erg ('very'): Odijk (2015, 2016)
• normative and non-normative variants of 12 Dutch constructions: Odijk (2015), van Noord & Odijk (2016)
• agreement in copular constructions: Van Eynde et al. (2016)
Hun – Zij/Ze as subject
Per million words
              Written  Spoken
hun                 0      20
zij (plural)      343     360
ze (plural)      1481    4107
• Hun is very rare: only in the spoken corpus, only in the Netherlands, only in unprepared speech (a-i)
Hem/'m – Hij/ie as subject
• 'm is rare: only in the spoken corpus, only in Flanders, only in unprepared speech (a-h)
Per million words
          Written  Spoken
hem/'m          0     101
hij          2703    2686
ie             55    1919
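The "per million words" figures are normalised frequencies. A small sketch of that arithmetic follows, together with the share of each form among its competitors. The rates are copied from the two tables; the raw count and corpus size in the usage note are hypothetical, since the source does not give them.

```python
# Rates per million words from the two tables: (written, spoken).
RATES = {
    "hun": (0, 20), "zij": (343, 360), "ze": (1481, 4107),
    "hem/'m": (0, 101), "hij": (2703, 2686), "ie": (55, 1919),
}

WRITTEN, SPOKEN = 0, 1


def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def share(form, competitors, column):
    """Fraction of `form` among all competing subject forms in one column."""
    total = sum(RATES[f][column] for f in competitors)
    return RATES[form][column] / total


hun_spoken = share("hun", ["hun", "zij", "ze"], SPOKEN)      # 20 / 4487
m_spoken = share("hem/'m", ["hem/'m", "hij", "ie"], SPOKEN)  # 101 / 4706
```

As a usage example with made-up numbers, `per_million(180, 9_000_000)` gives 20.0, which is the kind of calculation behind a table cell. The shares make the bullet points quantitative: in the spoken data, hun accounts for well under 1% of these plural subject forms.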
References
Augustinus, L. (2015). Complement Raising and Cluster Formation in Dutch: A treebank-supported investigation. PhD, KU Leuven, Belgium.
Odijk, J. (2015). 'Linguistic Research with PaQU'. Computational Linguistics in The Netherlands Journal 5, pp. 3-14.
Odijk, J. (2016). 'A Use Case for Linguistic Research on Dutch with CLARIN', in K. De Smedt (ed.), Selected Papers from the CLARIN Annual Conference 2015, pp. 45-61.
Odijk, J. (2015). 'Zoeken naar Constructies', presentation and poster held at the DRONGO Language Festival, Utrecht, 26 September 2015.
Noord, G. van & J. Odijk (2016). 'Goed of Fout: Wat gebruikt men feitelijk?', presentation at the 'Grote Taaldag' (TIN), Utrecht, 6 February 2016.
Van Eynde, F., L. Augustinus & V. Vandeghinste (2016). 'Number agreement in copular constructions: A treebank-based investigation'. doi:10.1016/j.lingua.2016.02.001, to appear in Lingua.
Upgrade MIMORE
Marc Kemps Snijders - Sjef Barbiers (Meertens)
• Morphosyntactic research tool for three Dutch dialect databases (CLARIN)
• Integration into Nederlab: Interface - MTAS
• Integration of the SAND maps from SAND Volumes I and II
• Workspace with operations on virtual collections
• Search for subject doubling in the Syntactic Atlas of the Dutch Dialects (SAND)
• Show the data underlying the map of subject doubling, second person singular
• Save as corpus
• Search for a potentially correlating property in a different database (DIDDD): article-demonstrative sequences
• POS tag specification of the search: D(art,def) followed by D(dem,def)
• Result list with, among other things, KWIC concordance and POS tags
• Save as corpus
• Saved datasets in workspace
• Geographic distribution of the two phenomena
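The POS-tag search in this workflow, D(art,def) followed by D(dem,def), amounts to matching a sequence of tag patterns against tagged tokens and displaying each hit in context. The sketch below shows one way to do that in Python. The token representation, the example tokens and the function names are assumptions for illustration and do not reflect MIMORE's actual implementation or data.

```python
# Each token is (word, category, features). The tokens are invented
# for this sketch and are not taken from SAND or DIDDD.
TOKENS = [
    ("ik", "Pron", {"pers"}),
    ("zie", "V", {"fin"}),
    ("de", "D", {"art", "def"}),
    ("die", "D", {"dem", "def"}),
    ("man", "N", {"sg"}),
]

# D(art,def) followed by D(dem,def)
PATTERN = [("D", {"art", "def"}), ("D", {"dem", "def"})]


def find_sequences(tokens, pattern):
    """Return start indices where consecutive tokens match the tag pattern."""
    hits = []
    for i in range(len(tokens) - len(pattern) + 1):
        window = tokens[i:i + len(pattern)]
        if all(cat == pcat and pfeats <= feats
               for (_, cat, feats), (pcat, pfeats) in zip(window, pattern)):
            hits.append(i)
    return hits


def kwic(tokens, start, length, context=2):
    """Format one hit as a keyword-in-context line."""
    words = [w for w, _, _ in tokens]
    left = " ".join(words[max(0, start - context):start])
    key = " ".join(words[start:start + length])
    right = " ".join(words[start + length:start + length + context])
    return f"{left} [{key}] {right}"


hits = find_sequences(TOKENS, PATTERN)
lines = [kwic(TOKENS, i, len(PATTERN)) for i in hits]
```

A pattern's features only need to be a subset of a token's features, so a token tagged with additional features still matches D(art,def).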