CLTL Software and Web Services Rubén Izquierdo Beviá

CLTL: Description of web services and software. Nijmegen 2013

Rubén Izquierdo Beviá: About me

5-year degree in Computer Science (University of Alicante, Alicante, Spain)
National NLP projects and 1 European project (QALLME) (University of Alicante, Alicante, Spain)
Thesis on NLP & Word Sense Disambiguation (University of Alicante, Alicante, Spain. Sept 2010)
Postdoc position at the DutchSemCor project (Tilburg University, Tilburg. Sept 2011 - Sept 2012)
Postdoc position at the OpeNER project (Vrije Universiteit, Amsterdam. Sept 2012 -)

CLTL software

In general, a common input/output format
  KAF
  NAF, as an extension of KAF
Single components performing single tasks
Integration of existing modules
  Adaptation of input/output formats
Development of new ones

KAF: Kyoto Annotation Format

Stand-off, layered, XML-based representation format
Different types of information are stored in different layers
Layers are linked by means of references
Suitable for creating pipelines based on this format
Layers:
  Text: tokens
  Term: lemmas, part-of-speech, term sentiment, word senses
  Entities, chunks, opinions…
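
The layering described above can be made concrete with a small sketch. The document below is a simplified, hand-written KAF-like fragment, not the output of a real tool, and the part-of-speech values are illustrative. The function name `terms_with_tokens` is hypothetical. It shows how a term in the term layer points back to a token in the text layer through a reference.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written KAF-like document (illustrative only).
KAF_SAMPLE = """<KAF xml:lang="en">
  <text>
    <wf wid="w1" sent="1">Hotels</wf>
    <wf wid="w2" sent="1">were</wf>
    <wf wid="w3" sent="1">clean</wf>
  </text>
  <terms>
    <term tid="t1" lemma="hotel" pos="N"><span><target id="w1"/></span></term>
    <term tid="t2" lemma="be" pos="V"><span><target id="w2"/></span></term>
    <term tid="t3" lemma="clean" pos="G"><span><target id="w3"/></span></term>
  </terms>
</KAF>"""


def terms_with_tokens(kaf_xml):
    """Return (term id, lemma, pos, surface text) by following term -> token references."""
    root = ET.fromstring(kaf_xml)
    # Index the text layer by token id so that other layers can point into it.
    tokens = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    rows = []
    for term in root.iter("term"):
        target_ids = [target.get("id") for target in term.iter("target")]
        surface = " ".join(tokens[token_id] for token_id in target_ids)
        rows.append((term.get("tid"), term.get("lemma"), term.get("pos"), surface))
    return rows


for row in terms_with_tokens(KAF_SAMPLE):
    print(row)
```

Because the term layer only stores references, a later module can add a new layer (entities, opinions) without touching the token text.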

NAF: NewsReader Annotation Format

Extension of KAF
Allows cross-document processing
  Event coreference
  IDs are converted into valid URIs
Stores the same type of information provided by different tools
  Result of two different POS taggers
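
To illustrate why URIs matter for cross-document processing, here is a minimal sketch. The scheme used (document URI plus `#` plus the local id) is an assumption made for this example and is not taken from the NAF specification. The document URIs and the helper `to_uri` are hypothetical as well.

```python
def to_uri(document_uri, local_id):
    """Build a globally unique identifier from a document URI and a local id (assumed scheme)."""
    return f"{document_uri}#{local_id}"


# Hypothetical document URIs; "e1" is a local event id that appears in both documents.
doc_a = "http://example.org/news/doc-a"
doc_b = "http://example.org/news/doc-b"

event_a = to_uri(doc_a, "e1")
event_b = to_uri(doc_b, "e1")

# The local ids clash but the URIs do not, so a cross-document
# coreference set can mention both events unambiguously.
coreference_set = {event_a, event_b}
print(sorted(coreference_set))
```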

How the software is provided I

All modules are publicly available on GitHub
  CLTL GitHub: http://github.com/cltl
  NewsReader GitHub: http://github.com/newsreader
  OpeNER GitHub: http://github.com/opener-project/

How the software is provided II

Some are available as web services
  Exposed as REST web services
  Accept an input stream (KAF/NAF)
  Generate an output stream (KAF/NAF)
Easy to call from the command line with cURL
Easy to create module pipelines in the same way you create a pipeline of Linux commands
http://wordpress.let.vupr.nl/web-services/
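
The pipeline idea can also be sketched in Python. This is a sketch under assumptions, not the services' documented interface: it assumes each service accepts the KAF/NAF document as the raw body of a POST request and returns the enriched document, and the content type is a guess. `make_stage` and `run_pipeline` are hypothetical names. The demonstration at the end uses stand-in functions, so it runs without network access.

```python
import urllib.request
from functools import reduce


def make_stage(url):
    """Wrap a REST endpoint as a function from KAF/NAF bytes to KAF/NAF bytes."""
    def stage(document):
        request = urllib.request.Request(
            url,
            data=document,  # sending a body makes this a POST request
            headers={"Content-Type": "application/xml"},  # assumed content type
        )
        with urllib.request.urlopen(request) as response:
            return response.read()
    return stage


def run_pipeline(stages, document):
    """Feed the output of each stage into the next, like `a | b | c` in a shell."""
    return reduce(lambda doc, stage: stage(doc), stages, document)


# Offline demonstration with stand-in stages (no real service is called).
def fake_tokenizer(doc):
    return doc + b"<!-- tokenized -->"


def fake_tagger(doc):
    return doc + b"<!-- pos-tagged -->"


print(run_pipeline([fake_tokenizer, fake_tagger], b"<KAF/>"))
```

With real endpoints, the stand-ins would be replaced by `make_stage(url)` calls, one per service, in the order the modules should run.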

Our software I: General modules (integrated)

Tokenizers: whitespace-based, OpenNLP-trained...
Sentence splitters: rule-based, OpenNLP
POS taggers: TreeTagger, OpenNLP POS taggers
Chunker: trained on Alpino data with OpenNLP
Parsers: Alpino (nl), Stanford (en)
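
As an illustration of the simplest module type in the list above, a whitespace-based tokenizer that emits a KAF-like text layer might look as follows. This is a minimal sketch, not the CLTL implementation: the function name `whitespace_tokenize_to_kaf` is hypothetical and the output carries only token ids, without sentence numbers or offsets.

```python
import xml.etree.ElementTree as ET


def whitespace_tokenize_to_kaf(text, lang="en"):
    """Split on whitespace and return a KAF-like XML string with a text layer only."""
    root = ET.Element("KAF", {"xml:lang": lang})
    text_layer = ET.SubElement(root, "text")
    for number, token in enumerate(text.split(), start=1):
        wf = ET.SubElement(text_layer, "wf", {"wid": f"w{number}"})
        wf.text = token
    return ET.tostring(root, encoding="unicode")


print(whitespace_tokenize_to_kaf("The rooms were clean"))
```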

Our software II: General modules (developed by us)

WordNet tools
  Functions to use a WordNet in LMF format
Word Sense Disambiguation systems
  UKB: unsupervised
  SVM: supervised (for nl, derived from DutchSemCor)
Multiword tagger
  Tags multiword sequences of terms according to the WordNet
OntoTagger
  Inserts (semantic) labels into the KAF representation on the basis of lemma or WordNet synset representations of text

Our software III: General modules (developed by us)

Named Entity Recognizer
  Detects dates and locations using specific resources + GeoNames
KyBot
  Extracts tuples and relations from a set of profiles formulated using semantic and structural properties

Our software IV: OpeNER related (developed by us)

Hotel property tagger
  Detects aspects related to cleanliness, staff, breakfast, rooms…
Term polarity tagger
  Positive/negative terms, intensifiers, negators…
Opinion miner
  Detects opinions: target + holder + expression
  2 rule-based versions // 1 machine learning version
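
The kind of labels a term polarity tagger assigns can be sketched with a toy lexicon lookup. Everything here is hypothetical: the `LEXICON` entries, the label names and the function `tag_polarity` are invented for illustration and do not describe how the OpeNER tagger works internally.

```python
# Hypothetical toy lexicon; a real lexicon would be much larger.
LEXICON = {
    "clean": "positive",
    "friendly": "positive",
    "dirty": "negative",
    "noisy": "negative",
    "very": "intensifier",
    "not": "negator",
}


def tag_polarity(lemmas):
    """Pair each lemma with its lexicon label, or None when it is not listed."""
    return [(lemma, LEXICON.get(lemma.lower())) for lemma in lemmas]


print(tag_polarity(["the", "room", "be", "not", "very", "clean"]))
```

A later module such as an opinion miner could then combine these labels, for example by letting a negator flip the polarity of the term that follows it.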

Our software V: NewsReader related (developed by us)

Discourse module
  Splits incoming texts into headers and paragraphs
Factuality classifier
  Classifies whether a statement is factual/probable/possible or not
Event coreference
  Compares descriptions of events within and across documents to decide if they refer to the same events
