Objectives
• Join together a number of advanced interoperable tools to build a platform/factory/production line that automates the stages involved in acquiring, processing and producing the Language Resources required by MT and other Language Technologies
Partners
• WP1 – Management (UPF)
• WP2 – Dissemination and Exploitation (ELDA)
• WP3 – The Platform (UPF)
• WP4 – Corpus Acquisition & Annotation (ILSP)
• WP5 – Parallel corpus & derivatives (DCU)
• WP6 – Lexical Acquisition (UCAM)
• WP7 – Integration & resource evaluation (ILC)
• WP8 – Evaluation in industrial environment (LT)
Platform
• The PANACEA platform is an interoperability space based on tools, guidelines, a Common Interface definition, and a “Travelling Object” specification
• Tools: Taverna, BioCatalogue, myExperiment, Soaplab
• Common Interface: WS interoperability
• Travelling Object: XCES and GrAF
• Documentation (video tutorials, how-tos, deliverables, etc.) at http://www.panacea-lr.eu
Tools
SOAPLAB 2 (SOAP)
- Web application for deploying command line tools as WS
- No coding needed! Metadata only
- Services deployed by ILSP at http://nlp.ilsp.gr/ws/
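The "metadata only" idea can be illustrated with a minimal sketch: a tool is described by data, and generic code turns that description plus the caller's parameters into a command line and runs it. Everything here is an assumption for illustration. The metadata keys (`executable`, `fixed_args`, `options`) and the toy uppercasing tool are hypothetical, Soaplab has its own metadata format, and the web-service layer that Soaplab adds on top is omitted.

```python
import subprocess
import sys

# Hypothetical metadata for a toy tool that uppercases its standard input.
# The keys are illustrative and do not follow Soaplab's real format.
UPPERCASER = {
    "name": "uppercaser",
    "executable": sys.executable,
    "fixed_args": ["-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    "options": {"verbose": "--verbose"},  # accepted but ignored by the toy tool
}


def build_command(metadata, params=None):
    """Combine tool metadata and caller parameters into an argv list."""
    argv = [metadata["executable"], *metadata["fixed_args"]]
    for name, value in (params or {}).items():
        flag = metadata["options"].get(name)
        if flag is None:
            raise ValueError(f"unknown parameter for {metadata['name']}: {name}")
        if value is True:
            argv.append(flag)  # boolean switch
        elif value is not False and value is not None:
            argv.extend([flag, str(value)])  # option that takes a value
    return argv


def run_tool(metadata, text, params=None):
    """Run the described tool with `text` on stdin and return its stdout."""
    result = subprocess.run(
        build_command(metadata, params),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

`run_tool(UPPERCASER, "hello")` returns `"HELLO"`; adding another tool would need only a new metadata entry, not new code.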
TAVERNA
- Open source desktop application
- Imports Soaplab and other types of WS
- Allows for combination of WS in workflows (http://www.taverna.org.uk/)
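What Taverna offers graphically, chaining services so that each one's output feeds the next, can be sketched in a few lines of Python. This is a hypothetical illustration, not Taverna's workflow language: the two toy functions stand in for remote web services, and a real workflow can branch rather than run as a straight chain.

```python
from typing import Callable, List

# A "service" here is any callable from a document string to a document
# string; in a real workflow each step would be a remote web service call.
Service = Callable[[str], str]


def split_sentences(text: str) -> str:
    """Toy sentence splitter: one sentence per line (splits after '. ')."""
    return text.replace(". ", ".\n")


def tokenize(text: str) -> str:
    """Toy tokenizer: detaches the final period of each sentence."""
    lines = []
    for line in text.splitlines():
        if line.endswith("."):
            line = line[:-1] + " ."
        lines.append(line)
    return "\n".join(lines)


def run_workflow(steps: List[Service], data: str) -> str:
    """Feed the output of each step into the next, like a linear workflow."""
    for step in steps:
        data = step(data)
    return data
```

For example, `run_workflow([split_sentences, tokenize], "Crawl the web. Tag the text.")` returns `"Crawl the web .\nTag the text ."`.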
BioCatalogue
- Web application for registering and documenting WSs (http://registry.elda.org)
- Search function
- Auto-checks web services status
- Annotations: tags, categories, etc.
Roles in the platform diagram: Soaplab (Web Services), Taverna (Workflow editor), BioCatalogue (Registry), myExperiment (Social network)
myExperiment
- Share workflows, files, data, etc.
- Share opinions and comments, create work groups, etc.
- http://myexperiment.elda.org
Interoperability
• Three levels of interoperability:
– COMMUNICATION PROTOCOLS: SOAP, REST
– DATA
– PARAMETERS
Diagram: two tool chains
– Tool A outputs ABCD, Tool B reads ABCD: all tools understand the previous format
– Tool A outputs YTQZ, Tool B reads ABCD: Tool B does not “understand” format N!
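The diagram's point can be expressed as a check on a tool chain: each tool's output format must be one that the next tool accepts; otherwise a converter is needed or the chain breaks. In this minimal sketch the `Tool` record, the format labels borrowed from the diagram and the converter table are hypothetical. A common format such as the Travelling Object keeps that table small, since each tool then needs converters only to and from one shared format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tool:
    name: str
    input_format: str
    output_format: str


# Hypothetical converter table: (source format, target format) -> converter name.
Converters = Dict[Tuple[str, str], str]


def plan_chain(tools: List[Tool], converters: Converters) -> List[str]:
    """Return the step names for running `tools` in order, inserting a
    converter wherever one tool's output format differs from the next
    tool's input format. Raises ValueError if no converter exists."""
    if not tools:
        return []
    plan = [tools[0].name]
    for prev, nxt in zip(tools, tools[1:]):
        if prev.output_format != nxt.input_format:
            key = (prev.output_format, nxt.input_format)
            if key not in converters:
                raise ValueError(f"{nxt.name} does not understand {prev.output_format}")
            plan.append(converters[key])
        plan.append(nxt.name)
    return plan
```

With `Tool("Tool A", "text", "YTQZ")` followed by `Tool("Tool B", "ABCD", "ABCD")`, `plan_chain` raises `ValueError` for an empty table and returns `["Tool A", "convert", "Tool B"]` once `("YTQZ", "ABCD"): "convert"` is registered.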
Travelling Object
• The Travelling Object (TO) is the common data and metadata format used in PANACEA to make components understand each other (syntactic interoperability)
• First TO for annotations up to tagging and lemmatization
– Based on XCES (XML files with p, s, and t elements)
– Tools: formatConverters and stylesheets
• Second TO for everything else (NER, DepParsing, etc.)
– Based on GrAF (standoff annotation)
– One file for primary data
– One file for each annotation layer
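The difference between the two Travelling Objects, inline markup versus standoff annotation, can be sketched with the standard library's `xml.etree.ElementTree`. Only the p, s and t element names come from the slide; the other element and attribute names (`pos`, `layer`, `region`, `start`, `end`) and the sample tags are simplifications chosen for illustration, not the official XCES or GrAF vocabularies used in PANACEA.

```python
import xml.etree.ElementTree as ET

# Primary data and a pre-computed tokenization as (start, end, tag) offsets.
TEXT = "Green energy grows."
TOKENS = [(0, 5, "JJ"), (6, 12, "NN"), (13, 18, "VBZ"), (18, 19, ".")]


def to_inline_xces(text, tokens):
    """Inline style (first TO): <t> tokens nested inside <s> inside <p>.
    The `pos` attribute is illustrative, not the official schema."""
    p = ET.Element("p")
    s = ET.SubElement(p, "s")
    for start, end, tag in tokens:
        t = ET.SubElement(s, "t", {"pos": tag})
        t.text = text[start:end]
    return ET.tostring(p, encoding="unicode")


def to_standoff(text, tokens):
    """Standoff style (second TO, GrAF-like): the primary text stays
    untouched and an annotation layer points into it by character offsets."""
    primary = text  # in PANACEA this would be its own file
    layer = ET.Element("layer", {"name": "pos"})  # one file per layer
    for i, (start, end, tag) in enumerate(tokens):
        ET.SubElement(
            layer,
            "region",
            {"id": f"r{i}", "start": str(start), "end": str(end), "pos": tag},
        )
    return primary, ET.tostring(layer, encoding="unicode")
```

In the standoff version, adding another layer (for instance named entities) means writing one more layer file that points at the same primary text, without touching the primary data or the existing layers.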
Common Interface
• A Common Interface (CI) defines the mandatory parameters for every type of WS:
– http://panacea-lr.eu/en/info-for-professionals/documents/
– http://registry.elda.org
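Checking a request against the Common Interface amounts to looking up the mandatory parameters for the WS type and reporting any that are missing. The WS types and parameter names in this sketch are hypothetical placeholders; the actual mandatory parameters are the ones defined in the PANACEA documents listed on this slide.

```python
from typing import Dict, Mapping, Set

# Hypothetical table: WS type -> mandatory parameter names.
MANDATORY: Dict[str, Set[str]] = {
    "tokenizer": {"input", "language"},
    "tagger": {"input", "language"},
    "aligner": {"source", "target", "source_language", "target_language"},
}


def missing_parameters(ws_type: str, params: Mapping[str, object]) -> Set[str]:
    """Return the mandatory parameters for `ws_type` that `params` lacks."""
    if ws_type not in MANDATORY:
        raise KeyError(f"no Common Interface entry for WS type {ws_type!r}")
    return MANDATORY[ws_type] - set(params)
```

A client or registry could run this check before invoking a service, so that every service of a given type can be called with the same minimal set of parameters.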
Soaplab Web Services
• 28 Corpus Acquisition and Annotation Web Services
• NLP WSs focusing on sentence splitting, tokenization, tagging, lemmatization and parsing, e.g.:
– EN, FR: Berkeley tagger and parser (DCU)
– ES: UPF tools, FreeLing
– IT: ILC’s DESR, FreeLing
– DE and EL: LT’s and ILSP’s in-house tools
• WSs for conversion to and from PANACEA’s Travelling Object (@UPF and ILC)
• WSs for alignment of parallel data (@DCU)
Corpus Acquisition WS
• Focused Bilingual Crawler (FBC)
– Documentation: http://registry.elda.org/services/127
– Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_bilingual_crawl_row
– Sample topic definition for crawling EN-FR pages in the Environment domain: http://nlp.ilsp.gr/panacea/testinput/bilingual/ENV_topics/ENV_EN_FR_topic.txt
– Seed URL for crawling EN-FR ENV data: http://nlp.ilsp.gr/panacea/testinput/bilingual/ENV_EN_FR_greenfacts.txt
• Focused Monolingual Crawler (FMC)
– Documentation: http://registry.elda.org/services/160
– Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_fmc_row
– Topic definition for crawling EN ENV data: http://nlp.ilsp.gr/panacea/testinput/monolingual/ENV_topics/ENV_EN_topic.txt
– List of seed URLs for crawling EN ENV data: http://nlp.ilsp.gr/panacea/testinput/monolingual/ENV_seeds/ENV_EN_seeds.txt
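Both crawlers start from seed URLs and a topic definition. The general technique behind focused crawling can be sketched as a best-first search that keeps on-topic pages and follows links only from them. This is a generic illustration under stated assumptions, not ILSP's FBC/FMC implementation: the in-memory `WEB` graph, the topic given as a set of terms and the "share of topic words" relevance score are all hypothetical. The real topic-definition file format is not reproduced, and the pairing of pages across languages that the bilingual crawler performs is not covered.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB: Dict[str, Tuple[str, List[str]]] = {
    "seed": ("environment news and climate policy", ["a", "b"]),
    "a": ("climate change and pollution", ["c"]),
    "b": ("football results", ["d"]),
    "c": ("air pollution and the environment", []),
    "d": ("environment", []),
}


def relevance(text: str, topic: Set[str]) -> float:
    """Share of the page's words that are topic terms."""
    words = text.lower().split()
    return sum(w in topic for w in words) / len(words) if words else 0.0


def focused_crawl(web, seeds: List[str], topic: Set[str],
                  threshold: float = 0.2, limit: int = 10) -> List[str]:
    """Best-first crawl: always expand the most promising URL next, and keep
    (and follow links from) pages whose relevance reaches `threshold`."""
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    seen = set(seeds)
    kept = []
    while frontier and len(kept) < limit:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # unreachable page
        text, links = web[url]
        score = relevance(text, topic)
        if score < threshold:
            continue  # off-topic page: neither kept nor expanded
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # inherit parent score
    return kept
```

With the topic `{"environment", "climate", "pollution"}` and seed `"seed"`, the crawl keeps `["seed", "a", "c"]`. Page `d` is on topic but is linked only from the off-topic page `b`, which is never expanded: the usual trade-off of focused crawling, fewer irrelevant downloads in exchange for missing some relevant pages.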