TEXT MINING: THE NEXT DATA FRONTIER An Infrastructural Approach @openminted_eu An EU infrastructure Project

The Breakdown: What is OpenMinTeD?


Text mining – it seems so easy:

ICT2015 conference - Lisbon, 20-22 Oct


[Figure: a four-stage pipeline (Stage 1 to Stage 4) with the labels Information Retrieval, NLP Analysis, Entity Recognition, Information Extraction, Data Mining and Knowledge Discovery]
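To make the "seems so easy" view concrete, here is a minimal toy sketch of such a pipeline. It assumes the stages run in the order information retrieval, NLP analysis, information extraction, data mining; the slide's exact assignment of labels to stages is not recoverable from the text. The corpus, the function names and the naive "word before / word after the verb" extraction rule are all invented for illustration and are not part of OpenMinTeD.

```python
import re
from collections import Counter

# Hypothetical three-document corpus, invented for illustration only.
DOCS = [
    "Aspirin reduces fever in adult patients.",
    "Ibuprofen reduces inflammation and fever.",
    "Wheat yield depends on rainfall.",
]

def retrieve(docs, keyword):
    """Information retrieval: keep documents that mention the keyword."""
    return [d for d in docs if keyword.lower() in d.lower()]

def tokenize(doc):
    """NLP analysis (toy version): split a document into word tokens."""
    return re.findall(r"[A-Za-z]+", doc)

def extract(tokens, relation="reduces"):
    """Information extraction (toy rule): take the words directly before
    and after the relation word as (subject, relation, object)."""
    if relation in tokens:
        i = tokens.index(relation)
        if 0 < i < len(tokens) - 1:
            return (tokens[i - 1], relation, tokens[i + 1])
    return None

def mine(triples):
    """Data mining: count how often each object occurs across triples."""
    return Counter(obj for _, _, obj in triples)

hits = retrieve(DOCS, "fever")
triples = [t for t in (extract(tokenize(d)) for d in hits) if t]
summary = mine(triples)
```

Even this toy version hides the questions the next slides raise: where the documents come from, whether they may be mined, and where the processing runs.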

OPENMINTED = The Open Mining Infrastructure for Text and Data

But it actually poses many challenges…


Current TDM challenges for researchers

1. Content challenges

- Barriers and obstacles due to non-availability, technical restrictions, copyright law or licensing issues

- No uniform way to search for, retrieve and access content for TDM


2. Services challenges

- How to identify the most fitting TDM service? Do I have permission to use it?

- How to combine with other TDM services I have access to? How to use them on my content?


3. Processing challenges

Where to deploy? Are my machines powerful enough?

How can I get access to powerful machines?

Where to store intermediate and final results?

How to ensure persistence of storage?


OpenMinTeD addresses these TDM challenges:

It establishes an open and sustainable TDM infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based, science-related sources.


OpenMinTeD brings together:


- ACCESSIBLE CONTENT: via standardised programmatic interfaces and access rules

- DISCOVERABLE SERVICES: well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text

- EFFICIENT PROCESSING: operate on public e-Infrastructures via standardised APIs

- TDM COMMUNITIES: different scientific communities have different challenges

- VALUE ADDED APPS: community-driven applications to illustrate the value of the infrastructure. Engage with industry.


The project

Starts: June 2015

Duration: 3 years

16 Partners:

- 6 mining research groups

- 3 content providers

- 1 data center

- 1 library association

- 2 legal experts

- 6 community related partners

- 2 SMEs


PARTNERS

Athena RIC, Univ. of Manchester (NaCTeM), Univ. of Darmstadt, INRA, EMBL-EBI, Agro-Know, LIBER, Univ. of Amsterdam, Open University UK, EPFL, CNIO, Univ. of Sheffield (GATE), GESIS, GRNET, Frontiers, Univ. of Stirling


Infrastructural approach

• OpenMinTeD does not build new services, but adopts and adapts existing services for new communities

• Focus on interoperability across text mining services and content providers

• Open & collaborative


The OpenMinTeD landscape


[Figure: the OpenMinTeD landscape, drawn as a layered architecture]

Users: researchers, curators, text-miners and new services developers

Platform services: Registry, Workflow Management, Auth2 & Policy management, Annotator, Accounting

Layer 1: Interoperability of text mining services (platforms or components)
- Mining platforms; proprietary architectures

Layer 2: Interoperability of language resources & corpora
- Language resources and corpora registry service
- Language resources
- Publisher text corpus, OpenAIRE/CORE text corpus, PMC text corpus, other types of text corpora

Layer 3: Interoperability to shared storage and computing resources
- Data centres (label: in public cloud)


Design

Interoperability framework

Bringing together mining tools, resources and content:

1. Content metadata & transfer standards

To document scientific literature, language resources, taxonomies and provenance, and to define transfer protocols for full-text retrieval


2. Service metadata & pipelining

To document and classify text mining services: how they receive input, in what form they output their results, how they combine into workflows, and what granularity to consider.
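As a rough illustration of why service metadata matters for pipelining, the sketch below describes each service by its declared input and output format and checks whether consecutive services in a workflow fit together. The `ServiceMetadata` fields, the service names and the format strings are hypothetical and do not reflect the actual OpenMinTeD metadata schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceMetadata:
    """Hypothetical metadata record for one text mining service."""
    name: str
    task: str
    input_format: str
    output_format: str

def validate_workflow(services):
    """Return a list of mismatches between consecutive services.

    An empty list means each service's output format matches the
    input format of the service that follows it.
    """
    problems = []
    for upstream, downstream in zip(services, services[1:]):
        if upstream.output_format != downstream.input_format:
            problems.append(
                f"{upstream.name} -> {downstream.name}: "
                f"{upstream.output_format} != {downstream.input_format}"
            )
    return problems

# Illustrative services; the format identifiers are made up.
pdf_reader = ServiceMetadata(
    "pdf-reader", "format conversion", "application/pdf", "text/plain")
tokenizer = ServiceMetadata(
    "tokenizer", "tokenization", "text/plain", "tokens+json")
tagger = ServiceMetadata(
    "entity-tagger", "entity recognition", "tokens+json", "annotations+json")

ok = validate_workflow([pdf_reader, tokenizer, tagger])
broken = validate_workflow([tokenizer, pdf_reader])
```

Here `ok` is empty because the three services chain cleanly, while `broken` reports one mismatch. A real registry would also need to record granularity (document, sentence, token) and annotation type systems, which this sketch leaves out.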


3. IPR and licensing

To study IPR restrictions, describe license metadata for the re-use of content and of TDM services and tools, and provide information on how to apply for academic and non-commercial mining research
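One way to picture machine-readable license metadata is as a record attached to each resource that a mining workflow can filter on before it runs. The sketch below is an assumption-laden example: the `LicenseRecord` fields, the resource names and the flag values are invented for illustration and are not legal statements about any real license or the project's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LicenseRecord:
    """Hypothetical license metadata attached to a content resource."""
    resource: str
    license_id: str
    allows_tdm: bool
    non_commercial_only: bool

def minable(records, commercial_use):
    """Return the names of resources that may be mined for the given use."""
    return [
        r.resource
        for r in records
        if r.allows_tdm and not (commercial_use and r.non_commercial_only)
    ]

# Illustrative records only; the flags are assumptions, not legal advice.
RECORDS = [
    LicenseRecord("corpus-a", "open-license", True, False),
    LicenseRecord("corpus-b", "non-commercial-license", True, True),
    LicenseRecord("corpus-c", "proprietary", False, False),
]

academic = minable(RECORDS, commercial_use=False)
commercial = minable(RECORDS, commercial_use=True)
```

With these made-up records, an academic project could mine `corpus-a` and `corpus-b`, while a commercial one would be limited to `corpus-a`.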


Working groups

1. Resource metadata: content, services, language resources

2. Text, lexica, terminologies and ontologies representation and access

3. IPR and licensing

4. Text annotation and text-mining services workflows


OpenMinTeD’s users

1. End users who will consume TM services

- Researchers, database curators, …

- Novice: use services to advance their science

- Advanced: include TM services into more complex research workflows (SMEs).


2. Content and service providers that will provide their content and/or TM services for consumption

- Publishers, libraries, scientific database centres, etc.

- TM research communities

- SMEs


Bottom-up approach

OpenMinTeD works with 4 use cases, which give their requirements and evaluate the results:

- RESEARCH ANALYTICS

- SOCIAL SCIENCES

- AGRICULTURE

- LIFE SCIENCES


What can OpenMinTeD do for you?

Are you a content provider? (datacentre, library, publisher, etc)

OpenMinTeD helps you make your content available for mining

Register your collections in the OpenMinTeD registry, make them discoverable!

Go to www.openminted.eu


Are you a TDM service provider?

OpenMinTeD helps you share and collaborate with other TDM services

Register your TDM service in the OpenMinTeD registry, make it easily discoverable!

Go to www.openminted.eu


THANK YOU!

Go to www.openminted.eu to get involved!
