OpenMinTeD: Its Uses and Benefits for the Social Sciences

Page 1: Open Mining Infrastructure for Text & Data (OpenMinTeD)

Peter Mutschke

ITOC Workshop Philadelphia, February 20, 2016

twitter.com/openminted_eu

Page 2: Goal of Text Mining

Implementation of transformational processes that …

- uncover knowledge in unstructured text
- salient content items
- hidden relationships between content items

… to assist researchers and scientific data curators in making sense of the textual data

Page 3: The phases of text mining

[Figure: text mining shown as four stages (STAGE 1 to STAGE 4). Panel labels: Information Retrieval; NLP Analysis, Entity Recognition; Information Extraction; Data Mining, Knowledge Discovery.]

Taken from ICT2015 presentation (N. Manola), @openminted_eu

OPENMINTED - The Open Mining Infrastructure for Text and Data

Page 4: Challenges

- Text Mining (TM) remains a fragmented set of tools
- TM requires particular technological and analytical skills as well as domain knowledge
- no shared knowledge of how to apply TM
- lack of a central infrastructure
- high entry costs (may rule out use of TM for small research groups): need to share infrastructure costs

Page 5: Putting it all together

OpenMinTeD: Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly related sources.

Page 6: OpenMinTeD – working on many fronts

- ACCESSIBLE CONTENT: via standardised programmatic interfaces and access rules
- DISCOVERABLE SERVICES: well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text
- EFFICIENT PROCESSING: operate on public e-Infrastructures via standardised APIs
- TDM COMMUNITIES: different scientific communities have different challenges
- VALUE ADDED APPS: community-driven applications to illustrate the value of the infrastructure; engage with industry

Taken from ICT2015 presentation (N. Manola)

Page 7: Bridging the gap between different communities

Page 8: The project

Starts: June 2015

Duration: 3 years

16 Partners:

- 6 mining research groups
- 3 content providers
- 1 data center
- 1 library association
- 2 legal experts
- 6 community related partners
- 2 SMEs

PARTNERS: Athena RIC, Univ. of Manchester (NaCTeM), Univ. of Darmstadt, INRA, EMBL-EBI, Agro-Know, LIBER, Univ. of Amsterdam, Open University UK, EPFL, CNIO, Univ. of Sheffield (GATE), GESIS, GRNET, Frontiers, Univ. of Stirling

Taken from ICT2015 presentation (N. Manola)

Page 9: OpenMinTeD users

- TM consumers, to advance their science
- Service providers, to enhance their tools
- TM researchers, to share their algorithms
- Content providers, to enrich their content

Page 10: Infrastructural approach

- OpenMinTeD does not build new services, but adopts and adapts existing services for new communities
- Focuses on interoperability across text mining services and content providers
- Creates an open and collaborative space for researchers to use the best-fitting text mining services available

Page 11: The architecture

[Figure: layered architecture diagram. Recoverable labels:]

- Users: researchers, curators, text-miners and new services developers
- Platform services: Registry, Workflow Management, Auth2 & Policy management, Annotator, Accounting
- Layer 1: Interoperability of text mining services (platforms or components): mining platforms, proprietary architectures
- Layer 2: Interoperability of language resources & corpora: language resources and corpora registry service; language resources; publisher text corpus, OpenAIRE/CORE text corpus, PMC text corpus, other (types of) text corpora
- Layer 3: Interoperability to shared storage and computing resources: data centres, in public cloud

Taken from ICT2015 presentation (N. Manola)

Page 12: Bottom-up approach

OpenMinTeD works with 4 use cases, which give their requirements and evaluate the results:

- Research Analytics
- Social Sciences
- Agriculture
- Life Sciences

Taken from ICT2015 presentation (N. Manola)

Page 13: Science driven approach

Page 14: GESIS: Infrastructure for the Social Sciences

Page 15: GESIS Research Data Cycle

[Figure: research data cycle. Phases shown: Study planning, Data collection, Data analysis, Archiving and registering, Searching.]

Page 16: Difficulties in Information Seeking

Page 17: Problems Processing Search Results

Page 18: Usefulness of TM enhanced search services

Page 19: Social Science Use case

Develop and evaluate methods for automatic detection and linking of named entities in Social Science publications in order to advance reliable and context-sensitive retrieval and linking of relevant entities.

Page 20: Enhancing Search in Text and Data

- classical named entity recognition and disambiguation of relevant entities (names, places, organizations, terms) to enhance automatic indexing
- recognition of vague variable mentions to enhance linking of data and publications
- enrich data with context information from text to enhance retrievability of data sets
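The first point (entity recognition feeding automatic indexing) can be illustrated with a toy dictionary lookup. This is only a sketch under stated assumptions, not the method used in the project: the gazetteer entries, entity types, sample documents and function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: surface form -> (entity type, canonical identifier).
# Two surface forms map to the same identifier, which stands in for
# disambiguation/normalisation in miniature.
GAZETTEER = {
    "issp": ("dataset", "ISSP"),
    "international social survey programme": ("dataset", "ISSP"),
    "czech republic": ("place", "Czech Republic"),
    "gesis": ("organization", "GESIS"),
}

def recognize_entities(text):
    """Return (start, end, type, canonical_id) for every gazetteer hit in text."""
    hits = []
    lowered = text.lower()
    for surface, (etype, canonical) in GAZETTEER.items():
        # \b keeps "issp" from matching inside a longer word.
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            hits.append((m.start(), m.end(), etype, canonical))
    return sorted(hits)

def build_index(docs):
    """Map each canonical entity to the sorted ids of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _, _, _, canonical in recognize_entities(text):
            index[canonical].add(doc_id)
    return {entity: sorted(ids) for entity, ids in index.items()}

# Hypothetical sample documents.
docs = {
    "p1": "We analyse ISSP data on religion in the Czech Republic.",
    "p2": "The International Social Survey Programme is archived at GESIS.",
}
index = build_index(docs)
print(index)
```

Both documents end up indexed under the same canonical dataset entry although they use different surface forms, which is the benefit for retrieval that the slide points at.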

Page 21: Identifying references to survey variables

[Figure: a publication is connected through a link database to survey variables.]

- Publication: Olga Nešporová, Zdeněk R. Nešpor (2009). “Religion: An Unsolved Problem for the Modern Czech Nation”
- Data set: ISSP 2008
- Link Database
- Variables: v39: Believe in life after death; v40: Believe in Heaven
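One simple way to approximate this kind of linking is to score how much of a variable label reappears in a sentence, tolerating small spelling variation. The sketch below is an assumption-laden illustration, not the approach taken in OpenMinTeD: the two variable labels come from the slide, while the function names, both 0.8 thresholds and the test sentences are hypothetical. It only handles surface variation such as "believed" for "believe", not genuine paraphrase.

```python
import re
from difflib import SequenceMatcher

# Variable labels as shown on the slide (ISSP 2008).
VARIABLES = {
    "v39": "Believe in life after death",
    "v40": "Believe in Heaven",
}

def tokens(text):
    """Lowercased alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())

def similar(a, b, min_ratio=0.8):
    """True if two tokens are near-identical, e.g. 'believe' and 'believed'."""
    return SequenceMatcher(None, a, b).ratio() >= min_ratio

def coverage(label, sentence):
    """Fraction of label tokens that have a near-match among the sentence tokens."""
    label_toks, sent_toks = tokens(label), tokens(sentence)
    matched = sum(any(similar(lt, st) for st in sent_toks) for lt in label_toks)
    return matched / len(label_toks)

def link_variables(sentence, threshold=0.8):
    """Ids of variables whose label is (almost) fully covered by the sentence."""
    return sorted(
        var for var, label in VARIABLES.items()
        if coverage(label, sentence) >= threshold
    )

# Hypothetical sentences from a publication.
print(link_variables("Few respondents believed in heaven."))
print(link_variables("Many people believe in life after death."))
```

Each accepted (sentence, variable) pair would then be a candidate entry for the link database shown on the slide.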

Page 22: Benefits from user perspective

- semantic search: understanding the contextual meaning of (search) terms
- fuzzy phrase search: search for attitudes, survey questions in texts (under vagueness)
- link retrieval: search and retrieval of links between text and data
- dataset retrieval: facilitating search for research data in data catalogues at the level of items and variables
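Link retrieval and variable-level dataset retrieval can be pictured as two lookups over a small store of publication-to-variable links. The following is a minimal in-memory sketch; the record layout and function names are hypothetical, and the entries reuse the ISSP 2008 example shown earlier in the deck.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    publication: str
    dataset: str
    variable: str
    label: str

# Hypothetical link store built from the ISSP 2008 example.
LINKS = [
    Link("Nešporová & Nešpor (2009)", "ISSP 2008", "v39", "Believe in life after death"),
    Link("Nešporová & Nešpor (2009)", "ISSP 2008", "v40", "Believe in Heaven"),
]

def publications_for(dataset, variable):
    """Link retrieval: publications that reference a given survey variable."""
    return sorted({
        link.publication for link in LINKS
        if link.dataset == dataset and link.variable == variable
    })

def search_variables(term):
    """Dataset retrieval at variable level: (dataset, variable) pairs
    whose label contains the search term."""
    term = term.lower()
    return sorted({
        (link.dataset, link.variable) for link in LINKS
        if term in link.label.lower()
    })

print(publications_for("ISSP 2008", "v39"))
print(search_variables("heaven"))
```

A real catalogue would sit behind a search engine or database rather than a Python list; the point here is only that once links exist, search can work in both directions, from data to text and from a topic to individual variables.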

Page 23: Contact us

- www.openminted.eu
- [email protected]
- twitter.com/openminted_eu
- facebook.com/openminted
- bit.do/openmintedlinkedin
- vimeo.com/openminted
- bit.do/openmintedplus