Towards Open Methods: Using Scientific Workflows in Linguistics

Richard Littauer

Introduction

Various tools, such as Kepler, Taverna, VisTrails, and many others, have been designed to allow scientific workflows to be created, executed, and shared among scientists and laboratories.

Scientific workflows are typically used to automate the processing, analysis, and management of scientific data.

They provide a way of tracing provenance and methodologies, which helps foster reproducible science and the publication of executable papers.


By providing front-end visualisations and adaptations of shell scripts and manual steps, these tools make it easier for scientists to do their work, especially when integrating grids, parallel processing, or external databases.

Workflows in Linguistics

How does this relate to Linguistics?

Many of the workflow systems I've been looking at would work in the field of corpus linguistics if we merely had open-source databases online to mine.

Most often, they provide a way of cleaning data and of automating repetitive tasks. This is directly applicable to linguistic work.
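As a rough illustration of the kind of cleaning and repetition a workflow step automates, here is a minimal Python sketch. It is not taken from any particular workflow system; the function names and the cleaning rules (lowercasing, stripping punctuation, whitespace tokenisation) are assumptions chosen for the example.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def tokenise(text):
    """Split cleaned text on whitespace."""
    return text.split()


def word_frequencies(documents):
    """Run the same clean -> tokenise -> count steps over every document."""
    counts = Counter()
    for document in documents:
        counts.update(tokenise(clean(document)))
    return counts


if __name__ == "__main__":
    corpus = ["The cat sat.", "The CAT ran!"]
    print(word_frequencies(corpus).most_common())
```

Each function corresponds to one box in a workflow diagram; a workflow system would add the wiring, scheduling, and record-keeping around such steps.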


How does this relate to Open Linguistics?

Open Linguistics

• Promote the idea and definition of open data, as specified at opendefinition.org, in linguistics and in relation to language data.

• Act as a central point of reference and support for people interested in open linguistic data.

• Provide guidance on legal issues surrounding linguistic data to the community.

• Build an index of indexes of open linguistic data sources and tools and link existing resources.

• Facilitate communication between existing groups.

• Serve as a mediator between providers and users of technical infrastructure.

• Assemble best-practice guidelines / use cases to create, use, and distribute data.


Examples

• Example workflow

• This grabs the most recent XKCD comic off the web.

• http://www.myexperiment.org/workflows/1370.html

• Another example workflow

• This workflow retrieves relevant documents based on a query that is optimized by appending a string to the original query, so that the search output is ranked by the most recent years.

• http://www.myexperiment.org/workflows/117.html

Hypothetical Example

Chinese character from a text

Dictionary Database: [zhi1], [zi2], [zhi2], [shi2], [ci1]

Geographical data from researcher

Character - Proper dialect reading - definition
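The steps of this hypothetical example could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the candidate readings are the ones shown on the slide, but the character key "X", the region names, the definitions, and all function names are invented placeholders, and the in-memory dictionaries stand in for an online dictionary database and for the researcher's geographical data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stand-in for the dictionary database. Readings come from the slide;
# the key "X" and the definitions are placeholders, not real data.
DICTIONARY = {
    "X": {
        "zhi1": "placeholder definition 1",
        "zi2": "placeholder definition 2",
        "zhi2": "placeholder definition 3",
        "shi2": "placeholder definition 4",
        "ci1": "placeholder definition 5",
    }
}

# Stand-in for the researcher's geographical data (hypothetical regions):
# which reading is used for a character in a given region.
REGION_READINGS = {
    ("X", "region_a"): "zhi1",
    ("X", "region_b"): "shi2",
}


@dataclass
class Entry:
    character: str
    reading: str
    definition: str


def resolve(character: str, region: str) -> Optional[Entry]:
    """Pick the dialect reading for a character, or None if unknown."""
    candidates = DICTIONARY.get(character, {})
    reading = REGION_READINGS.get((character, region))
    if reading not in candidates:
        return None
    return Entry(character, reading, candidates[reading])


def annotate(text: str, region: str) -> List[Entry]:
    """Run every character of a text through the lookup, skipping unknowns."""
    return [e for ch in text if (e := resolve(ch, region)) is not None]


if __name__ == "__main__":
    for entry in annotate("XY", "region_b"):
        print(entry.character, "-", entry.reading, "-", entry.definition)
```

In a workflow system, the two dictionaries would be replaced by calls to external services, and each function would become one node in the workflow.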

Use in Linguistics

• So, if we have a linked network online that is queryable,

• then, hypothetically, it should be possible to use current workflow systems to access and download data.

• My hope is to see how feasible this is.
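To make the idea of a queryable linked network a little more concrete, here is a minimal Python sketch of the two ends of such a step: building a SPARQL query URL and reading the standard SPARQL JSON results format. The endpoint and the vocabulary URIs under example.org are hypothetical placeholders, not real services, and no network request is made; a workflow system would perform the actual HTTP call between these two functions.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary; replace with a real linked-data service.
ENDPOINT = "https://example.org/sparql"
VOCAB = "https://example.org/vocab#"


def build_query_url(endpoint, written_form):
    """Build a GET URL asking for entries with the given written form."""
    # Escape backslashes and quotes so the literal stays well-formed.
    literal = written_form.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "SELECT ?entry ?pos WHERE { "
        f'?entry <{VOCAB}writtenForm> "{literal}" ; '
        f"<{VOCAB}partOfSpeech> ?pos . }}"
    )
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, "cat"))
    sample = json.dumps({
        "head": {"vars": ["entry", "pos"]},
        "results": {"bindings": [
            {"entry": {"type": "uri", "value": "https://example.org/entry/1"},
             "pos": {"type": "literal", "value": "noun"}},
        ]},
    })
    print(parse_bindings(sample))
```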

Other use: shims, that is, data conversion workflows. As seen in the LexInfo slides, there are varying definitions for parts of speech (from 5 to 181 different types). Workflows could be used to standardise these after accessing the database…
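A shim of this kind can be very small. The Python sketch below maps a handful of Penn Treebank tags onto coarse categories; the choice of tagsets, the partial mapping table, and the fallback tag "X" are assumptions made for illustration and are not taken from the LexInfo slides.

```python
# Partial, illustrative mapping from fine-grained tags to coarse categories.
PENN_TO_COARSE = {
    "NN": "NOUN",
    "NNS": "NOUN",
    "VB": "VERB",
    "VBD": "VERB",
    "JJ": "ADJ",
    "RB": "ADV",
}


def standardise(tagged_tokens, mapping, unknown="X"):
    """Rewrite each (token, tag) pair using the mapping; unmapped tags become `unknown`."""
    return [(token, mapping.get(tag, unknown)) for token, tag in tagged_tokens]


if __name__ == "__main__":
    sentence = [("dogs", "NNS"), ("barked", "VBD"), ("loudly", "RB"), ("!", ".")]
    print(standardise(sentence, PENN_TO_COARSE))
```

Keeping the mapping as data rather than code means that a different source tagset only needs a different table, not a different workflow.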

How does this help Open Methods?

By keeping track of workflows and workflow systems before they become popular, we can make sure that users upload and share their workflows in a single repository (like myExperiment).

Other linguists could then use these workflows, along with data supplements, to produce replications and to check methodology.

Also, most workflow systems are now focusing more on providing provenance solutions. This would make linguistics research more shareable, understandable, and repeatable.
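What a provenance record might contain can be shown with a small Python sketch. This is not how any specific workflow system records provenance; the record fields, the function names, and the use of SHA-256 fingerprints over JSON-serialised data are assumptions for the example.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(value):
    """Return a stable hash of any JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_step(name, func, data, log):
    """Run one workflow step and append a provenance record to the log."""
    result = func(data)
    log.append({
        "step": name,
        "input": fingerprint(data),
        "output": fingerprint(result),
        "finished_at": datetime.now(timezone.utc).isoformat(),
    })
    return result


if __name__ == "__main__":
    log = []
    tokens = run_step("tokenise", str.split, "The cat sat", log)
    lowered = run_step("lowercase", lambda ts: [t.lower() for t in ts], tokens, log)
    print(json.dumps(log, indent=2))
```

Because the output fingerprint of one step equals the input fingerprint of the next, a reader of the log can check that the recorded steps really form one chain.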

Current work on this:

Steiner, Lydia, Peter F. Stadler, and Michael Cysouw. 2011. A Pipeline for Computational Historical Linguistics. Language Dynamics and Change, p. 89-127.

More Information

Places to look for more information:

• http://notebooks.dataone.org/workflows

• https://kepler-project.org/

• http://www.taverna.org.uk/

• http://www.myexperiment.org

• http://www.mendeley.com/groups/1235381/workflows-in-linguistics/

Thank you. Questions?