Towards Open Methods: Using Scientific Workflows in Linguistics

Richard Littauer


Introduction

Various tools, such as Kepler, Taverna, VisTrails, and many others, have been designed to allow scientific workflows to be created, executed, and shared among scientists and laboratories.

Scientific workflows are typically used to automate the processing, analysis, and management of scientific data.

They provide a way of tracing provenance and methodologies, which helps foster reproducible science and the publication of executable papers.

By providing front-end visualisations and adaptations of shell scripts and manual steps, they make it easier for scientists to do their work, especially when integrating grids, parallel processing, or external databases.
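As a rough illustration of automating steps while tracing provenance, here is a minimal sketch in plain Python. It is not how Kepler, Taverna, or VisTrails work internally; the step functions, the `run_workflow` helper, and the sample tokens are all invented for this example.

```python
import hashlib
import json
from collections import Counter


def fingerprint(value):
    """Return a short, stable hash of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def run_workflow(steps, data):
    """Run (name, function) steps in order, logging provenance for each."""
    provenance = []
    for name, step in steps:
        result = step(data)
        provenance.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, provenance


# Hypothetical corpus-cleaning steps, invented for this sketch.
def lowercase(tokens):
    return [t.lower() for t in tokens]


def drop_empty(tokens):
    return [t for t in tokens if t.strip()]


def count_tokens(tokens):
    return dict(Counter(tokens))


STEPS = [
    ("lowercase", lowercase),
    ("drop_empty", drop_empty),
    ("count_tokens", count_tokens),
]

if __name__ == "__main__":
    counts, log = run_workflow(STEPS, ["The", "", "the", "Cat"])
    print(counts)
    for record in log:
        print(record)
```

Because each record stores a hash of its input and output, another researcher re-running the same steps on the same data can check that every intermediate result matches.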

Workflows in Linguistics

How does this relate to Linguistics?

Many of the workflow systems I have been looking at would work in the field of corpus linguistics if we merely had open-source databases online to mine.

Most often, they provide a way of cleaning data and a way of processing repetitive tasks. This is directly applicable to linguistic work.

How does this relate to Open Linguistics?

Open Linguistics

• Promote the idea and definition of open data, as specified at opendefinition.org, in linguistics and in relation to language data.

• Act as a central point of reference and support for people interested in open linguistic data.

• Provide guidance on legal issues surrounding linguistic data to the community.

• Build an index of indexes of open linguistic data sources and tools, and link existing resources.

• Facilitate communication between existing groups.

• Serve as a mediator between providers and users of technical infrastructure.

• Assemble best-practice guidelines and use cases to create, use, and distribute data.



Examples

• Example workflow

• This grabs the most recent XKCD comic off the web.

• http://www.myexperiment.org/workflows/1370.html


• Another example workflow

• This workflow retrieves relevant documents using an optimized query: a string is appended to the original query so that the search output is ranked by most recent year.

• http://www.myexperiment.org/workflows/117.html


Hypothetical Example

• Input: a Chinese character from a text

• Dictionary database: returns the candidate readings [zhi1], [zi2], [zhi2], [shi2], [ci1]

• Geographical data from the researcher

• Output: character, proper dialect reading, definition
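The flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the slide does not say which character is meant, so `"<char>"` is a placeholder, and the region names, the region-to-reading mapping, and the definition string are all invented. Only the five candidate readings come from the slide.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    character: str
    reading: str
    definition: str


# Placeholder dictionary database: the readings are from the slide,
# the key "<char>" stands in for the character, which is not given.
DICTIONARY = {"<char>": ["zhi1", "zi2", "zhi2", "shi2", "ci1"]}

# Invented geographical data: which reading a region prefers.
READING_BY_REGION = {"<char>": {"region_a": "zhi1", "region_b": "shi2"}}

# Invented placeholder definition.
DEFINITIONS = {"<char>": "<definition>"}


def resolve(character: str, region: str) -> Optional[Entry]:
    """Pick the dialect reading for a character using the researcher's region data."""
    candidates = DICTIONARY.get(character, [])
    preferred = READING_BY_REGION.get(character, {}).get(region)
    if preferred not in candidates:
        # Unknown character, unknown region, or a reading the dictionary lacks.
        return None
    return Entry(character, preferred, DEFINITIONS.get(character, ""))


if __name__ == "__main__":
    print(resolve("<char>", "region_a"))
    print(resolve("<char>", "unknown_region"))
```

In a real workflow system, each of the three lookups would be a separate step backed by an online resource rather than an in-memory table.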

Use in Linguistics

• So, if we have a linked network online that is queryable,

• then, hypothetically, it should be possible to use current workflow systems to access and download the data.

• My hope is to find out how feasible this is.

Other use: shims, that is, data conversion workflows.

As seen in the LexInfo slides, definitions of parts of speech vary widely (from 5 to 181 different types). Workflows could be used to standardise these after accessing the database.
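A shim of this kind can be as small as a lookup table. The sketch below assumes Penn Treebank-style input tags and maps them to a coarse tag set invented for this example; the mapping is a small illustrative subset, not a complete or authoritative conversion, and `convert_tags` is a hypothetical helper.

```python
# Illustrative subset only: fine-grained tags mapped to a coarse set
# invented for this sketch.
FINE_TO_COARSE = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBP": "VERB",
    "JJ": "ADJ",
    "RB": "ADV",
    "DT": "DET",
    "IN": "ADP",
}

# Fallback label for tags the mapping does not cover.
UNKNOWN = "X"


def convert_tags(tagged_tokens, mapping=FINE_TO_COARSE, unknown=UNKNOWN):
    """Convert (word, fine_tag) pairs to (word, coarse_tag) pairs."""
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged_tokens]


if __name__ == "__main__":
    sample = [("the", "DT"), ("dogs", "NNS"), ("bark", "VBP"), ("uh", "UH")]
    print(convert_tags(sample))
```

Keeping the mapping as data rather than code means the same shim can be shared and reused with a different table for each pair of tag sets.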

How does this help Open Methods?

By keeping track of workflows and workflow systems before they become popular, we can make sure that users upload and share their workflows in a single repository (such as myExperiment).

These workflows could then be used by other linguists, along with data supplements, to produce replications and to check methodology.

Also, most workflow systems are now focusing more on providing provenance solutions. This would make linguistics research more sharable, understandable, and repeatable.

Current work on this:

Steiner, Lydia, Peter F. Stadler, and Michael Cysouw. 2011. A Pipeline for Computational Historical Linguistics. Language Dynamics and Change, pp. 89-127.

More Information

Places to look for more information:

• http://notebooks.dataone.org/workflows

• https://kepler-project.org/

• http://www.taverna.org.uk/

• http://www.myexperiment.org

• http://www.mendeley.com/groups/1235381/workflows-in-linguistics/

Thank you. Questions?