Towards Open Methods: Using Scientific Workflows in Linguistics
Richard Littauer
Introduction
Various tools, such as Kepler, Taverna, VisTrails, and many others, have been designed to allow scientific workflows to be created, executed, and shared among scientists and laboratories.
Scientific workflows are typically used to automate the processing, analysis, and management of scientific data.
They provide a way of tracing provenance and methodologies to help foster reproducible science and the publication of executable papers.
By providing front-end visualisations and adaptations of shell scripts and manual steps, they make it easier for scientists to do their work, especially when integrating grids and parallel processing or external databases.
Workflows in Linguistics
How does this relate to Linguistics?
Many workflow systems I've been looking at would work in the field of corpus linguistics if we merely had open source databases online to mine.
Most often, they provide a way of cleaning data and a way of processing repetitive tasks. This is directly applicable to linguistic work.
How does this relate to Open Linguistics?
Open Linguistics
Promote the idea and definition of open data, as specified at opendefinition.org, in linguistics and in relation to language data.
Act as a central point of reference and support for people interested in open linguistic data.
Provide guidance on legal issues surrounding linguistic data to the community.
Build an index of indexes of open linguistic data sources and tools and link existing resources.
Facilitate communication between existing groups.
Serve as a mediator between providers and users of technical infrastructure.
Assemble best-practice guidelines / use cases to create, use and distribute data.
Examples
• Example workflow
• This grabs the most recent XKCD comic off the web.
• http://www.myexperiment.org/workflows/1370.html
• Another example workflow
• This workflow retrieves relevant documents, based on a query optimized by adding a string to the original query that will rank the search output according to the most recent years.
• http://www.myexperiment.org/workflows/117.html
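To give a feel for how small such a workflow can be, here is a rough Python sketch of the first example (fetching the most recent XKCD comic). It is not the myExperiment workflow itself, whose internals are not shown here. It assumes the comic metadata is read from the site's JSON endpoint; the function names `parse_comic` and `fetch_latest` are made up for this illustration.

```python
import json
from urllib.request import urlopen

# Assumption: the latest comic's metadata is available as JSON at this address.
XKCD_LATEST = "https://xkcd.com/info.0.json"


def parse_comic(payload):
    """Pull the fields a workflow would pass on from the raw JSON text."""
    record = json.loads(payload)
    return {"num": record["num"], "title": record["title"], "img": record["img"]}


def fetch_latest(url=XKCD_LATEST):
    """Download the metadata and hand it to the parsing step (needs network)."""
    with urlopen(url) as response:
        return parse_comic(response.read().decode("utf-8"))
```

Keeping the download and the parsing as separate steps mirrors how a workflow system chains small components, and lets the parsing step be checked without going online.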
Hypothetical Example
Chinese character from a text
Dictionary Database
[ zhi1], [zi2], [zhi2], [shi2], [ci1]
Geographical data from researcher
Character - Proper dialect reading - definition
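The chain above could be sketched in Python roughly as follows. This is only an illustration under loud assumptions: the dictionary is a small in-memory stand-in for an online database, "CHAR" is a placeholder for the character, and the region labels and definition text are invented. Only the five readings come from the slide.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an online dictionary database.
# "CHAR", the region labels, and the definition are placeholders;
# the readings are the ones listed in the example above.
DICTIONARY = {
    "CHAR": {
        "definition": "placeholder definition",
        "readings": {
            "region_a": "zhi1",
            "region_b": "zi2",
            "region_c": "zhi2",
            "region_d": "shi2",
            "region_e": "ci1",
        },
    },
}


@dataclass
class Entry:
    character: str
    reading: str
    definition: str


def lookup_readings(character, dictionary):
    """Step 1: fetch every candidate reading for the character."""
    return dictionary[character]["readings"]


def select_reading(readings, region):
    """Step 2: use the researcher's geographical data to pick one reading."""
    return readings[region]


def annotate(character, region, dictionary=DICTIONARY):
    """Whole chain: character -> proper dialect reading -> definition."""
    readings = lookup_readings(character, dictionary)
    reading = select_reading(readings, region)
    return Entry(character, reading, dictionary[character]["definition"])


if __name__ == "__main__":
    print(annotate("CHAR", "region_c"))
```

In a real workflow system each function would be a separate component, and the dictionary lookup would be a query against a remote database rather than a local table.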
Use in Linguistics
• So, if we have a linked network online that is queryable
• Hypothetically, it should be possible to use current workflow systems to access and download data
• My hope is to see how feasible this is
Other use: Shims (data conversion workflows).
As seen in the LexInfo slides, there are varying definitions for parts of speech (from 5 to 181 different types). Workflows could be used to standardise these after accessing the database…
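A shim of this kind might look something like the Python sketch below. Everything in it is an assumption made for illustration: the two source tagsets, their tag names, and the coarse target set are invented, and are not taken from LexInfo or from any particular database.

```python
# Hypothetical coarse target set that every source tagset is mapped onto.
COARSE_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "OTHER"}

# Hypothetical shims: one mapping table per (made-up) source tagset.
SHIMS = {
    "tagset_a": {
        "NN": "NOUN", "NNS": "NOUN",
        "VB": "VERB", "VBD": "VERB",
        "JJ": "ADJ", "RB": "ADV",
    },
    "tagset_b": {"n": "NOUN", "v": "VERB", "adj": "ADJ", "adv": "ADV"},
}


def standardise(tokens, tagset, shims=SHIMS):
    """Convert (word, tag) pairs from a source tagset to the coarse set.

    Tags with no mapping fall back to "OTHER" instead of raising an error.
    """
    mapping = shims[tagset]
    return [(word, mapping.get(tag, "OTHER")) for word, tag in tokens]


if __name__ == "__main__":
    print(standardise([("dogs", "NNS"), ("ran", "VBD")], "tagset_a"))
```

Keeping each mapping as plain data means a shim can be shared and reviewed in the same way as the workflow that uses it.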
How does this help Open Methods?
By keeping track of workflows and workflow systems before they become popular, we can make sure that users upload and share their workflows to a single repository (like myExperiment).
This could then be used by other linguists, along with data supplements, to produce replications and to check methodology.
Also, most workflow systems are now focusing more on providing provenance solutions. This would make linguistics research more sharable, understandable and repeatable.
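As a rough idea of what provenance tracking involves, the Python sketch below logs each step of a tiny pipeline together with a fingerprint of its input and output. It is not how Kepler, Taverna, or any other system records provenance; the function names and the two processing steps are invented for the illustration.

```python
import hashlib
import json


def fingerprint(data):
    """Stable SHA-256 digest of any JSON-serialisable value."""
    encoded = json.dumps(data, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_with_provenance(data, steps):
    """Apply each step in order and log what went in and what came out."""
    log = []
    for step in steps:
        result = step(data)
        log.append({
            "step": step.__name__,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, log


# Two hypothetical processing steps, used only to exercise the logger.
def lowercase(tokens):
    return [t.lower() for t in tokens]


def drop_punctuation(tokens):
    return [t for t in tokens if t.isalnum()]


if __name__ == "__main__":
    tokens, log = run_with_provenance(["The", "cat", "."], [lowercase, drop_punctuation])
    print(tokens)
    print(json.dumps(log, indent=2))
```

Anyone replicating the study could rerun the same steps on the same data and compare fingerprints to check that nothing diverged along the way.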
Current work in this area:
Steiner, Lydia, Peter F. Stadler, and Michael Cysouw. 2011. A Pipeline for Computational Historical Linguistics. Language Dynamics and Change, pp. 89-127.
More Information
Places to look for more information:
http://notebooks.dataone.org/workflows
https://kepler-project.org/
http://www.taverna.org.uk/
http://www.myexperiment.org
http://www.mendeley.com/groups/1235381/workflows-in-linguistics/
Thank you. Questions?