Data Provenance for Phyloinformatics: Introduction & Survey Results

Elliott Hauser, UNC Information Science

Karen Cranston, NESCent Informatics

Phylogenetics & Data Provenance: Survey Results



Overview: What is Phylogenetics?

What is Phylogenetic Data?

Source: DRAFT: Current Best Practices for Publishing Trees Electronically, 2010. Stoltzfus et al. http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/LinkingTrees2010

...many things!

What is Phylogenetic Data?

Source: http://github.com/miapa/miapa-etl/tree/master/nexmlex

<A sample NeXML file>
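
The sample file itself did not survive in this transcript. As a rough illustration only, the sketch below parses a small hand-written NeXML-style fragment with Python's standard library and lists its taxa and tree edges. The XML content, IDs, and labels are invented for illustration and are not taken from the miapa-etl example; the element names (`otus`, `otu`, `trees`, `tree`, `node`, `edge`) follow the NeXML vocabulary in simplified form, and the fragment is not validated against the NeXML schema.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified NeXML-style fragment (hypothetical content,
# not validated against the NeXML schema).
SAMPLE = """<nex:nexml xmlns:nex="http://www.nexml.org/2009" version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1">
      <node id="n0" root="true"/>
      <node id="n1" otu="t1"/>
      <node id="n2" otu="t2"/>
      <edge id="e1" source="n0" target="n1" length="1.0"/>
      <edge id="e2" source="n0" target="n2" length="1.5"/>
    </tree>
  </trees>
</nex:nexml>"""


def summarize(xml_text):
    """Return (taxa, node_to_taxon, edges) for a NeXML-style document."""
    root = ET.fromstring(xml_text)
    # Map each taxon (OTU) id to its human-readable label.
    taxa = {otu.get("id"): otu.get("label") for otu in root.iter("otu")}
    # Map each tree node to the taxon label it represents (None for internal nodes).
    node_to_taxon = {n.get("id"): taxa.get(n.get("otu")) for n in root.iter("node")}
    # Collect edges as (source node, target node, branch length).
    edges = [
        (e.get("source"), e.get("target"), float(e.get("length")))
        for e in root.iter("edge")
    ]
    return taxa, node_to_taxon, edges


taxa, node_to_taxon, edges = summarize(SAMPLE)
```

Even in this toy form, the file separates the taxa from the tree topology and branch lengths, which is part of why "phylogenetic data" covers many things.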

What is a Minimum Information Standard?

The answer to this question, for a domain:

"What is the minimum information necessary for an independent scientist to carry out an independent analysis of the data?"

Quackenbush, 2005

For Phylogenetics, this is MIAPA: Minimum Information About a Phylogenetic Analysis

What do we need to know to analyze this tree?

Overview: What is MIAPA?

Source: Leebens-Mack et al. 2006

Overview: Producers' and Consumers' attitudes

Most important metadata type

Least important metadata type

Source: Cranston MIAPA survey, 2012 (unpublished)

Half of all metadata types are critically important to two or more subfields

Source: Cranston MIAPA survey, 2012 (unpublished)

The majority of metadata types are easy to produce for all subfields

Source: Cranston MIAPA survey, 2012 (unpublished)

How to balance the needs of Producers and Consumers?

Most important metadata type

Least important metadata type

Source: Cranston MIAPA survey, 2012 (unpublished)

Metadata at work: The Open Tree of Life Project

Conflicting Data, Conflicting Needs:

● A Single, 'Best' Tree of Life

● Access to Underlying, Conflicting Trees

A new research area: Computational data provenance

...Huh?

A new research area: Computational data provenance

Computational: The result of a computation

Data provenance: Where/how it came to be

As science becomes more and more computational, we need to know more about our data!
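
One way to make the two definitions concrete is a wrapper that records, alongside a computed result, what produced it. The sketch below is a minimal illustration, assuming a hypothetical `run_with_provenance` helper and a toy `count_tips` analysis step; neither comes from the talk, from MIAPA, or from any phylogenetics tool.

```python
import hashlib
import json
from datetime import datetime, timezone


def run_with_provenance(func, data, **params):
    """Run func(data, **params) and return (result, provenance record)."""
    # Hash the serialized input so a consumer can check which data was used.
    serialized = json.dumps(data, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(serialized).hexdigest()
    result = func(data, **params)
    # Hypothetical record layout: which step ran, with what settings, on what input, and when.
    record = {
        "step": func.__name__,
        "parameters": params,
        "input_sha256": digest,
        "run_at": datetime.now(timezone.utc).isoformat(),
    }
    return result, record


def count_tips(newick, separator=","):
    """Toy analysis step: count tips in a simple Newick string."""
    return newick.count(separator) + 1


result, record = run_with_provenance(count_tips, "((A,B),C);", separator=",")
```

Here `result` is the computational data and `record` is its provenance: a consumer who receives both can tell which step, parameters, and input produced the value.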

Reprise: What is Phylogenetics?

a perfect field for computational data provenance!

Discussion

Will our survey results predict actual behavior?

What tools, if any, will preserve and encourage submission of computational data provenance?

Is computational data different from measurement data, classification data, or other types of metadata? If so, does that affect our work?

Reprise: balancing the needs of Producers and Consumers?

Most important metadata type

Least important metadata type