Metadata in the iPlant Collaborative Cyberinfrastructure
Birds of a Feather meeting at PAG XXII, Jan. 14, 2014
From the iPlant Data Strategy:
“The vision for iPlant CI data capabilities is to provide flexible, adaptive and scalable data infrastructure that enables users and communities to implement best practices for data management.”
How to enable best practices for data management in iPlant:
1. A way to add and edit metadata
2. Metadata templates for common file types
3. Search and browse iDS based on metadata and file content
4. Support for unstructured and structured (relational) data within the iDS
5. Interoperability with key external data sources
6. Benefits/features that are aligned with the use of popular file types
7. An iPlant Data Commons for public data
1. CI to enable users to add and edit metadata using simple and flexible interfaces, including customizable metadata components.
– a web-based user interface accessible via the DE
– upload metadata as csv file
– access to all metadata entities via iPlant APIs
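As a rough illustration of the csv upload path, the sketch below reads attribute/value rows and groups them by file path. The three-column layout (path, attribute, value) and the function name `load_metadata_csv` are assumptions made for this example; the slides do not specify the csv format iPlant accepts.

```python
import csv
import io
from collections import defaultdict


def load_metadata_csv(text):
    """Parse csv text with columns path,attribute,value into {path: {attribute: value}}.

    The column layout is an assumption for illustration, not the iPlant format.
    """
    metadata = defaultdict(dict)
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # Later rows for the same path and attribute overwrite earlier ones.
        metadata[row["path"]][row["attribute"]] = row["value"]
    return dict(metadata)


sample = """path,attribute,value
/iplant/home/user/file.txt,attribute_1,value_a
/iplant/home/user/file.txt,attribute_2,value_b
"""

parsed = load_metadata_csv(sample)
print(parsed["/iplant/home/user/file.txt"])
```

Keying the result by path keeps one upload usable for many files at once, which matches the per-file attribute/value view in the editor mockups.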
[Mockup: metadata editor for /iplant/home/user/file.txt, showing an Attribute/Value table (attribute_1 = value_a through attribute_4 = value_d) with Add, Delete, Browse Templates, OK, and Cancel controls.]
2. Project data management templates and best practices for organizing, handling and managing data for diverse use cases, including:
– groups or consortia working on large-scale genome and transcriptome sequencing projects or species range maps
– single PI laboratories focused on specific analysis such as RNA-Seq experiments, phenotype data sets
[Mockup: Browse Templates dialog opened from the metadata editor. "Select a template" lists Metagenomic Sequence (MIMS), Eukaryotic Genome Sequence (MIGS), and Genome Sequence in iDS, with an Attributes Preview pane and Insert and Cancel buttons.]
[Mockup: Browse Templates dialog with a template selected. The Attributes Preview pane lists: project, specimen identifier, collection date, geographic location nam…, geographic location longi…, geographic location latit…, genus, species, infraspecific name.]
[Mockup: metadata editor with the template "Metagenomic Sequence" applied. Required fields are marked with an asterisk: project* = jackson, specimen identifier = 54769, collection date* = 2008-01-23T19:23, sequencing method* (empty).]
[Mockup: same form, with help text displayed: "All of these are ISO8601 compliant time stamps: 2008-01-23T19:23:10+00:00…"]
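One way a date field could check its input is sketched below. It accepts only the two layouts that appear in the mockups; ISO 8601 permits many more, so the format list and the function name `parse_timestamp` are assumptions for illustration rather than the validation the DE performs.

```python
from datetime import datetime

# Two of the many layouts ISO 8601 allows; a real validator would accept more.
ACCEPTED_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",  # e.g. 2008-01-23T19:23:10+00:00
    "%Y-%m-%dT%H:%M",       # e.g. 2008-01-23T19:23
)


def parse_timestamp(text):
    """Return a datetime if text matches an accepted layout, else None."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


for candidate in ("2008-01-23T19:23:10+00:00", "2008-01-23T19:23", "23/01/2008"):
    status = "valid" if parse_timestamp(candidate) is not None else "invalid"
    print(candidate, status)
```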
[Mockup: same form, with the validation message "This field is required."]
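The required-field behaviour in the template mockups can be sketched as a small check. The field names come from the mockup; the template structure (a mapping from field name to a required flag) and the function name `validate` are hypothetical, chosen only for this example.

```python
# Hypothetical template: field name -> whether a value is required.
# Field names follow the mockup; the structure itself is an assumption.
TEMPLATE = {
    "project": True,
    "specimen identifier": False,
    "collection date": True,
    "sequencing method": True,
}


def validate(values, template=TEMPLATE):
    """Return {field: message} for every required field left empty."""
    errors = {}
    for field, required in template.items():
        if required and not values.get(field, "").strip():
            errors[field] = "This field is required."
    return errors


entry = {
    "project": "jackson",
    "specimen identifier": "54769",
    "collection date": "2008-01-23T19:23",
    "sequencing method": "",
}
print(validate(entry))
```

Returning all errors at once, rather than stopping at the first, lets a form flag every empty required field in a single pass.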
[Mockup: same form, with sequencing method* filled in (DOI#).]
3. CI to support searching and browsing based on metadata attributes and suitable file content.
– provenance/system metadata and scientific metadata
– across both private data and public data
– ontology enhanced searches
Search capabilities
• Search API: users will be able to search by
– file or folder name
– any metadata attribute or value
– date created
– date last modified
– creator
– file size
– file type
– tool that created the file
– analysis that created a file or folder
– constraints (and, or, xor)
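To make the and/or/xor constraints concrete, the sketch below combines two search predicates over an in-memory list. Everything here is hypothetical (the `Entry` record, the predicate helpers, and `combine`); it illustrates the logic only and is not the iPlant Search API.

```python
from dataclasses import dataclass, field
from operator import and_, or_, xor


@dataclass
class Entry:
    name: str
    creator: str
    metadata: dict = field(default_factory=dict)


def by_creator(who):
    return lambda e: e.creator == who


def has_metadata(attribute, value):
    return lambda e: e.metadata.get(attribute) == value


def combine(op, left, right):
    """Join two predicates with and_, or_ or xor from the operator module."""
    return lambda e: op(bool(left(e)), bool(right(e)))


def search(entries, predicate):
    return [e.name for e in entries if predicate(e)]


entries = [
    Entry("a.fastq", "user", {"project": "jackson"}),
    Entry("b.fastq", "user", {"project": "other"}),
    Entry("c.txt", "guest", {"project": "jackson"}),
]

mine = by_creator("user")
in_project = has_metadata("project", "jackson")

print(search(entries, combine(and_, mine, in_project)))  # both criteria hold
print(search(entries, combine(or_, mine, in_project)))   # at least one holds
print(search(entries, combine(xor, mine, in_project)))   # exactly one holds
```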
• Users will be able to make "smart folders", that is, folders for all the files that match a set of search criteria.
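A smart folder can be thought of as a saved query whose contents are computed on demand. The sketch below assumes a catalog held as a plain dict of path to metadata; the `SmartFolder` class and its `contents` method are hypothetical names for illustration, not part of any iPlant interface.

```python
class SmartFolder:
    """A named, saved query; contents are computed when listed, not stored."""

    def __init__(self, name, criteria):
        self.name = name
        self.criteria = criteria  # metadata attribute -> required value

    def contents(self, catalog):
        """Return sorted paths whose metadata matches every criterion."""
        return sorted(
            path
            for path, metadata in catalog.items()
            if all(metadata.get(k) == v for k, v in self.criteria.items())
        )


catalog = {
    "/iplant/home/user/file.txt": {"project": "jackson"},
    "/iplant/home/user/other.txt": {"project": "other"},
}

folder = SmartFolder("jackson files", {"project": "jackson"})
before = folder.contents(catalog)

# A newly tagged file shows up the next time the folder is listed.
catalog["/iplant/home/user/new.txt"] = {"project": "jackson"}
after = folder.contents(catalog)

print(before)
print(after)
```

Because the folder stores only the criteria, nothing has to be moved or copied when a file's metadata changes.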
4. Support for unstructured, semi-structured, and structured (relational) data within the iDS.
– Document-based and NoSQL approaches to support unstructured and semi-structured data
– Support for large matrix based data sets (e.g., in GBS, GWAS, etc.)
– A way for users to search and access data in iPlant-hosted projects that include MySQL and PostgreSQL databases
5. Interoperability with key external data sources, including, but not limited to:
– Ability to use external data in analyses run through iPlant, e.g., import from BioMart
– Access to databases like CoGe, PO, MaizeGDB
– Ability to push/publish/link data housed in iDS to canonical public repositories like NCBI, Data Dryad
– Ability to engage semantic services and semantic pipelines based on metadata and ontological reasoning systems.
6. Benefits/features that are aligned with the use of popular file types.
– provide the suitable utilities, tools, integration, and documentation on best data management practices for projects utilizing these formats
• Demo: http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/iplant_public_test