Thursday, February 13, 2014
WheatIS EWG 2014 action plan
Chair: Hadi Quesneville
Co-Chairs: Mario Caccamo, David Edwards, Gerard Lazo
Expert Working Group members: Michael Alaux, Ruth Bastow, Ute Baumann, Fran Clarke, Jorge Dubcovsky, David Edwards, Javier Herrero, Takeshi Itoh, Paul Kersey, David Marshall, Cesar Martinez, Dave Matthews, Klaus Mayer, Amidou N’Diaye, Christopher Rawlings, Franck Röber, Doreen Ware.
The EWG met at the WheatIS kickoff meeting on 2 and 3 December 2013, and during the PAG meeting on 13 January 2014. The EWG organization and the project were discussed. This report summarizes the discussion and the action plan that was decided.
WP1: Central data file repository
The EWG found the title of this work package (WP) confusing. We renamed it “Data submission process” to highlight its main goal, which is to provide submission workflows.
The Unité de Recherche Génomique Info (URGI) presented a demo of DSpace, a tool able to manage data submission. The demo suggested that we need to:
• Explore its distributed system capabilities on top of the integrated Rule-Oriented Data System (iRODS) file system
• Check whether DSpace can validate the correct use of ontologies during metadata submission
• See how links (URLs) to other repositories can be implemented
URGI and The Genome Analysis Centre (TGAC) will start exploring these capabilities.
We also need to use (or develop if missing) file format validators to check that submitted files comply with file format standards and data ontologies. This activity has to be connected to the WP3 outputs.
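To make the idea of a validator concrete, here is a minimal sketch of what such a check could look like for a tab-delimited metadata sheet. The column names (`sample_id`, `species`, `tissue_term`) and the function name are hypothetical and only serve the illustration; the real required fields would come from the WP3 recommendations. The only external fact assumed is that Plant Ontology identifiers have the form "PO:" followed by seven digits. The check is purely syntactic: it does not verify that a term actually exists in the ontology.

```python
import csv
import io
import re

# Plant Ontology identifiers: the prefix "PO:" followed by seven digits.
PO_TERM = re.compile(r"^PO:\d{7}$")

# Hypothetical minimal metadata columns; the real list would come from WP3.
REQUIRED_COLUMNS = ("sample_id", "species", "tissue_term")


def validate_metadata(text, required=REQUIRED_COLUMNS, term_column="tissue_term"):
    """Return a list of problems found in a tab-delimited metadata sheet."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in required if name not in header]
    if problems:
        return problems  # row checks are meaningless without the expected columns
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name in required:
            if not (row.get(name) or "").strip():
                problems.append(f"line {line_no}: empty value in {name}")
        term = (row.get(term_column) or "").strip()
        if term and not PO_TERM.match(term):
            problems.append(f"line {line_no}: malformed ontology term {term!r}")
    return problems
```

A submission workflow could call `validate_metadata` on the uploaded sheet and reject the submission while the returned list is not empty.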
WP2: Distributed index search engine
Two distributed index search engines, Solr and Elasticsearch, were tested by URGI. We collectively agreed to go ahead with Solr.
We decided to share a common Solr data model. We will evaluate the TransPLANT Solr data model to see whether it fits the WheatIS data. Each WheatIS node will check whether the model is compatible with the data it wants to expose. The European Bioinformatics Institute (EBI) will send a description of this data model, and the following partners will evaluate it and give feedback: Dave Edwards, CerealsDB, GrainGenes, and Rothamsted.
URGI and TGAC will check how DSpace can be indexed with Solr following the WheatIS/TransPLANT Solr data model.
EBI and URGI will write a tutorial to explain how to install a Solr server and configure it with our data model. A virtual machine can be shared if useful.
URGI will develop a portal to query the Solr servers. The TransPLANT Solr servers will be included in the search in addition to those of the WheatIS nodes. The Munich Information Center for Protein Sequences (MIPS) will check the ranking relevance of the search results according to the data types for wheat and related species (see WheatIS survey). The first prototype will then be tested by users to improve results ranking and rationale. This could be organized during a training session.
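As an illustration of how a portal could combine answers from several nodes, the sketch below builds a query URL for Solr's standard `select` handler and merges the parsed JSON responses of several nodes into one ranked list. It is not the portal design. The host name, the core name `wheatis`, the `/solr/<core>/select` layout under the base URL, and both function names are assumptions made for the example. Fetching the URLs is left out so that the sketch stays independent of any network setup.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Build a Solr select URL asking for JSON results that include the score."""
    params = urlencode({"q": query, "rows": rows, "fl": "*,score", "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


def merge_results(responses, limit=10):
    """Merge parsed Solr JSON responses from several nodes, ranked by score.

    `responses` maps a node name to the parsed JSON returned by that node.
    """
    merged = []
    for node, payload in responses.items():
        for doc in payload.get("response", {}).get("docs", []):
            merged.append({**doc, "node": node})  # remember where each hit came from
    merged.sort(key=lambda d: d.get("score", 0.0), reverse=True)
    return merged[:limit]
```

One caveat matters for the ranking check: scores computed by independent Solr indexes are not directly comparable, because each index has its own term statistics. Sorting on the raw score, as done here, is therefore only a starting point that user testing would need to refine.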
The next “Genome Informatics” meeting (21-24 September 2014, Churchill College, Cambridge, UK) appears to be a good place to discuss Solr implementation with Gramene and EBI.
WP3: Data standards, data management, and data integration
The EWG discussed work package 3 (WP3), on data interoperability, at length. This WP appears to be central to the WheatIS project. It should connect with other “data standard” groups and should provide, among other tasks, a gap analysis of wheat data needs.
The EWG agreed on the goal of providing concrete applications of the data standard recommendations to demonstrate their usefulness to the wheat science community. One way could be to provide tools that support the use of data standards.
The EWG identified the following data types as priorities according to current community needs:
1. Genetic variation: molecular markers (SNPs, SSRs, etc.), CNVs, ISBPs/RJM, GbS (or GbyS)
2. Genetic/physical maps
3. Sequence assembly and annotation
4. Genetic resources
5. Expression data
6. Phenotype data
The WP3 work started with an initiative from a wheat group within the Research Data Alliance (RDA) Agricultural Data Interest Group (https://www.rd-alliance.org/about.html). Its aims are to:
• Provide a common framework for describing, representing, linking and publishing wheat data with respect to open standards
• Promote and sustain wheat data sharing, reusability and interoperability
• Specify which (minimal) metadata is needed to describe a particular data type
• Recommend vocabularies, ontologies, and formats
• Recommend good practices for data sharing
The first deliverable of this group will be a cookbook on how to produce “wheat data” that are easily sharable, reusable and interoperable.
One issue raised by the EWG is the representation in this RDA group of similar initiatives, such as the Plant Ontology. Consequently, the RDA proposal has been sent to the EWG to check whether important people are missing from the group so that they can be included. Some were contacted at the recent Plant and Animal Genome (PAG) meeting. Laurel Cooper from the Plant Ontology attended the WheatIS PAG meeting and agreed to become involved in the RDA initiative.
A survey prepared by the RDA group has been sent to the EWG for validation. The EWG suggested some changes to help the interpretation of the survey results.
The EWG recommends evaluating the cookbook on concrete cases as soon as possible. The timeline for when the data could be shared still has to be considered. People from large projects such as SeeD, Triticeae-CAP (T-CAP), BREEDWHEAT, and the Wheat Improvement Strategic Programme (WISP) have been asked to become involved to ensure the cookbook will fit their needs. We still have to identify a contact person from each of these projects who will be responsible for giving feedback to the RDA initiative. In addition, we also need to add representatives from the private sector.
WP4: Data and Information Infrastructure
We agreed on the need for an infrastructure able to move large amounts of data so that they can be processed on large computing facilities. This WP has to propose potential solutions.
An iRODS “experiment” continues to make progress at URGI and TGAC. We should include iPlant (US) and Qcloud (AU) to test the solution further. Doreen Ware and Dave Edwards will be the contact persons for the iPlant and Qcloud projects, respectively.
WP5: User interfaces, outreach, training and dissemination
We need a first web portal by the summer of 2014. It must be designed with multilanguage support, although languages other than English will have to be provided by candidate countries. A first prototype will be proposed by Dave Edwards this spring. For this, WheatIS nodes have to send a list of the services they offer (e.g. BLAST, annotation, breeding tools, etc.) and the data types they host.
We decided to register the domain name “wheatIS.org”. It should be used by the WheatIS nodes. Ruth Bastow offered to design a logo for this initiative.
We decided to reorder the WPs and hence to provide a new project roadmap document to improve communication about the project.
WP6: Coordination and project management
The EWG members present unanimously elected, for two years, Hadi Quesneville as Chair of the WheatIS, and Mario Caccamo, Dave Edwards, and Gerard Lazo as co-chairs. Next year the co-chairs could be reconsidered as new EWG members are added, to optimize country/continent representation.
We decided to organize a regular conference call. The first one is planned for April 14, 2014. We should use the Wheat Initiative videoconference system (probably WebEx). The next working meeting will be organized around the next International Triticeae Mapping Initiative (ITMI) meeting. The next WheatIS annual meeting will be organized at PAG in 2015. We will take half a day to review the actions of the year and plan the objectives for the following year.
Action plan summary

| WP | What | Who | When |
|---|---|---|---|
| WP1 | Explore DSpace distributed system capabilities on top of the iRODS file system; check if DSpace can validate the correct use of ontologies during the metadata submission; see how links (URLs) with other repositories can be implemented | URGI, TGAC | Summer 2014 |
| | Provide file format validator | URGI, TGAC | Winter 2015 |
| WP2 | Send a description of the TransPLANT Solr data model to all nodes | EBI | Winter 2014 |
| | Evaluate the TransPLANT Solr data model | Dave Edwards, CerealsDB, GrainGenes, and Rothamsted | Summer 2014 |
| | Check how DSpace can be indexed with Solr following the WheatIS/TransPLANT data model | URGI, TGAC | Fall 2014 |
| | Develop a portal to query the Solr servers | URGI | Summer 2014 |
| | Write a tutorial to explain how to install a Solr server and configure it with the WheatIS/TransPLANT data model. A virtual machine can be shared if useful | URGI, EBI | Summer 2014 |
| | Check the ranking relevance of the search results, according to wheat (and related species) data types (see WheatIS survey) | MIPS | Fall 2014 |
| | Test the first prototype with users to improve results ranking and ergonomics. This could be organized during a training session | All nodes | Winter 2015 |
| WP3 | A cookbook on how to produce “wheat data” that are easily sharable, reusable and interoperable | RDA group | Fall 2014 |
| | Concrete application of the recommendations on data from large projects such as SeeD, T-CAP, BREEDWHEAT, and WISP | All nodes with their users | Winter 2015 |
| WP4 | 1st iRODS federation experiment | URGI, TGAC | Spring 2014 |
| | 2nd iRODS federation experiment | iPlant (Doreen Ware), Qcloud (Dave Edwards) | Fall 2014 |
| WP5 | WheatIS web portal | Dave Edwards | Spring 2014 |
| | Updated project roadmap document | Hadi Quesneville | Spring 2014 |
| | Logo design for WheatIS | Ruth Bastow | Summer 2014 |
| WP6 | Conference call | Hadi Quesneville | April 2014 |
| | Working meeting @ ITMI | Hadi Quesneville | June 29 – July 4, 2014 |
| | Annual meeting | Hadi Quesneville | January 10 – 14, 2015 |