
Thursday, February 13, 2014

WheatIS EWG 2014 action plan

Chair: Hadi Quesneville

Co-Chairs: Mario Caccamo, David Edwards, Gerard Lazo

Expert Working Group members: Michael Alaux, Ruth Bastow, Ute Baumann, Fran Clarke, Jorge Dubcovsky, David Edwards, Javier Herrero, Takeshi Itoh, Paul Kersey, David Marshall, Cesar Martinez, Dave Matthews, Klaus Mayer, Amidou N’Diaye, Christopher Rawlings, Franck Röber, Doreen Ware.

The EWG met at the WheatIS kickoff meeting on 2 and 3 December 2013, and during the PAG meeting on 13 January 2014. The organization of the EWG and the project were discussed. This report summarizes the discussion and the agreed action plan.

WP1: Central data file repository

The EWG found the title and scope of this work package (WP) confusing. We renamed it “Data submission process” to highlight its main goal, which is to provide submission workflows.

The Unité de Recherche Génomique Info (URGI) presented a demo of DSpace, a tool able to manage data submission. The demo suggested the following points to follow up:

• Explore its distributed system capabilities on top of the Integrated Rule-Oriented Data System (iRODS) file system.

• Check whether DSpace can validate the correct use of ontologies during metadata submission.

• See how links (URLs) to other repositories can be implemented.

URGI and The Genome Analysis Centre (TGAC) will start exploring these features.

We also need to use (or develop, where missing) file format validators to check that submitted files conform to the file standards and data ontologies. This activity has to be connected with the WP3 outputs.
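As an illustration of what such a validator could look like, here is a minimal sketch in Python. It assumes FASTA as the example format; this report does not name the formats to be validated, and the function name `validate_fasta` and the accepted character set are hypothetical choices made for the example, not a WheatIS specification.

```python
from typing import List

# Assumption: nucleotide records only (IUPAC codes plus gap and stop symbols).
ALLOWED = set("ACGTUNRYKMSWBDHV-*")


def validate_fasta(text: str) -> List[str]:
    """Return a list of problems found; an empty list means the text passed."""
    problems = []
    have_header = False    # at least one '>' header has been seen
    have_sequence = False  # the current record has at least one sequence line
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if have_header and not have_sequence:
                problems.append(f"line {number}: previous record has no sequence")
            if len(line) == 1:
                problems.append(f"line {number}: empty header")
            have_header, have_sequence = True, False
        else:
            if not have_header:
                problems.append(f"line {number}: sequence data before any header")
            bad = sorted(set(line.upper()) - ALLOWED)
            if bad:
                problems.append(f"line {number}: unexpected characters {''.join(bad)}")
            have_sequence = True
    if have_header and not have_sequence:
        problems.append("last record has no sequence")
    if not have_header and not problems:
        problems.append("no records found")
    return problems
```

A submission workflow could call such a function before accepting a file and return the list of problems to the submitter. Checking ontology terms in the metadata would be a separate step that depends on the WP3 recommendations.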

WP2: Distributed index search engine

Two distributed index search engines, Solr and Elasticsearch, were tested by URGI. We collectively agreed to go ahead with Solr.

We decided to share a common Solr data model. We will evaluate the TransPLANT Solr data model to see whether it fits the WheatIS data. Each WheatIS node will check whether it is compatible with the data the node wants to expose. The European Bioinformatics Institute (EBI) will send a description of this data model; Dave Edwards, CerealsDB, GrainGenes, and Rothamsted will evaluate it and give feedback.

URGI and TGAC will check how DSpace can be indexed with Solr following the WheatIS/TransPLANT Solr data model.

 


EBI and URGI will write a tutorial explaining how to install a Solr server and configure it with our data model. A virtual machine can be shared if useful.

URGI will develop a portal to query the Solr servers. The TransPLANT Solr servers will be included in the search in addition to those of the WheatIS nodes. The Munich Information Center for Protein Sequences (MIPS) will check the ranking relevance of the search results according to the data types of wheat and its relatives (see WheatIS survey). The first prototype will then be tested by users to improve results ranking and rationale. This could be organized during a training session.
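To make the portal idea concrete, the following Python sketch shows one way a portal could query several Solr servers and merge their answers. It is only an illustration under stated assumptions: the node names and URLs are hypothetical placeholders, the merge strategy is not a decision of the EWG, and only the standard Solr `/select` parameters (`q`, `rows`, `wt`, `fl`) and the standard JSON response layout are relied on.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical node endpoints; real WheatIS and TransPLANT URLs are not given here.
NODES = {
    "node_a": "http://node-a.example.org/solr/wheatis",
    "node_b": "http://node-b.example.org/solr/wheatis",
}


def select_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL that asks for JSON and includes the relevance score."""
    params = {"q": query, "rows": rows, "wt": "json", "fl": "*,score"}
    return f"{base_url}/select?{urlencode(params)}"


def merge_results(responses: Dict[str, dict], limit: int = 10) -> List[dict]:
    """Merge the 'docs' of several Solr JSON responses, highest score first."""
    merged = []
    for node, body in responses.items():
        for doc in body.get("response", {}).get("docs", []):
            # Record which node each hit came from so the portal can link back to it.
            merged.append({**doc, "node": node})
    merged.sort(key=lambda d: d.get("score", 0.0), reverse=True)
    return merged[:limit]
```

Scores computed by independent Solr indexes depend on each index's own statistics, so sorting on the raw score as done here is a simplification. This is one reason the ranking relevance of the merged results needs to be checked.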

The next “Genome informatics meeting” (21-24 September 2014, Churchill College, Cambridge, UK) appears to be a good place to discuss Solr implementation with Gramene and EBI.

WP3: Data standards, data management, and data integration

The EWG discussed work package 3 (WP3), on data interoperability, at length. This WP appears to be central to the WheatIS project. It should connect with other “data standard” groups and should provide, among other tasks, a gap analysis of wheat data needs.

The EWG agreed on the goal of providing concrete applications of the data standard recommendations to demonstrate their usefulness to the wheat science community. One way could be to provide tools that support the use of data standards.

The EWG identified the following data types as priorities according to current community needs:

1. Genetic variation: molecular markers (SNPs, SSRs, etc.), CNVs, ISBPs/RJM, GbS (or GbyS)
2. Genetic/physical maps
3. Sequence Assembly + Annotation
4. Genetic resources
5. Expression data
6. Phenotype data

The WP3 work started with an initiative from a wheat group within the Research Data Alliance (RDA) Agricultural Data Interest Group (https://www.rd-alliance.org/about.html). Its aims are to:

• Provide a common framework for describing, representing, linking and publishing wheat data with respect to open standards

• Promote and sustain wheat data sharing, reusability and operability

• Specify which (minimal) metadata is needed to describe a particular data type

• Recommend vocabularies, ontologies, and formats

• Recommend good practices for data sharing

The first deliverable of this group will be a cookbook on how to produce “wheat data” that are easily sharable, reusable and interoperable.

One issue raised by the EWG is the representation, within this RDA group, of similar initiatives such as the Plant Ontology. Consequently, the RDA proposal has been sent to the EWG to check whether important people are missing from the group, so that they can be included. Some were reached at the recent Plant and Animal Genome (PAG) meeting. Laurel Cooper from the Plant Ontology attended the WheatIS PAG meeting and agreed to become involved in the RDA initiative.

 


A survey prepared by the RDA group has been sent to the EWG for validation. The EWG suggested some changes to help the interpretation of the survey results.

The EWG recommends evaluating the cookbook on concrete cases as soon as possible. The timeline for when the data could be shared still has to be considered. People from large projects such as SeeD, Triticeae-CAP (T-CAP), BREEDWHEAT, and the Wheat Improvement Strategic Programme (WISP) have been asked to become involved to ensure the cookbook will fit their needs. We still have to identify a contact person from each of these projects who will be responsible for giving feedback to the RDA initiative. In addition, we also need to add representatives from the private sector.

WP4: Data and Information Infrastructure

We agreed on the need for an infrastructure able to move large amounts of data so that they can be processed on a large compute infrastructure. This WP has to propose potential solutions.

An iRODS “experiment” continues to make progress at URGI and TGAC. We should include iPlant (US) and Qcloud (AU) to test the solution further. Doreen Ware and Dave Edwards will be the contact persons for the iPlant and Qcloud projects, respectively.

WP5: User interfaces, outreach, training and dissemination

We need a first web portal by the summer of 2014. It must be designed with multilanguage support, although languages other than English will have to be provided by candidate countries. A first prototype will be proposed by Dave Edwards this spring. For this, WheatIS nodes have to send the services they offer (e.g. BLAST, annotation, breeding tools, etc.) and the data types they host.

We decided to register the domain name “wheatIS.org”. It should be used by the WheatIS nodes. Ruth Bastow proposed to design a logo for this initiative.

We decided to reorder the WPs and hence to provide a new project roadmap document to improve communication about the project.

WP6: Coordination and project management

The EWG members present unanimously elected, for two years, Hadi Quesneville as Chair of the WheatIS, and Mario Caccamo, Dave Edwards, and Gerard Lazo as co-chairs. Next year, the co-chairs could be reconsidered as new EWG members are added, to optimize country/continent representation.

We decided to organize a regular conference call. The first one is planned for April 14, 2014. We should use the Wheat Initiative videoconference system (probably WebEx). The next working meeting will be organized around the next International Triticeae Mapping Initiative (ITMI) meeting. The next WheatIS annual meeting will be organized at PAG in 2015. We will take half a day to review the actions of the year and plan the objectives for the following year.

 


Action plan summary

| WP | What | Who | When |
|---|---|---|---|
| WP1 | Explore DSpace distributed system capabilities on top of the iRODS file system; check whether DSpace can validate the correct use of ontologies during metadata submission; see how links (URLs) with other repositories can be implemented | URGI, TGAC | Summer 2014 |
| | Provide file format validator | URGI, TGAC | Winter 2015 |
| WP2 | Send a description of the TransPLANT Solr data model to all nodes | EBI | Winter 2014 |
| | Evaluate the TransPLANT Solr data model | Dave Edwards, CerealsDB, GrainGenes, and Rothamsted | Summer 2014 |
| | Check how DSpace can be indexed with the Solr WheatIS/TransPLANT data model | URGI, TGAC | Fall 2014 |
| | Develop a portal to query the Solr servers | URGI | Summer 2014 |
| | Write a tutorial to explain how to install a Solr server and configure it with the WheatIS/TransPLANT data model. A virtual machine can be shared if useful | URGI, EBI | Summer 2014 |
| | Check the ranking relevance of the search results, according to wheat (and relative) data types (see WheatIS survey) | MIPS | Fall 2014 |
| | The first prototype tested by users to improve results ranking and ergonomics. This could be organized during a training session | All nodes | Winter 2015 |
| WP3 | A cookbook on how to produce “wheat data” that are easily sharable, reusable and interoperable | RDA group | Fall 2014 |
| | Concrete application of the recommendation on data from large projects such as SeeD, T-CAP, BREEDWHEAT, and WISP | All nodes with their users | Winter 2015 |
| WP4 | 1st iRODS federation experiment | URGI, TGAC | Spring 2014 |
| | 2nd iRODS federation experiment | iPlant (Doreen Ware), Qcloud (Dave Edwards) | Fall 2014 |
| WP5 | WheatIS web portal | Dave Edwards | Spring 2014 |
| | Updated project roadmap document | Hadi Quesneville | Spring 2014 |
| | Logo design for WheatIS | Ruth Bastow | Summer 2014 |
| WP6 | Conference call | Hadi Quesneville | April 2014 |
| | Working meeting @ ITMI | Hadi Quesneville | June 29 – July 4, 2014 |
| | Annual meeting | Hadi Quesneville | 01/10/2015 - 01/14/2015 |