GSC-BRC Metadata Standards. Richard H. Scheuermann, U.T. Southwestern Medical Center.

Published on 16-Dec-2015

Transcript

<ul><li> Slide 1 </li> <li> GSC-BRC Metadata Standards. Richard H. Scheuermann, U.T. Southwestern Medical Center </li>
<li> Slide 2 </li> <li> Metadata Inconsistencies <ul><li> Each project was providing different types of metadata </li> <li> No consistent nomenclature being used </li> <li> Impossible to perform reliable comparative genomics analysis </li></ul> </li>
<li> Slide 3 </li> <li> Dengue Clinical Metadata </li>
<li> Slide 4 </li> <li> Virus Isolate Information </li>
<li> Slide 5 </li> <li> Complex Query Interface </li>
<li> Slide 6 </li> <li> Additional Clinical Characteristics </li>
<li> Slide 7 </li> <li> GSC-BRC Metadata Standards Working Group <ul><li> NIAID assembled a group of representatives from their three Genome Sequencing Centers for Infectious Diseases (Broad, JCVI, UMD) and five Bioinformatics Resource Centers (EuPathDB, IRD, PATRIC, VectorBase, ViPR) programs </li> <li> Develop metadata standards for pathogen isolate sequencing projects </li></ul> </li>
<li> Slide 8 </li> <li> Metadata Standards Process <ul><li> Divide into pathogen subgroups: viruses, bacteria, eukaryotic pathogens and vectors </li> <li> Collect example metadata sets from sequencing project white papers and other project sources (e.g. CEIRS) </li> <li> Identify data fields that appear to be common across projects within a pathogen subgroup (core) and data fields that appear to be project specific </li> <li> For each data field, provide definitions, synonyms, allowed value sets (preferably using controlled vocabularies), expected syntax, examples, data categories and data providers </li> <li> Merge subgroup core elements into a common set of core metadata fields and attributes </li> <li> Assemble metadata fields into a semantic network </li> <li> Harmonize semantic network with the Ontology for Biomedical Investigations (OBI) </li> <li> Compare, harmonize, map to other relevant initiatives, including MIGS, MIMS, BioProjects, BioSamples </li> <li> Develop data submission spreadsheets to be used for all white paper and BRC-associated projects </li></ul> </li>
<li> Slide 9 </li> <li> GSC-BRC Metadata Working Groups </li>
<li> Slide 10 </li> <li> Example Metadata </li>
<li> Slide 11 </li> <li> Virus Core Metadata Sheet </li>
<li> Slide 12 </li> <li> Metadata Merge </li>
<li> Slide 13 </li> <li> Network Overview [diagram]. Nodes: data transformations (image processing, assembly); sequencing assay; specimen source (organism or environmental); specimen collector; input sample; reagents; technician; equipment; type; ID; qualities; temporal-spatial region; data transformations (variant detection, serotype marker detection, gene detection); primary data; sequence data; genotype/serotype/gene data; specimen; microorganism; enriched NA sample; microorganism genomic NA; specimen isolation process; isolation protocol; sample processing; data archiving process; sequence data record; GenBank ID. Relations: has_input, has_output, has_specification, has_part, is_about, denotes, located_in, has_quality, instance_of. Legend: independent continuant; dependent continuant; occurrent; temporal-spatial region; italics = relations. </li>
<li> Slide 14 </li> <li> [Diagram: the network from Slide 13, with the same nodes and relations, divided into labeled regions.] Regions: Specimen Isolation; Material Processing; Data Processing; Sequencing Assay; Investigation. </li>
<li> Slide 15 </li> <li> Metadata Categories <ul><li> Investigation </li> <li> Host/Source Characterization </li> <li> Specimen Isolation </li> <li> Pathogen Detection </li> <li> Pathogen Isolation </li> <li> Pathogen Characterization </li> <li> Specimen Processing </li> <li> Sample Shipment </li> <li> Sequencing Sample Preparation </li> <li> Sequencing Assay </li> <li> Data Transformation </li></ul> </li>
<li> Slide 16 </li> <li> Host/Source Characterization [diagram]. Nodes: organism; environmental material; specimen source role; species/strain; organism ID; age, gender, symptom; specimen isolation procedure X; common name; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location. Relations: has_input, plays, denotes, has_quality, instance_of, has_part, located_in. Sheet row references: v10, v11, v12, v13; b14, b15, b16, b17, b19, b20. Legend: vX = row X in virus sheet; independent continuant; dependent continuant; occurrent; temporal-spatial region; italics = relations. </li>
<li> Slide 17 </li> <li> Specimen Isolation [diagram]. Nodes: organism; environmental material; equipment; person; specimen source role; specimen capture role; specimen collector role; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; specimen X; specimen isolation procedure X; isolation protocol; name; affiliation; ID; specimen type; specimen isolation procedure type; Comments ????; organism part; hypothesis; IRB/IACUC approval; environment. Relations: has_input, has_output, plays, has_specification, has_part, denotes, located_in, has_affiliation, instance_of, is_about, has_authorization, has_quality. Sheet row references: v2, v3-4, v5-6, v7, v8, v9, v15, v16, v17, v18, v19; b18, b22, b23, b24, b25, b26, b27, b28, b29, b30. </li>
<li> Slide 18 </li> <li> Pathogen Detection [diagram]. Nodes: temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; specimen X; microorganism X; species/strain; ID; pathogen detection process X; data about pathogen presence; specimen type; amount; pathogen detection method; pathogen detection protocol. Relations: has_part, located_in, instance_of, has_input, has_specification, denotes, has_quality, has_output, is_about. Sheet row references: v15, v16, v27, v28; b21. </li>
<li> Slide 19 </li> <li> Pathogen Isolation [diagram]. Nodes: specimen X; microorganism X; species/strain; ID; specimen type; amount; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; pathogen isolation process X; pathogen isolation method; pathogen isolation protocol; pathogen isolate X; pathogen type. Relations: has_part, instance_of, denotes, has_quality, located_in, has_input, has_specification, has_output. Sheet row references: v15, v16, v26, v34. </li>
<li> Slide 20 </li> <li> Pathogen Characterization [diagram]. Nodes: specimen X; microorganism X; species/strain; ID; specimen type; amount; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; pathogen isolation process X; pathogen isolation method; pathogen isolation protocol; pathogen isolate X; pathogen type; biological characteristic assay X; antigenic characteristic assay X; pathologic characteristic assay X; genetic characteristic assay X; chromosome/plasmid assay X; antibiotic sensitivity assay X; genus/species/strain determination assay X; biovar characteristic; serovar characteristic; pathovar characteristic; genotype characteristic; chromosome/plasmid characteristic; antibody sensitivity characteristic; genus/species/strain characteristic. Relations: has_part, instance_of, denotes, has_quality, located_in, has_input, has_specification, has_output, is_about. Sheet row references: v15, v16, v27, v29, v30, v31, v32, v34; b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13. </li>
<li> Slide 21 </li> <li> Specimen Processing [diagram]. Nodes: temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; specimen X; microorganism X; species/strain; ID; sample set X; sample set assembly process X; sample set assembly protocol; aliquoting process X; aliquoting protocol; specimen X aliquot Y; specimen type; amount; sample set assembly process; aliquoting process; specimen A aliquot B; specimen M aliquot N; specimen T aliquot U; repository specimen X; information record; repository deposition process X; specimen repository. Relations: has_input, has_output, has_part, has_specification, located_in, instance_of, denotes, has_quality. Sheet row references: v15, v16, v20, v22, v23, v27; b40, b41, b42, b43. </li>
<li> Slide 22 </li> <li> Sample Shipment [diagram]. Nodes: sample set X; sample set X at GSC; sample set X in transit; sample X at GSC; sample shipment process X; sample shipment protocol; sample receipt process X; sample receipt protocol; ID; sample set type; sample type; amount; sample shipment process; sample receipt process; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location. Relations: has_input, has_output, has_specification, denotes, instance_of, has_quality, located_in, has_part. Sheet row references: v21, v24, v25. </li>
<li> Slide 23 </li> <li> Sequencing Sample Preparation [diagram]. Nodes: temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; specimen X; specimen aliquot X; microorganism X; microorganism genomic NA; enriched NA sample X; NA amplified sample X; NA enrichment process X; NA enrichment protocol; NA amplification process X; NA amplification protocol; aliquoting process X; aliquoting protocol; library construction protocol; species/strain; ID; specimen type; amount; NA enrichment process; NA amplification process; aliquoting process. Relations: has_input, has_output, has_part, has_specification, located_in, instance_of, denotes, has_quality. Sheet row references: v15, v16, v27, v33, v35, v36, v37, v38, v39; b31, b32, b33. </li>
<li> Slide 24 </li> <li> Sequencing Assay [diagram]. Nodes: sequencing assay X; sample material X; person X; equipment X; lot #; primary data; sequencing protocol; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; run ID; sequencing assay type; reagent role; reagent type; sample ID; template role; sample type; name; sequencing tech. role; species; serial #; signal detection role; equipment type; objectives (coverage, genome type targeted, finishing). Relations: has_input, located_in, has_specification, has_output, plays, has_part, denotes, instance_of. Sheet row references: v14, v40, v41; b34, b38. </li>
<li> Slide 25 </li> <li> Data Transformations [diagram]. Nodes: data transformations (image processing, assembly) X; data transformations (variant detection); data transformations (serotype marker detection); data transformations (gene detection); primary data; sequence data; genotype data; serotype data; gene data; microorganism X; microorganism genomic NA; species/strain; algorithm; software; data transfer protocol; data archiving process; sequence data record; GenBank ID; person X; name; bioinformatics tech. role; species; run ID; finishing status; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location. Relations: has_input, has_output, instance_of, has_specification, is_about, denotes, located_in, has_part, part_of, plays, has_quality. Sheet row references: v29, v30, v31, v32, v42, v43, v44, v45, v46, v47; b35, b36, b37, b39. </li>
<li> Slide 26 </li> <li> Generic Assay [diagram]. Nodes: assay X; sample material X; input sample material X; analyte X; quality x; person X; equipment X; lot #; primary data; assay protocol; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; run ID; assay type; reagent role; reagent type; sample ID; target role; sample type; name; technician role; species; serial #; signal detection role; equipment type; objectives. Relations: has_input, located_in, has_specification, has_output, plays, has_part, denotes, instance_of, has_quality, is_about. </li>
<li> Slide 27 </li> <li> Generic Material Transformation [diagram]. Nodes: material transformation X; sample material X; output material X; person X; equipment X; lot #; material transformation protocol; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location; run ID; material transformation type; reagent role; reagent type; sample ID; target role; sample type; name; technician role; species; serial #; signal detection role; equipment type; objectives; quality x; material type. Relations: has_input, located_in, has_specification, has_output, plays, has_part, denotes, instance_of, has_quality. </li>
<li> Slide 28 </li> <li> Generic Data Transformation [diagram]. Nodes: data transformation X; input data; output data; material X; algorithm; software; person X; name; data analyst role; run ID; data transformation type; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location. Relations: has_specification, has_output, is_about, has_input, located_in, denotes, has_part, instance_of, plays. </li>
<li> Slide 29 </li> <li> Generic Material (IC) [diagram]. Nodes: material X; ID; material type; quality x; material Y; material Z; quality y; temporal-spatial region; spatial region; temporal interval; GPS location; date/time; geographic location. Relations: has_quality, has_part, denotes, instance_of, located_in. ...</li></ul>
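
The per-field attributes listed on Slide 8 (definition, synonyms, allowed value set, expected syntax, example) and the split between core and project-specific fields can be sketched in Python. This is only an illustrative sketch, not the working group's tooling: the class FieldSpec, the function split_core_fields, and the field names, vocabulary terms, and date pattern used here are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one metadata field (attributes per Slide 8)."""
    name: str
    definition: str
    synonyms: Tuple[str, ...] = ()
    allowed_values: Optional[FrozenSet[str]] = None  # controlled vocabulary, if any
    syntax: Optional[str] = None                     # regular expression, if any
    example: str = ""

    def validate(self, value: str) -> List[str]:
        """Return a list of problems; an empty list means the value is accepted."""
        problems = []
        if self.allowed_values is not None and value not in self.allowed_values:
            problems.append(f"{self.name}: {value!r} is not in the allowed value set")
        if self.syntax is not None and re.fullmatch(self.syntax, value) is None:
            problems.append(f"{self.name}: {value!r} does not match the expected syntax")
        return problems


def split_core_fields(projects: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split field names into those shared by every project (core) and the rest."""
    if not projects:
        return set(), set()
    field_sets = list(projects.values())
    core = set(field_sets[0]).intersection(*field_sets[1:])
    project_specific = set().union(*field_sets) - core
    return core, project_specific


# Illustrative specs only; these names and values are assumptions, not from the slides.
collection_date = FieldSpec(
    name="collection_date",
    definition="Date on which the specimen was collected",
    syntax=r"\d{4}-\d{2}-\d{2}",
    example="2011-03-15",
)
specimen_type = FieldSpec(
    name="specimen_type",
    definition="Kind of material the specimen consists of",
    synonyms=("sample type",),
    allowed_values=frozenset({"blood", "serum", "nasal swab"}),
    example="serum",
)
```

Calling collection_date.validate("15/03/2011") returns a single syntax problem, while specimen_type.validate("serum") returns an empty list. A submission spreadsheet could be checked row by row in the same way, one FieldSpec per column.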
