NextGen Pipeline: Enabling the Plant Science Community
Tom Brutnell (lead), Steve Rounsley (co-lead), Matt Vaughn (Engagement Lead)
Ed Buckler, Justin Borevitz, Todd Mockler, Pat Schnable, Bob Schmitz, Matt Hudson, Brad Barbazuk, Damian Gessler
What is NextGen?
• Ultra-high-throughput sequence analysis (UHTS)
• Several platforms, including 454, ABI SOLiD, and Illumina/Solexa, can generate from 1 Gb to hundreds of Gb of DNA sequence in a single run.
• Library preparation is relatively simple, and kits are available.
• Data analysis is computationally challenging (terabytes of data must be processed) and beyond the reach of many experimental biologists.
How will UHTS change plant science?
UHTS-RNA
UHTS-DNA
• Makes phenotyping, not genotyping, rate limiting
• Genome-wide association studies
• Allele mining
• Enables a much deeper understanding of “non-model” species
• 1000 genomes project (transcriptome of 1000 plant species)
• Genome sequences now available for B. distachyon and S. italica, and for RILs of maize and rice
• Provides detailed transcriptional resolution on a global scale
• Map 5’ and 3’ UTRs, TSSs, and transcript isoforms
• Examine smRNA populations
• Map methylation, TF binding sites, etc.
NextGen 1.0 Pipeline
• Develop a computational pipeline to process ultra-high-throughput sequence datasets
• The first iteration (NextGen 1.0) will perform simple variant detection or transcript quantification, starting from DNA- and RNA-derived datasets.
• Designed explicitly to support modularity and extensibility
• Import FASTQ files and export data in SAM/BAM format
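The import/export boundary named on this slide (FASTQ in, SAM/BAM out) can be illustrated with a minimal Python sketch. It is not the NextGen pipeline's code: the names `FastqRecord`, `read_fastq`, `to_unaligned_sam`, and `fastq_to_sam` are hypothetical, no alignment is performed, and it assumes four-line FASTQ records with Phred+33 qualities. Each read is written as an unmapped SAM record, which is the form an aligner stage would later fill in.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One sequencing read (hypothetical container for this sketch)."""
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield reads from four-line FASTQ text (assumes no wrapped lines)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if len(header) < 2 or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read name is the header text up to the first whitespace, minus '@'.
        yield FastqRecord(header[1:].split()[0], sequence, quality)


def to_unaligned_sam(record: FastqRecord) -> str:
    """Format a read as an unmapped SAM line: FLAG 4, no reference or CIGAR."""
    fields = [record.name, "4", "*", "0", "0", "*", "*", "0", "0",
              record.sequence, record.quality]
    return "\t".join(fields)


def fastq_to_sam(handle: TextIO) -> str:
    """Convert FASTQ text to SAM text containing only unmapped reads."""
    lines = ["@HD\tVN:1.0\tSO:unsorted"]  # minimal SAM header line
    lines.extend(to_unaligned_sam(record) for record in read_fastq(handle))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = io.StringIO("@read1 lane1\nACGT\n+\nIIII\n")
    print(fastq_to_sam(example), end="")
```

Keeping the reader and the writer as separate functions mirrors the modularity goal on the slide: a variant-detection or quantification stage could sit between them without either side changing.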
NextGen 2.0 Pipeline
• Subsequent versions will add functionality that may include:
• Ability to process/compare multiple samples
• Support variant detection for non-reference genomes
• Support multiple methods of analysis (BWA, SOAP2/Bowtie)
• Support additional workflows (smRNA annotation, ChIP-seq, de novo assembly)
• Input from working groups is imperative
• What is the decision tree for subsequent iterations?
• What do the modeling/stats/viz groups need as NextGen deliverables?
• How can NextGen exploit tools under development for G2P?
Meeting the needs of biological use cases
• Flowering time and photosynthesis
• How can NextGen inform modeling efforts?
• Abiotic stress
• Should we develop an smRNA pipeline for 2.0?
• Input from working groups is imperative
• What is the decision tree for subsequent iterations?
• What do the modeling/stats/viz groups need as NextGen deliverables?
• How can NextGen exploit tools under development for G2P?
Integrating NextGen/Viz Pipeline
Workflow
• A pathway of operations
• Entities:
– Operation
– Data
– Flow
• The flow through the operations is managed by the workflow software (e.g., VisTrails)
• Candidate software and packages are named
/ber=Bernice Rogowitz
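The three entities on the Workflow slide (operation, data, flow) can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the API of any particular workflow package: the `Operation` and `Workflow` classes are hypothetical, and the gene identifiers and expression values are made-up placeholders. Here the "flow" is derived from data dependencies: an operation runs once all of its named inputs exist.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Operation:
    """One step: consumes named data items and produces one named item."""
    name: str
    inputs: List[str]
    output: str
    run: Callable[..., Any]


@dataclass
class Workflow:
    """A set of operations; the flow is resolved from their inputs/outputs."""
    operations: List[Operation] = field(default_factory=list)

    def add(self, operation: Operation) -> "Workflow":
        self.operations.append(operation)
        return self

    def execute(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Run every operation as soon as its inputs are available."""
        data = dict(data)  # do not mutate the caller's dictionary
        pending = list(self.operations)
        while pending:
            ready = [op for op in pending if all(i in data for i in op.inputs)]
            if not ready:
                stuck = [op.name for op in pending]
                raise ValueError(f"cannot schedule operations: {stuck}")
            for op in ready:
                data[op.output] = op.run(*(data[i] for i in op.inputs))
                pending.remove(op)
        return data


# Made-up lookup tables standing in for real homolog and expression services.
HOMOLOGS = {"maize_gene_A": ["arabidopsis_gene_1", "arabidopsis_gene_2"]}
EXPRESSION = {"arabidopsis_gene_1": 12.0, "arabidopsis_gene_2": 3.5}

# Operations are added out of order on purpose; execute() finds the flow.
workflow = (
    Workflow()
    .add(Operation("expression_lookup", ["homologs"], "expression",
                   lambda ids: {gene: EXPRESSION[gene] for gene in ids}))
    .add(Operation("homolog_finder", ["candidate"], "homologs",
                   lambda gene: HOMOLOGS[gene]))
)
result = workflow.execute({"candidate": "maize_gene_A"})
```

A real workflow manager adds provenance tracking, caching, and a user interface on top of this idea; the sketch only shows why separating operations from the data passed between them lets the software, rather than the user, manage the flow.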
Integrating NextGen/Viz/Modeling Pipelines
Workflow diagram (box labels, in approximate order):
• Literature search
• Candidate maize gene
• Homolog Finder (e.g., CoGE)
• List of homologous Arabidopsis gene IDs
• Co-Expression Analysis (e.g., ATTED2)
• Expression network of 10 Arabidopsis genes
• Homolog Finder (e.g., CoGE)
• List of 20 homologous maize gene IDs
• Find expression values for these genes (e.g., NextGen)
• Expression data for 20 maize genes
• Examine clusters that can handle maize data (e.g., eNorthern, MapMan); note: very limited data for maize, so may need to go to rice
• For each, examine structure of transcripts and expression over time (e.g., EFP Maize Genome Browser)
• 5 genes of interest
• Modeling and Statistical Inference
• Iterate
/ber/tb