NextGen Pipeline: Enabling the Plant Science Community

Tom Brutnell (lead), Steve Rounsley (co-lead), Matt Vaughn (Engagement Lead)

Ed Buckler, Justin Borevitz, Todd Mockler, Pat Schnable, Bob Schmitz, Matt Hudson, Brad Barbazuk, Damian Gessler

What is NextGen?

•Ultra high-throughput sequence analysis (UHTS)

•Several platforms, including 454, ABI SOLiD, and Illumina/Solexa, can generate from 1 Gb to hundreds of Gb of DNA sequence in a single run.

•Library preparations are relatively simple, and kits are available.

•Data analysis is computationally challenging (Tb of data must be processed) and is beyond the reach of many experimental biologists.

How will UHTS change plant science?

UHTS-RNA

UHTS-DNA

•Makes phenotyping, not genotyping, rate limiting

–Genome-wide association studies

–Allele mining

•Enables a much deeper understanding of “non-model” species

–1000 genomes project (transcriptomes of 1000 plant species)

–Genome sequences now available for B. distachyon and S. italica, and for RILs of maize and rice

•Provides detailed transcriptional resolution on a global scale

–Map 5’ and 3’ UTRs, TSSs, and transcript isoforms

–Examine smRNA populations

–Map methylation, TF binding sites, etc.

NextGen 1.0 Pipeline

•Develop a computational pipeline to process ultra-high-throughput sequence datasets

•The first iteration, NextGen 1.0, will perform simple variant detection or transcript quantification starting from DNA- and RNA-derived datasets.

–Designed explicitly to support modularity and extensibility

–Imports FASTQ files and exports data in SAM/BAM format
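The FASTQ-in, SAM-out boundary of the 1.0 pipeline can be illustrated with a small Python sketch. It is not the NextGen implementation: it only parses four-line FASTQ records and writes them as unaligned SAM records (flag 4), with no alignment step. The names `FastqRecord`, `read_fastq`, and `to_unaligned_sam` and the two example reads are hypothetical, chosen for illustration.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read name is the header text up to the first whitespace, without '@'.
        yield FastqRecord(header[1:].split()[0], sequence, quality)


def to_unaligned_sam(records: Iterable[FastqRecord]) -> str:
    """Format reads as unaligned SAM text (flag 4 = read unmapped)."""
    lines = ["@HD\tVN:1.6\tSO:unsorted"]
    for rec in records:
        # Mandatory SAM fields:
        # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
        fields = [rec.name, "4", "*", "0", "0", "*", "*", "0", "0",
                  rec.sequence, rec.quality]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Two made-up reads, for illustration only.
example_fastq = "@read1 sample\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!##\n"
sam_text = to_unaligned_sam(read_fastq(io.StringIO(example_fastq)))
print(sam_text, end="")
```

In a real pipeline an aligner would sit between the two functions and fill in the reference name, position, and CIGAR fields; conversion to binary BAM would be delegated to an existing tool.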

NextGen 2.0 Pipeline

•Subsequent versions will add functionality that may include:

–Ability to process and compare multiple samples

–Support for variant detection in non-reference genomes

–Support for multiple methods of analysis (BWA, SOAP2, Bowtie)

–Support for additional workflows (smRNA annotation, ChIP-seq, de novo assembly)

•Input from working groups is imperative

–What is the decision tree for subsequent iterations?

–What do modeling/stats/viz groups need as NextGen deliverables?

–How can NextGen exploit tools under development for G2P?

Meeting the needs of biological use cases

•Flowering time and photosynthesis

–How can NextGen inform modeling efforts?

•Abiotic stress

–Should we develop a smRNA pipeline for 2.0?

Integrating NextGen/Viz Pipeline

Workflow

•A pathway of operations

•Entities:

–Operation

–Data

–Flow

•The flow through the operations is managed by the workflow software (e.g., VisTrails)

•Candidate software and packages are named

/ber=Bernice Rogowitz
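The three workflow entities (operation, data, flow) can be made concrete with a toy Python sketch. It does not use the API of VisTrails or any other workflow package. `run_workflow`, the three operations, the gene names, and the SAM records are all hypothetical and made up for illustration. Each operation is a plain function, the data is whatever one operation hands to the next, and the flow is the ordered list of operations.

```python
from collections import Counter
from typing import Any, Callable, List, Tuple

Operation = Callable[[Any], Any]


def run_workflow(flow: List[Tuple[str, Operation]], data: Any) -> Any:
    """Pass the data through each named operation in order."""
    for _name, operation in flow:
        data = operation(data)
    return data


def drop_header_lines(sam_lines: List[str]) -> List[str]:
    # SAM header lines start with '@'.
    return [line for line in sam_lines if not line.startswith("@")]


def keep_mapped(sam_lines: List[str]) -> List[str]:
    # FLAG is the second SAM column; bit 0x4 marks an unmapped read.
    return [line for line in sam_lines if not int(line.split("\t")[1]) & 0x4]


def count_reads_per_reference(sam_lines: List[str]) -> Counter:
    # RNAME (third column) is the reference, here a toy transcript name.
    return Counter(line.split("\t")[2] for line in sam_lines)


# Flow: the ordered pathway of operations.
flow = [
    ("drop headers", drop_header_lines),
    ("keep mapped reads", keep_mapped),
    ("count per transcript", count_reads_per_reference),
]

# Data: made-up SAM records for two hypothetical transcripts.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tgeneA\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tgeneA\t22\t60\t4M\t*\t0\t0\tTTGA\tIIII",
    "r3\t0\tgeneB\t5\t60\t4M\t*\t0\t0\tCCGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
]

counts = run_workflow(flow, toy_sam)
print(dict(counts))
```

Keeping the flow as data (a list of named steps) is what lets a workflow manager reorder, swap, or record operations without changing the operations themselves.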

Integrating NextGen/Viz/Modeling Pipelines

[Workflow diagram. Box labels recovered from the slide, with duplicates merged and listed in approximate pipeline order:]

•Literature search

•Candidate maize gene

•Homolog Finder (e.g., CoGe)

•List of homologous Arabidopsis gene IDs

•Co-Expression Analysis (e.g., ATTED-II)

•Expression Network of 10 Arabidopsis Genes

•Homolog Finder (e.g., CoGe)

•List of 20 homologous maize gene IDs

•Find expression values for these genes (e.g., NextGen)

•Expression data for 20 maize genes

•Examine clusters that can handle maize data (e.g., eNorthern, MapMan) (note: very limited data for maize, so may need to go to rice)

•For each, examine structure of transcripts and expression over time (e.g., eFP Maize Genome Browser)

•5 genes of interest

•Modeling and Statistical Inference

•Iterate

/ber/tb
