28-Apr 8:15AM – 8:45AM Next-Gen Seq Data Management
Thanks to:
Advancing Personal Genetics with Second Generation Sequencing
Context: Personal Genomics Landscape
direct-to-consumer -- hybrid -- research only
REVEAL
23andMe
Over 600 alleles of BRCA1 (Myriad/DNAdirect sequencing not chips)
PersonalGenomes.org Project Goals
1) Low cost: <$1K : 98% exome (or more)
2) Active subject participation, informed redaction
3) Avoid over-promising de-identification
4) Entrance exam to ensure highly informed consent
5) Multiple samples to ensure consistent IDs
6) Open access (not just researcher subset)
7) Trait questionnaire, stem cell RNA, biome
8) Cells available for personal functional genomics
9) Scalable to 100,000 diverse research subjects
Coriell GM2 (figure labels): 0431, 1070, 1660, 1677, 1687, 1781, 1833, 1846, 1731
• Employers/Insurers > Non-Discrimination Act
• Actionable alleles are rare > all at risk
• Non-actionable alleles > activism
3 Exponential technologies: 3 to 18 month doubling times
[Figure: log-scale plot, y-axis 1E-4 to 1E+14, x-axis 1840 to 2020. Series: Daltons synth (Synthetic chemistry), Bits/sec (Computation & Communication), Seq bp/$ (Analytic). Point labels: urea, B12, tRNA, telegraph, human, Gbp chips.]
Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews Genetics. Kurzweil 2002; Moore 1965
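To make the doubling-time claim concrete, here is a minimal sketch that converts a fixed doubling time into a fold improvement over a span of years. The function name `fold_change` and the 10-year horizon are illustrative assumptions, not values from the slide; only the 3 to 18 month range comes from the slide.

```python
def fold_change(years: float, doubling_months: float) -> float:
    """Fold improvement after `years`, assuming a constant doubling time."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    # Compare the two ends of the slide's range, plus a 12-month midpoint
    # (the midpoint is an illustrative choice, not a value from the slide).
    for months in (3, 12, 18):
        print(f"doubling every {months:>2} months: "
              f"{fold_change(10, months):.3g}x per decade")
```

The spread is the point: under this simple model an 18-month doubling time gives roughly a 100-fold gain per decade, while a 3-month doubling time gives about 40 doublings over the same period.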
Chips vs. Gen-2 Sequencing
Chips (Illumina, Affymetrix, bead-array): 0.02% of the genome; assumes common DNA variants stay associated with deleterious variants over 50,000 years
Gen-2 sequencing (Roche-454, Illumina, ABI-SOLiD, Helicos, Harvard-Danaher Polonator G.007): 98% of the genome; accesses the deleterious variants directly
Multiplex Cyclic Sequencing by Synthesis
Single instrument, multiple chemistries: polonies on slides or beads
[Figure: base calls G, A, C, T]
Polymerase (Illumina, IBS) -or- Ligase (AB-SOLiD, CGI)
Shendure, Porreca, et al. 2005 Science; Mitra, et al. 2003 Analyt. Biochem.; 1999 NAR
36 to 64 flowcells (+ DNA barcodes)
2 to 4 billion beads
8.5 thick sequence image
Open-source hardware, software, wetware: Polonator G.007 (12 TB image > 120 Gbp/run)
Enzyme/oligo kits: Polymerase or Ligase chemistries
$150K including computer & 1 yr service, software, support
Danaher Inc.
Effect of improvements on cost
Improvement        Factor  Feature cost/run  Sequencing cost/run  Gb/run  Reagent cost/Gb  Fold decrease
None               1       $1,677            $685                 10      $292             -
Flowcell volume    5       $1,677            $137                 10      $181             1.4
Useable yield      6       $1,677            $685                 60      $39              6
Instrument speed   2       $3,354            $1,370               20      $236             1
Emulsion sorting   18      $93               $685                 10      $78              3
Readlength 48bp    3.7     $1,677            $2,534               37      $114             2
ALL                -       $186              $1,014               444     $2.70            88
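The table's last two columns can be rechecked with a short sketch. It assumes that reagent cost per Gb is the sum of the two per-run costs divided by Gb/run, and that fold decrease is measured against the None row; neither relation is stated on the slide, and the names `cost_per_gb` and `summarize` are illustrative.

```python
# (improvement, feature cost/run in $, sequencing cost/run in $, Gb/run),
# copied from the table above.
ROWS = [
    ("None",             1677,  685,  10),
    ("Flowcell volume",  1677,  137,  10),
    ("Useable yield",    1677,  685,  60),
    ("Instrument speed", 3354, 1370,  20),
    ("Emulsion sorting",   93,  685,  10),
    ("Readlength 48bp",  1677, 2534,  37),
    ("ALL",               186, 1014, 444),
]


def cost_per_gb(feature_cost, sequencing_cost, gb_per_run):
    # Assumed relation: sum both per-run costs, then divide by the yield.
    return (feature_cost + sequencing_cost) / gb_per_run


def summarize(rows=ROWS):
    """Map each improvement to (reagent $/Gb, fold decrease vs. the first row)."""
    baseline = cost_per_gb(*rows[0][1:])
    result = {}
    for name, feature, sequencing, gb in rows:
        per_gb = cost_per_gb(feature, sequencing, gb)
        result[name] = (per_gb, baseline / per_gb)
    return result


if __name__ == "__main__":
    for name, (per_gb, fold) in summarize().items():
        print(f"{name:<17} ${per_gb:8.2f}/Gb  {fold:5.1f}x")
```

Under these assumptions the sketch reproduces the Reagent cost/Gb column for every row except None, where the sum works out to about $236 rather than the $292 shown. The fold decreases come out close to the table's rounded values but not always identical to them.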
Polonator instrument 3 yr amortization:
$150k / 300 runs = $500/run = $50/Gb
$150k / 81 runs = $1850/run = $4.2/Gb
($10 vs. $2000 / Gb for other 2nd gen)
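The amortization arithmetic can be written out as a small sketch. The Gb/run values used here (10 and 444) are inferred from the per-Gb figures on the slide and match the None and ALL rows of the cost table; the function name `amortized_cost` is illustrative.

```python
def amortized_cost(instrument_price: float, runs: int, gb_per_run: float):
    """Return (instrument cost per run, instrument cost per Gb)."""
    per_run = instrument_price / runs
    return per_run, per_run / gb_per_run


if __name__ == "__main__":
    # Baseline: 300 runs over 3 years at an assumed 10 Gb/run.
    print(amortized_cost(150_000, 300, 10))
    # With all improvements: 81 runs at an assumed 444 Gb/run.
    print(amortized_cost(150_000, 81, 444))
```

The second case gives a little over $1850 per run, which the slide rounds down, and about $4.2 per Gb.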
Personal genome sequencing options/goals
Technology        Genome  Cost   Raw bp
AB3730            98%     $30M   7x = 42 Gb (3.5x each)
Knome             98%     $350K  15x = 84 Gb
SNP-chip          0.02%   $1K    2 Mbp
PGP coding        1%      $90    30x = 1 Gb
PGP RNA           99%     $20    30x 20K*n = 60 Mb (n=100 cell types for RNA)
-path/resistome   -       $20    rRNA + 20K genes
VDJ-Immunome      -       $20    ?
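The Raw bp column is coverage multiplied by target size. The sketch below rechecks three rows under stated assumptions: a 6 Gbp diploid genome for AB3730 (implied by 7x = 42 Gb), a 3 Gbp haploid genome for the 1% coding target (an assumption, not a slide value), and 20K genes times n = 100 cell types for RNA. The function name `raw_bp` is illustrative.

```python
def raw_bp(coverage: int, target_size: int) -> int:
    """Raw sequence needed to cover `target_size` at `coverage`-fold depth."""
    return coverage * target_size


if __name__ == "__main__":
    # AB3730: 7x over an assumed 6 Gbp diploid genome.
    print(raw_bp(7, 6_000_000_000))
    # PGP coding: 30x over 1% of an assumed 3 Gbp haploid genome.
    print(raw_bp(30, 3_000_000_000 // 100))
    # PGP RNA: 30x over 20K genes * 100 cell types (units as on the slide).
    print(raw_bp(30, 20_000 * 100))
```

With these inputs the coding row comes to 0.9 Gb, which the slide rounds to 1 Gb. The Knome row (15x = 84 Gb) implies a slightly different genome size and is left out of the check.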
Selective genome sequencing
Shendure, et al. Science 309(5741):1728-32. Nilsson et al. (2006) Trends Biotechnol 24:83.
Red=Synthetic; Yellow=genome/cDNA
How do we optimize >100K 100mers?
8 ways to capture alleles from genomic or cDNA
1. 2. 3. (figure panels; labels: Gap fill, Cleave & ligate)
4. Hybr-select-chip
5. Hybr-select-solution
6. Fluidic PCR
7. Multiplex PCR
In vitro Paired-end-tags (PET): for rearrangements
Zhang, Chou, Shendure, Li, Leproust, Dahl, Davis, Nilsson, Church
Circle Capture DNA from Chips
Aug 2007: R = 0.53; Jan 2008: R = 0.986
Zhang, Li et al. unpublished
Gap fill
Circle-capture 1% genome
Genome to Phenome: Population Variation
[Figure labels: Genome, Environment, cis, Trans, Gene Expression, Gene products, Traits; alleles G/A, T/C]
Zhang & Church unpublished
Allele-specific expression (ASE)
Combine all cis element variants: enhancer, promoter, splicing, polyA, termination, transport, decay.
Eliminate environmental & trans-acting variation among individuals.
Allele-specific transcription factor (TF) binding: ChIP-Seq
Digital RNA allelotyping
[Figure labels: alleles G/A, T/C; polyA tail]
Zhang, Li, Church unpublished; Forton et al. Genome Res. 2007
[Figure panels: Genomic DNA (Lymphocyte); cDNA (Lymphocyte); cDNA (Fibroblast); cDNA (Keratinocyte)]
rs1264899, ATP5F1, ATP synthase
T/C = 0.51; T/C = 3.47; T/C = 3.73
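An allelic ratio such as T/C can be computed from per-allele read counts and compared with a genomic DNA control. This is a minimal sketch of that idea, not the authors' pipeline: the read counts are hypothetical, and the names `allele_ratio` and `normalized_log2_ratio` are illustrative.

```python
import math


def allele_ratio(t_reads: int, c_reads: int) -> float:
    """T/C ratio from read counts at a heterozygous SNP."""
    if c_reads == 0:
        raise ValueError("no reads for the C allele; ratio is undefined")
    return t_reads / c_reads


def normalized_log2_ratio(cdna_counts, gdna_counts) -> float:
    """log2 of the cDNA T/C ratio relative to the genomic DNA T/C ratio.

    Dividing by the genomic DNA ratio corrects for allele-specific capture
    or amplification bias, under the assumption that both alleles are
    present in equal copy number in the genome.
    """
    return math.log2(allele_ratio(*cdna_counts) / allele_ratio(*gdna_counts))


if __name__ == "__main__":
    # Hypothetical (T reads, C reads) pairs, not data from the slide.
    gdna = (250, 250)
    cdna = (400, 100)
    print(allele_ratio(*cdna))
    print(normalized_log2_ratio(cdna, gdna))
```

A value near 0 suggests balanced expression of the two alleles; values far from 0 in one tissue but not another would point to tissue-specific, allele-specific expression.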
Tissue specific & allele specific gene expression confirmatory assays
Kun Zhang & Alice Li
25X probe * 72X time = 1800X better efficiency.
Kun Zhang & Billy Li
[Figure panels: Genomic DNA Aug 2007; Genomic DNA Jan 2008; cDNA Jan 2008]
Challenge: Multiple cell types from healthy adults
3mm skin sample
PGP Physicians Network
Volunteers
Induction of Multiple Gene Sets (not necessarily functional tissues)
Primary fibroblasts
Complex Traits via Allele-Specific Gene Expression
Induced Stem Cells
mRNA
Multiplexed Differentiation
Multiplexed Reprogramming
Sequence tag quantitation
Jay Lee et al. unpublished
Induced Pluripotent Stem Cell Generation & Transdifferentiation (Oct4/Sox2/Myc/Klf4)
Retroviral Infection > Tissue Culture on a Mouse Feeder Layer > ES Cell Colony Identification > Clonal Isolation and Propagation > Embryoid Body Induction & Guided Differentiation
2 months; multiple integration sites (Yamanaka, Daley, Thomson, Hochedlinger, Jaenisch labs)
Adenoviral Infection > Mixture of differentiated cell types & Guided Differentiation
1 week; no genomic integration (Lee & Church)
Multiple cell-types with transdifferentiation
Retroviral Infection; Adenoviral Infection
MyoD
CD34
Collagen
Kun Zhang & Fan Liang
Green: phase contrast image; Red: Cy5-labelled Alu probe
Nunc or UCSD
Haplotyping by amplification of single chromosomes or fragments
• Ultra-clean conditions for reduction of background amplification + Real-Time monitoring
• Post-amplification chip hybridization distinguishes alleles
• Amplification variation random & easily filled by PCR
• Error rate < 1.7 × 10^-5
Single-cell or Single DNA-fragment (haplotype) sequencing: 5 Mbp
Zhang et al. Nature Biotech 2006
Environments of Genomes
[Figure labels: PERSONAL GENOME, VDJ-ome, TRAITS, biome, RNAome]
Once-in-a-lifetime genome + yearly (to daily) tests
Bio-weather map: Allergens, Microbes, Viruses
PGP Resistome: 18 Antibiotics
Dantas, Sommer, Church unpublished
Bacteria Subsisting on 18 Antibiotics
Dantas, Sommer, Church, Science 2008