
Genome Function Project

We thank for support:

Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Lipper, Armenise

Corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis, Dupont

UCSC, George Church, 24 Aug 2001

gcggatttagctcagttgggag agcgccagact gaagatttgga ggtcctgtgtt cgatccacagaattcgcacca

Post-Structural Genomics Data

Post-300 Genome Sequences

0.5 to 7 Mbp; 10 Mbp to 1000 Gbp

Function Genomics Measures & Models

[Figure: DNA, RNA, Protein; Metabolites; Growth rate; Expression; Interactions; Environment]

Exponential technologies

1993 first browser 1994 commercial www

Agenda

1. mapping human variation (haplotype map)

2. obtaining a complete and validated set of human genes, including
   - multiple alleles, transcripts, protein or structural RNA products
   - regulatory elements

3. understanding the diversity of life through genomic analysis of many organisms, and understanding how one organism works by comparative genomics with others
   - how genomes evolved

4. creating a new quantitative systems biology, beyond drawing circles and arrows on paper and labeling them with names nobody can remember
   - mapping the key interactions
   - mathematical/computational models of pathways and systems
   - dealing with multiple levels from atoms to cells

In vitro minigenome

Steve Blackwell, HMS: pure IF, EF
Tony Forster, BWH: tRNAs & modified bases
Manz Ehrenberg, Dieter Soll: tRNA-synthetases
Josh LaBaer, HMS-HIP: Expression constructs
Jingdong Tian, HMS: Protein synthesis
Rob Mitra & Xiaohua Huang, HMS: Polymerases, RCA
Gloria Culver, Iowa State: ribosomal proteins & rRNA
Harry Noller, UCSC: ribosomes

In vitro minigenome

A) From atoms to evolving minigenomes and cells. This could improve in vitro transcription/translation/replication systems and conceptually link atomic (mutational) changes via molecular and systems modeling to population evolution. The synthesis of pure systems of proteins with natural or novel modifications would be of great significance. This could give an incredible focus to structural genomics.

B) From cells to tissues. Modeling the effects of combinations of membrane signals and genome-programming on RNA and protein expression profiles would allow, among other things, manipulating stem-cell fate and stability. Stability would be key to both cell culture and to long-term avoidance of cancerous stem-cell proliferation. The ability of "programmed" cells to replace or augment small molecule drugs could be rigorously assessed.

C) From tissues to systems. Computational programming of cell and tissue morphology can develop quantitative concepts in complexity, chaos, robustness and evolvability to engineer useful models, such as sensor-effector neural feedback systems where macro aspects of the system determine the past (Darwinian) or future (prosthetic) function of the altered genomes.

Grand Challenges: goals (& details)

• The Manhattan Project ’43-45: Nuclear chain reaction (without igniting the atmosphere)

• The Apollo Project ’62-69: Send a person to the moon (& back)

• The Smallpox Eradication ’66-77: from the whole globe (including freezers)

• The Human Genome Project ’90-05: 3 billion bases (at 99.99% accuracy & searchable)

Grand Challenges: goals (& details)

• The Manhattan Project ’43-45: Nuclear chain reaction (without igniting the atmosphere)

• The Apollo Project ’62-69: Send a person to the moon (& back)

• The Smallpox Eradication ’66-77: from the whole globe (including military freezers?)

• The Human Genome Project ’90-05: 3 billion bases (at 99.99% accuracy with comparisons)

• The BioSystems Project ’02- ??

Potential BioSystems Project Challenges

Programming smart biomaterials:
1. 0.1 nanometer positioning at 1 kHz in a 50 nm cube (Foresight Feynman Challenge)
2. I/O to sub-nano memory in DNA

Programming cells & populations:
3. 10 sec. mini-cell cycle, 85 kbp genome
4. Bioremediation microbial populations

Programming ourselves:
5. Drug structure-activity prioritization
6. Universal, non-aging human stem cells


Why the genome project worked

Polymer synthesis & sequencing: Hood ’75-00, Hunkapiller ’77-00, Caruthers ’79...

Shotgun & mapping: Sanger ’77, Brenner ’72-02, Sulston ’90, Olson ’80-00...

Sequence searching: Ulam ’61-74, Staden ’79, Lipman ’87, Myers ’87, Green ’93...

Chemistry: Tabor ’93, Karger ’94, Mathies ’96, Mullis ’84...

Infrastructure: Wada ’82, DeLisi ’84, Gilbert ’87, Watson ’88, Venter ’91...

Metrics for structural & functional data

Data type           Automate   Data quality             Model quality                    Similarity search
X-ray diffraction   1960       resolution < 0.2 nm      R = |o-c|/o < 0.2                DALI, etc.
Sequence            1988       discrepancy < 0.01% bp   conserved proteins               BLAST
Expression          1999       cc, t-test               shared motifs, shared function   Biclustering
Interact/growth                outliers                 optimality                       as above?
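Two of the quality measures named above are easy to state in code. The sketch below computes an R-factor in the |o-c|/o sense (summed over observations) and a Pearson correlation coefficient ("cc"). The function names and the sample numbers are illustrative assumptions, not values from the talk.

```python
import math

def r_factor(observed, calculated):
    """Sum of |o - c| divided by sum of |o|: the |o-c|/o model-quality measure."""
    numerator = sum(abs(o - c) for o, c in zip(observed, calculated))
    denominator = sum(abs(o) for o in observed)
    return numerator / denominator

def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical observed vs. model-calculated values.
observed = [10.0, 20.0, 30.0, 40.0]
calculated = [11.0, 18.0, 33.0, 38.0]
print(r_factor(observed, calculated))    # 8 / 100, below the 0.2 threshold
print(pearson_cc(observed, calculated))
```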

Types of Systems Interaction Models

Quantum electrodynamics: subatomic
Quantum mechanics: electron clouds
Molecular mechanics: spherical atoms (nm-fs)
Master equations: stochastic single molecules
Fokker-Planck approx.: stochastic
Macroscopic rates ODE: concentration & time (C,t)
Flux Balance Optima: dCik/dt optimal steady state
Thermodynamic models: dCik/dt = 0, k reversible reactions
Steady state: dCik/dt = 0 (sum k reactions)
Metabolic Control Analysis: d(dCik/dt)/dCj (i = chem. species)
Spatially inhomogeneous: dCi/dx
Population dynamics: as above (km-yr)

Increasing scope, decreasing resolution
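As a minimal illustration of the "macroscopic rates ODE" and "steady state" levels in this list, the sketch below integrates a single reversible reaction A <-> B with forward-Euler steps until dC/dt is effectively zero. The reaction, the rate constants and the step size are assumptions chosen for illustration only.

```python
def simulate_reversible(a0, b0, kf, kr, dt=0.001, steps=20000):
    """Forward-Euler integration of dA/dt = -kf*A + kr*B, dB/dt = +kf*A - kr*B."""
    a, b = a0, b0
    for _ in range(steps):
        net_flux = kf * a - kr * b   # net rate of A -> B
        a -= net_flux * dt
        b += net_flux * dt
    return a, b

# Hypothetical rate constants; at steady state kf*A = kr*B, so B/A = kf/kr.
a, b = simulate_reversible(a0=1.0, b0=0.0, kf=2.0, kr=1.0)
print(a, b, b / a)
```

At steady state the net flux vanishes, which is the dC/dt = 0 condition used by the steady-state and flux-balance entries.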

Capillary electrophoresis, $300,000 (DNA sequencing): 0.4 Mb/day

Chromatography-mass spectrometry (e.g. peptide LC-ESI-MS): 20 Mb/day

Microarray scanners (e.g. RNA): 300 Mb/day

Reagent costs:

Electrophoresis (DNA sequencing): 10 ul per 0.5 Kb
Microarray reactions: 10 ul per 1000 Kb

Intel CMOS microscope: $99
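The throughput and reagent figures above can be compared with a few lines of arithmetic. This is only a sketch that restates the slide's numbers as ratios; the dictionary keys are illustrative names.

```python
# Data throughput from the slide, in Mb/day.
throughput_mb_per_day = {
    "capillary_electrophoresis": 0.4,
    "lc_esi_ms": 20.0,
    "microarray_scanner": 300.0,
}

# Reagent volume from the slide, converted to microliters per Kb.
reagent_ul_per_kb = {
    "electrophoresis": 10.0 / 0.5,
    "microarray": 10.0 / 1000.0,
}

throughput_fold = (throughput_mb_per_day["microarray_scanner"]
                   / throughput_mb_per_day["capillary_electrophoresis"])
reagent_fold = reagent_ul_per_kb["electrophoresis"] / reagent_ul_per_kb["microarray"]

print(f"Microarray vs. capillary throughput: about {throughput_fold:.0f}-fold")
print(f"Electrophoresis vs. microarray reagent per Kb: about {reagent_fold:.0f}-fold")
```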

Sources of Data for BioSystems Modeling: RNA quantitation

Aach, Rindone, Church (2000) Genome Research 10: 431-445.

• Microarrays [1] (experiment vs. control)
  • R/G ratios
  • R, G values
  • quality indicators

• Affymetrix [2] (PM and MM features per ORF)
  • Averaged PM-MM
  • “presence”
  • feature statistics
  • 25-mers

• SAGE [3] (ORF, SAGE tag, concatamers)
  • Counts of SAGE 14-mer sequence tags for each ORF

1 DeRisi et al., Science 278:680-686 (1997)
2 Lockhart et al., Nat Biotech 14:1675-1680 (1996)
3 Velculescu et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
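The three kinds of RNA measurement listed above reduce to simple per-ORF quantities. The sketch below shows one plausible way to compute each; the function names, the sample intensities and the tag-to-ORF table are hypothetical and are not taken from the cited papers.

```python
import math
from collections import Counter

def log2_ratio(red, green):
    """Two-color microarray: log2 of the R/G (experiment/control) ratio."""
    return math.log2(red / green)

def averaged_pm_mm(pm_values, mm_values):
    """Affymetrix-style signal: mean of perfect-match minus mismatch intensities."""
    differences = [pm - mm for pm, mm in zip(pm_values, mm_values)]
    return sum(differences) / len(differences)

def count_sage_tags(tags, tag_to_orf):
    """SAGE: count observed 14-mer tags per ORF, ignoring unassigned tags."""
    counts = Counter()
    for tag in tags:
        orf = tag_to_orf.get(tag)
        if orf is not None:
            counts[orf] += 1
    return counts

# Hypothetical inputs for illustration.
print(log2_ratio(200.0, 100.0))
print(averaged_pm_mm([100.0, 120.0, 90.0], [40.0, 60.0, 30.0]))
tag_table = {"CATGAAAAAAAAAA": "ORF_A", "CATGCCCCCCCCCC": "ORF_B"}
observed_tags = ["CATGAAAAAAAAAA", "CATGCCCCCCCCCC", "CATGAAAAAAAAAA", "CATGTTTTTTTTTT"]
print(count_sage_tags(observed_tags, tag_table))
```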

Array opportunities

• 22 bp ds-RNAi array modulates single cell type
• Drug array time-release or photo-release
• Primer pair arrays for haplotyping
• Gene & genome synthesis (DARPA)

Polypeptide arrays

Photo-deprotect peptides (Affymax)
Piezo or contact spotting (Harvard-CGR, Stanford)
Phage or ribosome display capture (Bulyk)
In situ ribosomal synthesis (Tian)

Harvard Inst. Proteomics, FLEXGene consortium

[Figure: polony amplification. A single molecule from the library, flanked by primer sites A’ and B, is amplified in place. In the 1st round of PCR the primer is extended by polymerase.]

Primer A has 5’ immobilizing (Acrydite) modification.

Sequence polonies by sequential, fluorescent single-base extensions

1. Remove 1 strand of DNA.
2. Hybridize Universal Primer.
3. Add Red (Cy3) dTTP.
4. Wash; Scan Red Channel
5. Add Green (FITC) dCTP
6. Wash; Scan Green Channel

[Figure: two polonies with the universal primer annealed at site B/B’, with templates 3’-AGT.. and 3’-GCG.. The AGT.. polony incorporates T at step 3; the GCG.. polony incorporates C at step 5.]

Polony Template

3’ P’

P5’ A ATA CAA TTCACACAGGAAACAGCTATGA CATT CTATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA…5’

FITC ( C ) CY3 ( T )

Primer Extension 26 cycles, 34 Nucleotides

Mean Intensity: 58, 0.5 40, 6.5 0.3, 48 0.4, 43
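The sequential single-base extension scheme described above can be sketched as a small simulation: one labeled nucleotide is offered per cycle, and a polony lights up only if that nucleotide complements the next template base. This is an idealized model, not the lab's software. The dispensing order, the function name and the assumption that a run of identical bases extends fully within one cycle are all illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_extension(template, dispense_order="TCAG", max_cycles=100):
    """Simulate cyclic single-nucleotide additions on one polony.

    `template` lists the template bases in the order the polymerase reads them
    (starting next to the primer). Returns the read and, for each cycle,
    a tuple (cycle number, nucleotide offered, bases incorporated).
    """
    position = 0
    events = []
    for cycle in range(max_cycles):
        if position >= len(template):
            break
        offered = dispense_order[cycle % len(dispense_order)]
        incorporated = 0
        # Extend while the offered nucleotide pairs with the next template base.
        while position < len(template) and COMPLEMENT[template[position]] == offered:
            position += 1
            incorporated += 1
        events.append((cycle + 1, offered, incorporated))
    read = "".join(base * count for _, base, count in events)
    return read, events

# The two example templates from the slides: AGT.. and GCG..
for template in ("AGT", "GCG"):
    read, events = sequence_by_extension(template)
    print(template, "->", read, events)
```

In this model the AGT template gives signal in the first (T) cycle and the GCG template in the second (C) cycle, matching the red and green scans in steps 4 and 6.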

Polony haplotyping

Trans Cis

Function Genomics Measures & Models

[Figure: DNA, RNA, Protein; Metabolites; Growth rate; Environment]

microbes, stem cells, cancer cells, multicellular organisms

RNAi, insertions, SNPs

Competition among multiple mutations & multiple homologous domains

 

[Figure: probes across homologous domains of thrA (1, 2, 3), metL (1, 2, 3) and lysC (1, 2); values shown: 1.1, 6.7, 1.8, 1.8, 10.4]

Selective disadvantage in minimal media

Multiple mutations per gene

Correlation between two selection experiments

Comparison of selection data with FBO predictions (scale up from 79 to 488 genes)

FBO prediction        number of genes   negatively selected   not negatively selected
essential             143               80                    63
reduced growth rate   46                24                    22
non essential         299               119                   180

P-value Chi Square = 0.004
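The reported P-value can be checked directly from the counts in the table. The sketch below runs a standard Pearson chi-square test of independence on the 3 x 2 table. The closed-form p-value exp(-x/2) is exact only for 2 degrees of freedom, which is what this table has. That the talk used this exact test is an assumption.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Rows: essential, reduced growth rate, non essential.
# Columns: negatively selected, not negatively selected.
counts = [[80, 63], [24, 22], [119, 180]]
statistic, dof = chi_square_independence(counts)

# For exactly 2 degrees of freedom the chi-square survival function is exp(-x/2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, dof = {dof}, p = {p_value:.4f}")  # p rounds to 0.004
```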

Novel duplicates?

Position effects?

Function Genomics Measures & Models

[Figure: DNA, RNA, Protein; Metabolites; Expression; Environment]

RNA quantitation (Frequently Asked Questions)

Is less than a 2-fold RNA-ratio ever important? Yes; 1.5-fold in trisomies.

Why oligonucleotides rather than cDNAs? Alternative RNAs, gene families.

Using a subset of the genome or ratios to various control RNAs? Trouble for later (meta) analyses.

Lpp mRNA start & structure

[Figure: Intensity (PM - MM) / Smax (axis from -1 to 2) versus Bases from Translation Start (-300 to 400), for Log, Stationary and Genomic DNA samples. Annotations: Known Transcription Start (position -33), Known Hairpin, Translation Stop (237 bases).]

See: Selinger et al Nat Biotech

Oligo selection

• PGA/Smith group already designing software for oligo selection
• Church Lab / Lipper Center has additional tools
  – Unique oligos (cu-15s)
  – RNA string matching program

[Figure: oligo design pipeline. Inputs: gene sequences, parameters (Tm, length, ...), background sequences, experimental results. Steps: generate candidate oligos; predict cross-hybridization; filter & select oligos; generate control, border oligos; generate chip layout. Outputs: gene-specific oligos; controls, text, border oligos; chip layout.]

Figure courtesy of Adnan Derti
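The pipeline in the figure (generate candidates, predict cross-hybridization, filter and select) can be sketched in a few lines. This is a toy version under loud assumptions: Tm is estimated with the rough Wallace rule (2 degrees per A/T, 4 degrees per G/C), cross-hybridization is approximated by an exact substring match against background sequences, and all function names and defaults are hypothetical. It does not reproduce the Church Lab or PGA/Smith group tools.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) by the Wallace rule; short oligos only."""
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    return 2 * at + 4 * gc

def candidate_oligos(gene, length):
    """All windows of the given length along the gene sequence."""
    return [gene[i:i + length] for i in range(len(gene) - length + 1)]

def select_oligos(gene, background, length=15, tm_range=(40, 50)):
    """Keep candidates inside the Tm window that do not occur in any background sequence."""
    selected = []
    for oligo in candidate_oligos(gene, length):
        tm = wallace_tm(oligo)
        if not (tm_range[0] <= tm <= tm_range[1]):
            continue
        # Crude cross-hybridization proxy: exact match elsewhere in the background.
        if any(oligo in sequence for sequence in background):
            continue
        selected.append(oligo)
    return selected

# Tiny hypothetical example with 5-mers so the result is easy to follow.
print(select_oligos("AAAAACCCCC", ["GGAACCCGG"], length=5, tm_range=(14, 18)))
```

A real design tool would also consider reverse complements, near matches and secondary structure; the exact-match test here only marks where such a filter would sit.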

Combinatorial arrays for binding constants (EGR1)

ds-DNA array

HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen
MRC: Yen Choo

Combinatorial arrays for binding constants

[Figure: combinatorial DNA-binding protein domains displayed on phage and bound to a ds-DNA array. Labels: Phage, pVIII, pIII, Antibodies, Phycoerythrin - 2º IgG.]

Martha Bulyk et al

Interactions of Adjacent Basepairs in EGR1 Zinc Finger DNA Recognition

Isalan et al., Biochemistry (’98) 37:12026-12033

Wildtype EGR1 Microarray

[Figure: array image. Labels: high [DNA]; (+) ctrl sequence for wt binding; alignment oligos; etc.]

Motifs weight all 64 Ka app

Zinc finger variants: Wildtype RSDHLTT, RGPDLAR, REDVLIR, LRHNLET, KASNLVS

Values shown: TGG 2.8 nM; GCG 16 nM; 2.5 nM; TAT 5.7 nM; AAT 240 nM

Triplets listed: AAA, AAT, ACT, AGA, AGC, AGT, CAT, CCT, CGA, CTT, TTC, TTT
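To make the nanomolar values above easier to compare, the sketch below treats them as apparent dissociation constants (an assumption; the slide labels them only as apparent constants). It converts ratios of constants into free-energy differences with ddG = RT ln(Kd / Kd_ref), and uses a simple one-site model for fractional occupancy. Which zinc finger variant each value belongs to is not recoverable here, so the dictionary is keyed by DNA triplet only.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)

def fraction_bound(protein_nM, kd_nM):
    """One-site binding: fraction of DNA sites occupied at a given free protein level."""
    return protein_nM / (protein_nM + kd_nM)

def ddg_kcal_per_mol(kd_nM, kd_ref_nM, temperature_K=298.15):
    """Free-energy penalty relative to a reference site: RT * ln(Kd / Kd_ref)."""
    return R_KCAL_PER_MOL_K * temperature_K * math.log(kd_nM / kd_ref_nM)

# Labeled values from the slide, assumed here to be apparent Kd in nM.
kd_app_nM = {"TGG": 2.8, "GCG": 16.0, "TAT": 5.7, "AAT": 240.0}

reference = min(kd_app_nM.values())
for triplet, kd in sorted(kd_app_nM.items(), key=lambda item: item[1]):
    fold = kd / reference
    penalty = ddg_kcal_per_mol(kd, reference)
    print(f"{triplet}: {kd} nM, {fold:.1f}-fold weaker, ddG = {penalty:.2f} kcal/mol")
```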

For more information:

arep.med.harvard.edu