Regulatory Genomics

Lecture 1 November 2012

Yitzhak (Tzachi) Pilpel

1


Course requirements

• Attendance and participation

• Five reading assignments

• A final take-home exam based on reading papers

• Website

In total 13 or 14 meetings (not 17…)

No meeting on Nov 15th

2

Genomics marked the beginning of a new age in biology and medicine

Timeline years shown: 1900, 1953, 1977, 1980, 1983, 1990, 1994-98, 1998, 2000, 2005

• 1900: Rediscovery of Mendel's laws helps establish the science of genetics

• 1953: Watson and Crick identify DNA (the double helix) as the chemical basis of heredity

• 1977: Sanger and Gilbert derive methods of sequencing DNA

• 1980: DNA markers used to map human disease genes to chromosomal regions

• 1983: Huntington disease gene mapped to chromosome 4

• 1990: Human Genome Project (HGP) begins, an international effort to map and sequence all the genes in the human genome

• 1994-98: Genetic and physical mapping

• 2000: Working draft of the human genome sequence completed

• Release of Human Genome Project

Source: Health Policy Research Bulletin, volume 1 issue2, September 2001

3

The genome browser

4

Number of protein-coding genes

Values shown: 20,210; 19,735; 13,601; 5,616; 20,568; 482 (Mycoplasma genitalium)

Organisms shown: Mouse, Fruit fly, Mustard (Arabidopsis), Worm (C. elegans), Yeast (S. cerevisiae)

5

How come we have so few genes, given that we are so complex?

Gene counts shown: 19,735 and 21,710

• We have many non-protein-coding genes

• Our genes are longer and more complex

• Regulation of human gene activity is more complex

• Repeats (formerly known as “junk DNA”, yet not garbage) contribute to complexity

• Combinatorial interactions among genes and products

6

The hierarchical structure of the genome

Lodish et al. Molecular Cell Biology (5th ed.). W.H. Freeman & Co., 2003.

7

Expressing the genome

8

The Central Dogma: a cellular context

9

The Central Dogma of Molecular Biology

Expressing the genome

[Diagram labels: Inactive DNA, DNA, RNA, mRNA, Protein]

10

Evolution

11

Corrected view of evolution

12

The tree of life

13

How do genomes evolve?

Consider two distinct possibilities:

• Genomes evolve by lots of de-novo “inventions”

• Genomes evolve predominantly by mixing and matching existing parts

14

Classification of protein structures

15

Very slow growth in number of protein folds

Very few structural “inventions”

16

Comparing a certain family (e.g. kinases) in different species reveals few “inventions”

17

Analogy:

•Technology

•Language

18

Some basic evolutionary operations

• Mutating existing DNA

• Change gene expression profiles

• Duplications of existing material (genes, chromosomes, genomes)

• Transfer of genes from one organism to another

• Functionalization of “junk DNA”

• Reverse transcription??

19

Stress conditions induce a high DNA replication error rate

Because most newly arising mutations are neutral or deleterious, it has been argued that the mutation rate has evolved to be as low as possible, limited only by the cost of error-avoidance and error-correction mechanisms. But up to one per cent of natural bacterial isolates are 'mutator' clones that have high mutation rates. We consider here whether high mutation rates might play an important role in adaptive evolution. Models of large, asexual, clonal populations adapting to a new environment show that strong mutator genes (such as those that increase mutation rates by 1,000-fold) can accelerate adaptation, even if the mutator gene remains at a very low frequency (for example, 10^-5). …

20

Some basic evolutionary operations

• Mutating existing DNA

• Change gene expression profiles

• Duplications of existing material (genes, chromosomes, genomes)

• Transfer of genes from one organism to another

• Functionalization of “junk DNA”

• Reverse transcription??

21

A slight change in the expression program can make a big difference: an olfactory receptor can “smell the egg”

22

Science. 2003 Mar 28;299(5615):2054-8.

Identification of a testicular odorant receptor mediating human sperm chemotaxis.

Spehr M, Gisselmann G, Poplawski A, Riffell JA, Wetzel CH, Zimmer RK, Hatt H.

Department of Cell Physiology, Ruhr University Bochum, 150 University Street, D-44780 Bochum, Germany.

Abstract: Although it has been known for some time that olfactory receptors (ORs) reside in spermatozoa, the function of these ORs is unknown. Here, we identified, cloned, and functionally expressed a previously undescribed human testicular OR, hOR17-4. With the use of ratiofluorometric imaging, Ca2+ signals were induced by a small subset of applied chemical stimuli, establishing the molecular receptive fields for the recombinantly expressed receptor in human embryonic kidney (HEK) 293 cells and the native receptor in human spermatozoa. Bourgeonal was a powerful agonist for both recombinant and native receptor types, as well as a strong chemoattractant in subsequent behavioral bioassays. In contrast, undecanal was a potent OR antagonist to bourgeonal and related compounds. Taken together, these results indicate that hOR17-4 functions in human sperm chemotaxis and may be a critical component of the fertilization process.

23

Some basic evolutionary operations

• Mutating existing DNA

• Change gene expression profiles

• Duplications of existing material (genes, chromosomes, genomes)

• Transfer of genes from one organism to another

• Functionalization of “junk DNA”

• Reverse transcription??

24

Gene duplication might provide redundancy

[Diagram labels: duplication, followed by nonfunctionalization, neofunctionalization, or subfunctionalization]

25

chromosome III duplicates in heat

[Figure: x-axis: gene index (500 to 5500); y-axis: log2(expression evo39 / evo30), from -2 to 2; series: all genes, chromosome III genes; P value < 10e-100]

26
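The y-axis of the plot above is a log2 ratio of expression between two evolved samples (evo39 / evo30). A minimal sketch of that transformation follows. The function name `log2_ratio` and all expression values are hypothetical illustrations, not data from the experiment.

```python
import math

def log2_ratio(expr_a, expr_b):
    """Return log2(expr_a / expr_b) for two positive expression levels."""
    if expr_a <= 0 or expr_b <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(expr_a / expr_b)

# Hypothetical expression levels in arbitrary units (illustration only).
examples = {
    "unchanged gene": (100.0, 100.0),
    "two-fold higher gene": (200.0, 100.0),
    "two-fold lower gene": (50.0, 100.0),
}

for name, (evo39, evo30) in examples.items():
    print(f"{name}: {log2_ratio(evo39, evo30):+.2f}")
```

On this scale a two-fold increase maps to +1 and a two-fold decrease to -1, so a coordinated upward shift of the genes on one chromosome stands out against the rest of the genome.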

Heat shock tolerance correlates with chromosome III copy number

[Figure: bar chart of relative survival (0 to 3.5) for WT one copy; WT two copies; WT, 3 copies; evolved 3 copies]

27

Conclusions from the experiment

• Chromosomes are easily gained and lost in yeast evolution

• A more fine-tuned solution may follow chromosome duplication

• A striking similarity between repeated experiments

• A chromosome-condition specificity?

28

29

Many gene-duplicate distances correspond to 60-70 mya!

Sequence similarity between gene pairs

30

Some basic evolutionary operations

• Mutating existing DNA

• Change gene expression profiles

• Duplications of existing material (genes, chromosomes, genomes)

• Transfer of genes from one organism to another

• Functionalization of “junk DNA”

• Reverse transcription??

31

Horizontal (“lateral”) gene transfer: transfer of genes between organisms, mostly under stress

32

Some basic evolutionary operations

• Mutating existing DNA

• Change gene expression profiles

• Duplications of existing material (genes, chromosomes, genomes)

• Transfer of genes from one organism to another

• Functionalization of “junk DNA”

• Reverse transcription??

33

Evolution of transcriptional switches

[Diagram of TF1 and TF2 binding sites. Outcome labels: similar function, disrupted function, altered function, gained function. Selection labels: neutral selection; low rate, purifying selection.]

Site sequences shown: CACGCGTACACGCGTT, CACGAGTTCACGCGTT, CACACGTTCACGCGTT (shown twice)

34

Evolution of transcription networks

35

pilpel@weizmann.ac.il

36

Repetitive elements in the human genome

• Alu elements are repetitive retrotransposon elements in the human genome.

• Alu elements are about 300 base pairs long and are therefore classified as short interspersed elements (SINEs).

• There are over one million Alu elements interspersed throughout the human genome.

• About 10% of the human genome consists of Alu sequences.

37
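The Alu figures above can be checked against each other with simple arithmetic. The sketch below assumes a haploid human genome size of roughly 3.1 billion base pairs, a value that is not stated on the slide. The element count and length are the rounded slide values, so the result is only an order-of-magnitude check.

```python
# Slide values (rounded): over one million elements, about 300 bp each.
ALU_COUNT = 1_000_000
ALU_LENGTH_BP = 300

# Assumption, not from the slide: approximate haploid human genome size.
GENOME_SIZE_BP = 3_100_000_000

alu_bp = ALU_COUNT * ALU_LENGTH_BP
alu_fraction = alu_bp / GENOME_SIZE_BP

print(f"Alu sequence: {alu_bp:,} bp")
print(f"Fraction of genome: {alu_fraction:.1%}")
```

Under these assumptions the product is 300 million base pairs, close to one tenth of the genome, which is consistent with the "about 10%" statement.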

Retro-transposition

38

Alus may contain binding sites for TFs, microRNAs…

39

Can the phenotype shape the genotype?

Classical Darwinian theory: Genotype → Phenotype

Lamarckian theory: Genotype ⇄ Phenotype

40

The Central Dogma: a cellular context

41

42

[Diagram of a cell (cell membrane, nucleus) with possible outcomes: cell division, cell death, attack a virus, differentiate, protein synthesis]

43

From parts to networks…

44

Reporter genes reveal spatio-temporal expression programs

45

In unicellular organisms, responses to environmental signals affect gene expression dramatically

[Figure: expression across genes]

Gasch et al. Mol Biol Cell. 2000 Dec;11(12):4241-57.

46

The transcriptome during the cell cycle

Spellman et al. Mol Biol Cell. 1998 Dec;9(12):3273-97.

47

[Diagram labels: coding DNA strand, non-coding strand, RNA]

48
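The relationship between the three labels on the slide above can be sketched in a few lines: the RNA has the same sequence as the coding strand (with U in place of T) and is complementary to the non-coding (template) strand. The function names and the toy sequence are hypothetical, and all strands are written 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def transcribe_coding(coding_strand):
    """RNA matches the coding strand, with U replacing T."""
    return coding_strand.replace("T", "U")

def transcribe_template(template_strand):
    """RNA is the complement of the template (non-coding) strand."""
    return reverse_complement(template_strand).replace("T", "U")

coding = "ATGGCCTAA"                   # hypothetical toy sequence
template = reverse_complement(coding)  # the non-coding strand

print("coding strand:    ", coding)
print("non-coding strand:", template)
print("RNA:              ", transcribe_template(template))
```

Both routes give the same RNA, which is why a gene is conventionally written as its coding strand.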

Transcription regulation

• The hardware

• The software

• The input

• The output

49

The initiation machinery complex

50

Transcription factors bind the DNA

51

ATACGAT

Keys (regulators) can scan the genome in search of their locks (recognition sites)

52
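The key-and-lock picture can be mimicked computationally by sliding a window along a sequence and reporting exact matches to a recognition site on either strand. This is a minimal sketch: the function `find_sites` and the toy genome string are hypothetical, only the site ATACGAT comes from the slide, and real binding sites are usually degenerate rather than exact.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def find_sites(genome, motif):
    """Return (start, strand) for every exact match of motif in genome."""
    rc = reverse_complement(motif)
    width = len(motif)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if window == motif:
            hits.append((start, "+"))
        elif window == rc:
            hits.append((start, "-"))
    return hits

motif = "ATACGAT"                      # recognition site from the slide
toy_genome = "GGATACGATCCATCGTATGG"    # hypothetical sequence
print(find_sites(toy_genome, motif))
```

Positions are 0-based. A "-" hit means the site lies on the opposite strand at that window.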

Transcription regulation

• The hardware

• The software

• The input

• The output

53

http://esg-www.mit.edu:8001/esgbio/pge/lac.html

In the absence of Lactose

54

http://esg-www.mit.edu:8001/esgbio/pge/lac.html

In the presence of Lactose

The Lac Operon (Jacob and Monod)

55

http://esg-www.mit.edu:8001/esgbio/pge/lac.html

In the absence of Glucose

56

The logic of Lac operon regulation

Regulatory sites: CAP site, Operator

Glucose   Lactose   Activity
+         -         OFF
-         +         ON
-         -         OFF
+         +         OFF

[Grid view of the same logic: Lactose (n/y) against Glucose (n/y); ON only with lactose present and glucose absent]

57
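The truth table above amounts to a two-input Boolean function. The sketch below encodes it under the usual simplified reading: lactose releases the repressor from the operator, and absence of glucose allows activation at the CAP site. The function name is hypothetical, and the graded, quantitative response of the real operon is ignored.

```python
def lac_operon_active(glucose: bool, lactose: bool) -> bool:
    """Simplified ON/OFF logic of the lac operon."""
    operator_free = lactose    # lactose present: repressor leaves the operator
    cap_site_bound = not glucose  # glucose absent: activator binds the CAP site
    return operator_free and cap_site_bound

print("Glucose  Lactose  Activity")
for glucose, lactose in [(True, False), (False, True), (False, False), (True, True)]:
    state = "ON" if lac_operon_active(glucose, lactose) else "OFF"
    print(f"{'+' if glucose else '-':<8} {'+' if lactose else '-':<8} {state}")
```

The loop prints the four rows in the same order as the slide, and only the glucose-absent, lactose-present row is ON.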

Genomic Regulatory Logic

58

DNA binding proteins for unique pathways

59

A global map of combinatorial expression control

[Figure: network map. Node labels: mRPE72, SWI5, SFF', MCM1, SFF, MCM1', ECB, SCB, MCB, PAC, mRRPE, mRRSE3, GCN4, BAS1, LYS14, RAP1, mRPE34, mRPE57, mRPE6, mRPE58, STRE, RPN4, ABF1, PDR, CCA, PHO4, AFT1, STE12, MIG1, CSRE, HAP234, ALPHA1', ALPHA1, ALPHA2, mRPE8, mRPE69]

Conditions: Heat-shock, Cell cycle, Sporulation, Diauxic shift, MAPK signaling, DNA damage

• High connectivity

• Hubs

• Alternative partners in various conditions

60

Transcription regulation

• The hardware

• The software

• The input

• The output

61

5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT

5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG

5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT

5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC

5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA

5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA

5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA

AlignACE Example

Genes (one per upstream sequence above, in order): HIS7, ARO4, ILV6, THR4, ARO1, HOM2, PRO3

300-600 bp of upstream sequence per gene are searched in Saccharomyces cerevisiae.

62

5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT

5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG

5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT

5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC

5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA

5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA

5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA

AlignACE Example: The Best Motif

Aligned sites:

AAAAGAGTCA
AAATGACTCA
AAGTGAGTCA
AAAAGAGTCA
GGATGAGTCA
AAATGAGTCA
GAATGAGTCA
AAAAGAGTCA
**********

MAP score = 20.37

Genes (one per upstream sequence above, in order): HIS7, ARO4, ILV6, THR4, ARO1, HOM2, PRO3

63
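The eight aligned sites on the slide above can be summarized as per-position base counts and a consensus sequence. The sketch below does only that summary; it is not the AlignACE search, which finds such alignments by Gibbs sampling. The function names are hypothetical, and ties between bases are not handled specially.

```python
from collections import Counter

# The eight aligned sites listed on the slide.
SITES = [
    "AAAAGAGTCA",
    "AAATGACTCA",
    "AAGTGAGTCA",
    "AAAAGAGTCA",
    "GGATGAGTCA",
    "AAATGAGTCA",
    "GAATGAGTCA",
    "AAAAGAGTCA",
]

def position_counts(sites):
    """Count the bases observed in each alignment column."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(column) for column in zip(*sites)]

def consensus(sites):
    """Most frequent base at each position."""
    return "".join(c.most_common(1)[0][0] for c in position_counts(sites))

counts = position_counts(SITES)
for base in "ACGT":
    print(base, " ".join(f"{c[base]:2d}" for c in counts))
print("consensus:", consensus(SITES))
```

Columns where a single base accounts for all eight sites are the most conserved positions of the motif. Dividing the counts by the number of sites would give a position frequency matrix.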

Transcription regulation

• The hardware

• The software

• The input

• The output

64

Expression regulation of genes determines complex spatio-temporal patterns

65

Monitor expression during the cell cycle

[Figure: mRNA expression level (-2 to 4) versus time (0 to 15) over two cell cycles: G1, S, G2, M, G1, S, G2, M]

66

Genes can be clustered based on time-dependent expression profiles

[Figure: a plot with axes Time-point 1, Time-point 2, and Time-point 3, and three panels of normalized expression versus time-point (1, 2, 3)]

67
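The panels above plot normalized expression. One common normalization, assumed here because the slide does not say which was used, rescales each gene's profile to mean 0 and standard deviation 1, so that genes are compared by the shape of their profile rather than by absolute level. The function name and the example values are hypothetical.

```python
import statistics

def normalize_profile(profile):
    """Rescale one gene's expression profile to mean 0 and std 1."""
    mean = statistics.mean(profile)
    sd = statistics.pstdev(profile)
    if sd == 0:
        # A flat profile carries no shape information.
        return [0.0 for _ in profile]
    return [(x - mean) / sd for x in profile]

# Two hypothetical genes with the same shape but different levels.
low_gene = [1.0, 2.0, 3.0]
high_gene = [10.0, 20.0, 30.0]

print(normalize_profile(low_gene))
print(normalize_profile(high_gene))
```

After normalization the two example genes have the same profile, so a distance-based clustering method would place them together.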

The K-means algorithm

• Start with random positions of centroids.

Iteration = 0

68

K-means

• Start with random positions of centroids.

• Assign data points to centroids.

Iteration = 1

69

K-means

• Start with random positions of centroids.

• Assign data points to centroids.

• Move centroids to center of assigned points.

Iteration = 1

70

K-means

• Start with random positions of centroids.

• Assign data points to centroids.

• Move centroids to center of assigned points.

• Iterate until minimal cost.

Iteration = 3

71
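The four steps on the K-means slides can be sketched in plain Python. Several choices below are assumptions rather than anything stated on the slides: initial centroids are drawn from the data points, the cost is the sum of squared Euclidean distances, and iteration stops when assignments no longer change. The function names and the toy expression profiles are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    # Step 1: start with random centroids (here, k distinct data points).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each data point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the center of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical normalized expression profiles over three time points.
profiles = [
    (-1.0, 0.0, 1.0), (-0.9, 0.1, 0.8), (-1.1, -0.1, 1.2),   # rising genes
    (1.0, 0.0, -1.0), (0.9, -0.1, -0.8), (1.1, 0.1, -1.2),   # falling genes
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

K-means only finds a local minimum of the cost, so results can depend on the starting centroids. Running it from several seeds and keeping the lowest-cost solution is a common safeguard.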

The diauxic shift

[Figure: x-axis is time]

72

Genetic reprogramming of yeast metabolism upon glucose depletion

73

At the beginning, when glucose is abundant

[Pathway diagram labels: Glucose, Pyruvate, 2 ADP + Pi, 2 ATP, NAD+, NADH; Ferment: Ethanol, Lactate; Respirate: Acetyl-CoA, TCA, NAD+, NADH]

74

~20 hours later, when glucose is depleted

[Pathway diagram labels: Glucose, Pyruvate, 2 ADP + Pi, 2 ATP, NAD+, NADH; Ferment: Ethanol, Lactate; O2; Respirate: Acetyl-CoA, TCA, NAD+, NADH]

75

The promoter sequences of co-expressed genes

5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT

5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG

5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT

5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC

5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA

5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA

5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA

Genes (one per promoter sequence above, in order): HIS7, ARO4, ILV6, THR4, ARO1, HOM2, PRO3

76

Promoter motifs and expression profiles

Motifs shown: CGGCCCCGCGGA, CTCCTCCCCCCCTTC, TGGCCAATCA, ATGTACGGGTG

77
