Page 1: DNA in chromatin

DNA in chromatin: how to extract structural, dynamical and functional information from the analysis of genomic sequences using space-scale wavelet techniques

Alain Arneodo
Laboratoire de Physique, Ecole Normale Supérieure de Lyon
46 allée d’Italie, 69364 Lyon Cedex 07, FRANCE

Françoise Argoul, Benjamin Audit, Samuel Nicolay, Edward-Benedict Brodie of Brodie (ENS de Lyon, France)

Cédric Vaillant (EPF Lausanne, Switzerland)

Marie Touchon, Yves d’Aubenton-Carafa, Claude Thermes (CGM, Gif-sur-Yvette, France)

Page 2: DNA in chromatin

Report Documentation Page (Form Approved, OMB No. 0704-0188)


1. REPORT DATE: 07 JAN 2005
2. REPORT TYPE: N/A
3. DATES COVERED: -
4. TITLE AND SUBTITLE: DNA in chromatin: how to extract structural, dynamical and functional information from the analysis of genomic sequences using space-scale wavelet techniques
5a. CONTRACT NUMBER:
5b. GRANT NUMBER:
5c. PROGRAM ELEMENT NUMBER:
5d. PROJECT NUMBER:
5e. TASK NUMBER:
5f. WORK UNIT NUMBER:
6. AUTHOR(S):
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Laboratoire de Physique, Ecole Normale Supérieure de Lyon, 46 allée d’Italie, 69364 Lyon Cedex 07, FRANCE
8. PERFORMING ORGANIZATION REPORT NUMBER:
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES):
10. SPONSOR/MONITOR’S ACRONYM(S):
11. SPONSOR/MONITOR’S REPORT NUMBER(S):
12. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release, distribution unlimited
13. SUPPLEMENTARY NOTES: See also ADM001750, Wavelets and Multifractal Analysis (WAMA) Workshop held on 19-31 July 2004. The original document contains color images.
14. ABSTRACT:
15. SUBJECT TERMS:
16. SECURITY CLASSIFICATION OF: a. REPORT unclassified; b. ABSTRACT unclassified; c. THIS PAGE unclassified
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 60
19a. NAME OF RESPONSIBLE PERSON:

Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18

Page 3: DNA in chromatin
Page 4: DNA in chromatin

DeoxyriboNucleic Acid

[Figure: complementary base pairing between the two strands, A : T and G : C]

• Double helix macromolecule

• Each strand consists of an oriented sequence of four possible nucleotides: Adenine, Thymine, Guanine & Cytosine

• Complementary strands: [A] = [T] & [G] = [C] over the sum of both strands
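The complementarity rule above can be checked on any sequence. The following is a minimal sketch, not taken from the slides: the example strand and the helper names `reverse_complement` and `duplex_counts` are hypothetical and chosen for illustration only.

```python
from collections import Counter

# Watson-Crick pairing: A pairs with T, G pairs with C
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the complementary strand, read in its own orientation."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def duplex_counts(strand):
    """Count each nucleotide over the sum of both strands."""
    return Counter(strand) + Counter(reverse_complement(strand))

strand = "ATGCGGATTTACG"  # hypothetical example sequence
counts = duplex_counts(strand)

# Over both strands, [A] = [T] and [G] = [C] hold by construction
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

The equalities hold for the duplex whatever the composition of the single strand, which is why departures from them within one strand carry information.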

Page 5: DNA in chromatin
Page 6: DNA in chromatin

Sequencing projects result in four-letter texts:

Page 7: DNA in chromatin

HIERARCHICAL STRUCTURE OF EUCARYOTIC DNA

NET RESULT: EACH DNA MOLECULE HAS BEEN PACKAGED INTO A MITOTIC CHROMOSOME THAT IS 50,000x SHORTER THAN ITS EXTENDED LENGTH

Page 8: DNA in chromatin
Page 9: DNA in chromatin
Page 10: DNA in chromatin
Page 11: DNA in chromatin
Page 12: DNA in chromatin
Page 13: DNA in chromatin

FRACTAL SIGNALS

• Turbulent velocity signal [Figure: V(t) versus time]

• Brownian signal, "1/f noise" [Figure: S(t) versus time]

• Medical signal [Figure: heart rate versus time]

• Financial time series [Figure: market prices versus days]

Page 14: DNA in chromatin

ROUGHNESS EXPONENT

• Root-mean square of the height fluctuations: $W(L) = \sqrt{\langle f^2(x) \rangle - \langle f(x) \rangle^2} \sim L^{H}$

• Random walk [Figure: profile f versus x, with a window of length L marked]

• Power spectrum: $S_f(k) \sim k^{-(2H+1)}$

• Correlation function: $C_f(l) = \langle f(x) f(x+l) \rangle - \langle f(x) \rangle^2 \sim l^{2H}$

H = roughness exponent, $D_f = 2 - H$

• 0.5 < H < 1: LONG RANGE CORRELATIONS (LRC)

• H = 0.5: UNCORRELATED

• 0 < H < 0.5: ANTI-CORRELATIONS
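As an illustration of the scaling of the height fluctuations with window length, the sketch below estimates H for an uncorrelated random walk, for which H = 0.5 is expected. It is a minimal example under stated assumptions: the window sizes, the walk length, the random seed and the helper names are arbitrary choices for illustration, not the procedure used by the authors.

```python
import math
import random

def rms_fluctuation(f, L):
    """W(L): root-mean-square height fluctuation over non-overlapping windows of length L."""
    variances = []
    for start in range(0, len(f) - L + 1, L):
        window = f[start:start + L]
        mean = sum(window) / L
        variances.append(sum((v - mean) ** 2 for v in window) / L)
    return math.sqrt(sum(variances) / len(variances))

def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) versus log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Uncorrelated random walk: cumulative sum of independent +1/-1 steps
random.seed(0)
walk, position = [], 0
for _ in range(2 ** 15):
    position += random.choice((-1, 1))
    walk.append(position)

sizes = [16, 32, 64, 128, 256, 512]  # assumed window lengths
H_est = loglog_slope(sizes, [rms_fluctuation(walk, L) for L in sizes])
print(f"estimated roughness exponent: {H_est:.2f}")
```

The estimate fluctuates around 0.5 from one realization to another; a value clearly above 0.5 on real data would point to long-range correlations.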

Page 15: DNA in chromatin
Page 16: DNA in chromatin

WAVELET ANALYSIS OF FRACTAL SIGNALS

The wavelet transform allows us to LOCATE (b) the singularities of f and to ESTIMATE (a) their strength h(x) (Hölder exponent)

Mathematical microscope, "singularity scanner":

• g(x): optics

• b: position

• $a^{-1}$: magnification

$T_g(a,b) = \frac{1}{a} \int g^*\left(\frac{x-b}{a}\right) f(x)\,dx$

[Figure: signal W2(x) for x from 0.0 to 1.0, with values between -1.22 and 1.58, and its wavelet transform in the (b, a) half-plane]
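To illustrate how the transform estimates the strength of a singularity, the sketch below evaluates the wavelet transform, with the 1/a normalization, at the position of a cusp |x - 0.5|^h and reads h from the power-law behaviour of its modulus across scales. This is a minimal sketch under assumptions: the Mexican hat wavelet, the test signal, the grid and the scales are illustrative choices, and the integral is approximated by a plain Riemann sum.

```python
import math

def mexican_hat(u):
    """Second derivative of a Gaussian (up to sign): zero mean, zero first moment."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)

def wavelet_transform(f, xs, a, b, g=mexican_hat):
    """Riemann-sum approximation of T_g(a, b) = (1/a) * integral of g((x - b)/a) f(x) dx, for real g."""
    dx = xs[1] - xs[0]
    return sum(g((x - b) / a) * fx for x, fx in zip(xs, f)) * dx / a

N = 4096
xs = [i / N for i in range(N + 1)]
h_true = 0.5
f = [abs(x - 0.5) ** h_true for x in xs]  # cusp singularity located at x = 0.5

scales = [0.005, 0.01, 0.02, 0.04]  # assumed range of scales a
moduli = [abs(wavelet_transform(f, xs, a, 0.5)) for a in scales]

# |T_g(a, b)| behaves as a**h at the singularity: read h from the log-log slope
h_est = math.log(moduli[-1] / moduli[0]) / math.log(scales[-1] / scales[0])
print(f"estimated Hölder exponent: {h_est:.3f}")
```

Because the Mexican hat has a vanishing mean, a constant offset added to the signal does not change the result; only the singular part of f contributes to the scaling.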

Page 17: DNA in chromatin

CONTINUOUS WAVELET TRANSFORM OF THE TRIADIC DEVIL’S STAIRCASE

WAVELET TRANSFORM MODULUS MAXIMA (WTMM)

[Figure panels: THE DEVIL’S STAIRCASE; WAVELET TRANSFORM REPRESENTATION; WTMM SKELETON]

THE DEVIL’S STAIRCASE: $F(x) = \int_{-\infty}^{x} d\mu(x')$

F(x) is continuous but not everywhere differentiable: F’(x) = 0 almost everywhere, and its continuous variation occurs over a set of Lebesgue measure 0 and dimension $D_F = \log 2 / \log 3$

WTMM SKELETON OF THE TRIADIC CANTOR SET

Page 18: DNA in chromatin

Fractal measures
• Invariant measures associated with the strange attractors of discrete dynamical systems
• Turbulent energy dissipation

TRIADIC CANTOR SET
• UNIFORM: p1 = p2 = ½
• MULTIFRACTAL: p1 ≠ p2

[Figure: construction of the measure µ on the interval from 0 to 1, with masses p1, p2 at the first generation and p1², p1p2, p2p1, p2² at the second]

Fractal signals
• Weierstrass functions
• Fractional Brownian motions
• Turbulent signals

DEVIL’S STAIRCASE
Characteristic function of µ: $F(x) = \int_{-\infty}^{x} d\mu(x')$

F(x) is continuous but not everywhere differentiable: F’(x) = 0 almost everywhere, and its continuous variation occurs over a set of Lebesgue measure 0 and dimension $D_F = \log 2 / \log 3$
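The construction of the triadic Cantor measure and of its devil's staircase can be sketched in a few lines. The code below is an illustrative sketch only: the recursion depth, the function names and the linear spreading of the mass inside each remaining interval are assumptions made for the example.

```python
import math

def cantor_measure(p1, depth):
    """Return (left, right, mass) for the 2**depth intervals of generation `depth`.

    Each interval keeps its left and right thirds, which receive the
    fractions p1 and p2 = 1 - p1 of the parent mass.
    """
    intervals = [(0.0, 1.0, 1.0)]
    for _ in range(depth):
        refined = []
        for left, right, mass in intervals:
            third = (right - left) / 3.0
            refined.append((left, left + third, mass * p1))
            refined.append((right - third, right, mass * (1.0 - p1)))
        intervals = refined
    return intervals

def staircase(x, intervals):
    """F(x): mass lying to the left of x (mass spread linearly inside an interval)."""
    total = 0.0
    for left, right, mass in intervals:
        if x >= right:
            total += mass
        elif x > left:
            total += mass * (x - left) / (right - left)
    return total

uniform = cantor_measure(0.5, 8)        # p1 = p2 = 1/2
multifractal = cantor_measure(0.7, 8)   # p1 != p2, value chosen for illustration

covered_length = sum(right - left for left, right, _ in uniform)
dimension = math.log(2) / math.log(3)   # D_F of the triadic Cantor set

print(len(uniform), covered_length, dimension)
print(staircase(0.5, uniform), staircase(0.5, multifractal))
```

The covered length shrinks as (2/3)^depth while the total mass stays equal to 1, which is the sense in which all the variation of F is carried by a set of zero Lebesgue measure.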

Page 19: DNA in chromatin
Page 20: DNA in chromatin

SYNTHETIC DNA SEQUENCES


Page 21: DNA in chromatin

SYNTHETIC DNA WALKS

Fractional Brownian motions: $B_H$

• H = 0.3: anti-correlated

• H = 0.5: uncorrelated

• H = 0.7: long-range correlated

• H = 0.9: long-range correlated

[Figure: four synthetic walks plotted versus n]

Page 22: DNA in chromatin
Page 23: DNA in chromatin

H=0.8

H=0.5

Page 24: DNA in chromatin

G + C poor G + C rich

Page 25: DNA in chromatin
Page 26: DNA in chromatin

HIERARCHICAL STRUCTURE OF EUCARYOTIC DNA

Page 27: DNA in chromatin
Page 28: DNA in chromatin
Page 29: DNA in chromatin
Page 30: DNA in chromatin
Page 31: DNA in chromatin
Page 32: DNA in chromatin
Page 33: DNA in chromatin
Page 34: DNA in chromatin

AFM visualisation of a reconstituted chromatin fiber

Pierre-Louis Porté, Emeline Fontaine, Cendrine Moskalenko

Images obtained in ‘Tapping Mode’ in air

Page 35: DNA in chromatin

Linear DNA (2500 bp) positioning nucleosomes

Image obtained in ‘Tapping Mode’ in air

Page 36: DNA in chromatin

Linear DNA (2500 bp) positioning nucleosomes

Image obtained in ‘Tapping Mode’ in air

Page 37: DNA in chromatin

Plasmid DNA (3200 bp) + nucleosomes

Images obtained in ‘Tapping Mode’ in air

Page 38: DNA in chromatin

Plasmid DNA (3200 bp) + nucleosomes

[Figure: two height profiles, z (nm) from -0.5 to 2 versus s (nm) from 0 to 400]

Images obtained in ‘Tapping Mode’ in air

Page 39: DNA in chromatin
Page 40: DNA in chromatin

HIERARCHICAL STRUCTURE OF EUCARYOTIC DNA

Page 41: DNA in chromatin

LARGE SCALE REPRESENTATION OF GENOMIC SEQUENCES

Chromosome 22 (Human)

Page 42: DNA in chromatin

Transcription

Replication

Opening of the double helix with a different environment for each strand => asymmetrical process

Page 43: DNA in chromatin

Symmetrical properties of the strands: "Parity Rule type 2"

[A] = [T] & [G] = [C] in each strand

Deviations from this property estimated by the compositional skews

$S_{CG} = \frac{[C] - [G]}{[C] + [G]}$

$S_{AT} = \frac{[A] - [T]}{[A] + [T]}$

Compositional skew due to local biases in a strand in the course of biological mechanisms
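The two skews can be computed directly from nucleotide counts on one strand. The following is a minimal sketch using the sign convention written on the slide; the toy sequence, the window length and the function names are hypothetical, and returning 0 for a window without the relevant nucleotides is a convention chosen for the example.

```python
def skews(sequence):
    """Return (S_AT, S_CG) for one strand, with the slide's sign convention."""
    n = {base: sequence.count(base) for base in "ACGT"}
    at, cg = n["A"] + n["T"], n["C"] + n["G"]
    s_at = (n["A"] - n["T"]) / at if at else 0.0  # assumption: 0 when undefined
    s_cg = (n["C"] - n["G"]) / cg if cg else 0.0  # assumption: 0 when undefined
    return s_at, s_cg

def skew_profile(sequence, window):
    """Total skew S_AT + S_CG in adjacent non-overlapping windows."""
    return [sum(skews(sequence[i:i + window]))
            for i in range(0, len(sequence) - window + 1, window)]

seq = "AAATCCCGATATGCGC"  # hypothetical toy sequence
profile = skew_profile(seq, 4)
print(profile)
```

A strand obeying Parity Rule type 2 gives a profile that fluctuates around zero; sustained positive or negative values indicate a local strand asymmetry.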

Page 44: DNA in chromatin

Strand Compositional Asymmetry

$S = \frac{A - T}{A + T} + \frac{C - G}{C + G}$

Page 45: DNA in chromatin

A wavelet based methodology to detect gene clusters

Chromosome 22 (Human)

Skew: $S = \frac{A - T}{A + T} + \frac{C - G}{C + G}$

[Figure axes: Scale (bp); A/T + C/G skew]

Page 46: DNA in chromatin

A wavelet based methodology to detect replication origins

Experimentally observed replication origin in the human genome

Globin: 4008 kb, Chromosome 11

Predicted RO: 4009 kb

Skew: $S = \frac{A - T}{A + T} + \frac{C - G}{C + G}$

[Figure axes: Scale (bp); A/T + C/G skew]

Page 47: DNA in chromatin

A wavelet based methodology to detect replication origins

Experimentally observed replication origin in the human genome

Lamin B2: 2368 kb, Chromosome 19

Predicted RO: 2365 kb

Skew: $S = \frac{A - T}{A + T} + \frac{C - G}{C + G}$

[Figure axes: Scale (bp); A/T + C/G skew]

Page 48: DNA in chromatin

Transcription bias

Page 49: DNA in chromatin

Transcription bias

Detecting discontinuities using the wavelet transform

[Figure labels: Scale; Analyzing wavelet]
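The idea of locating a discontinuity with the wavelet transform can be sketched on a synthetic profile. The example below is a hedged illustration, not the authors' pipeline: the noisy step signal stands in for a skew profile, the first derivative of a Gaussian is an assumed choice of analyzing wavelet, and the scale, noise level and seed are arbitrary.

```python
import math
import random

def gaussian_derivative(u):
    """First derivative of a Gaussian, signed so that an upward jump gives a positive response."""
    return u * math.exp(-0.5 * u * u)

def wavelet_response(signal, a, b):
    """Discrete T(a, b) = (1/a) * sum over x of g((x - b)/a) * signal[x], truncated at 5 scales."""
    lo, hi = max(0, b - 5 * a), min(len(signal), b + 5 * a + 1)
    return sum(gaussian_derivative((x - b) / a) * signal[x] for x in range(lo, hi)) / a

# Synthetic profile: a jump from -0.05 to +0.05 at a known position, plus noise
random.seed(1)
N, jump_at, a = 1000, 600, 20
signal = [(-0.05 if x < jump_at else 0.05) + random.gauss(0.0, 0.02) for x in range(N)]

# The largest positive response marks the position of the upward jump
responses = [wavelet_response(signal, a, b) for b in range(N)]
detected = max(range(N), key=lambda b: responses[b])
print(f"jump placed at {jump_at}, detected at {detected}")
```

With this normalization the peak value of the response is close to the height of the jump, so both the position and the amplitude of a discontinuity can be read from the transform at a given scale.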

Page 50: DNA in chromatin

Application to a known human replication origin

[Figure labels: Scale; Analyzing wavelet]

C-MYC origin (chromosome 8)

First evidence of a replication bias in human DNA

Page 51: DNA in chromatin

Application to a known human replication origin

[Figure labels: Scale; Analyzing wavelet]

C-MYC origin (chromosome 8)

First evidence of a replication bias in human DNA

Page 52: DNA in chromatin

Application to a known human replication origin

[Figure labels: Scale; Analyzing wavelet]

C-MYC origin (chromosome 8)

First evidence of a replication bias in human DNA

Our model: well-defined replication origins, separated by diffuse termini

Page 53: DNA in chromatin

Profile detection using an analyzing wavelet adapted to the shape of replicons

[Figure labels: Analyzing wavelet; Scale]

C-MYC origin (chromosome 8)

Page 54: DNA in chromatin

Profile detection using an analyzing wavelet adapted to the shape of replicons

[Figure labels: Analyzing wavelet; Scale]

C-MYC origin (chromosome 8)

[Figure labels: Analyzing wavelet; Scale]

Page 55: DNA in chromatin

Deterministic Chaos in DNA Sequences

Page 56: DNA in chromatin

SHIL’NIKOV HOMOCLINIC CHAOS

Page 57: DNA in chromatin
Page 58: DNA in chromatin

Strand Compositional Asymmetry

$S = \frac{A - T}{A + T} + \frac{C - G}{C + G}$

Page 59: DNA in chromatin

Phase Portrait Representation of AT+CG skew

Page 60: DNA in chromatin

REFERENCES

Transcription-coupled and splicing-coupled strand asymmetries in eukaryotic genomes.
M. TOUCHON, A. ARNEODO, Y. D’AUBENTON-CARAFA & C. THERMES, Nucleic Acids Res. (2004), to appear

Low frequency rhythms in human DNA sequences: a key to the organization of gene location and orientation?
S. NICOLAY, F. ARGOUL, M. TOUCHON, Y. D’AUBENTON-CARAFA, C. THERMES & A. ARNEODO, Phys. Rev. Lett. (2004), to appear

From scale invariance to deterministic chaos in DNA sequences: towards a deterministic description of gene organization in the human genome.
S. NICOLAY, E.B. BRODIE OF BRODIE, M. TOUCHON, Y. D’AUBENTON-CARAFA, C. THERMES & A. ARNEODO, Physica A (2004), to appear

Transcription-coupled TA and GC strand asymmetries in the human genome.
M. TOUCHON, S. NICOLAY, A. ARNEODO, Y. D’AUBENTON-CARAFA & C. THERMES, FEBS Letters 555, 579 (2003)

Long-range correlations between DNA bending sites: relation to the structure and dynamics of nucleosomes.
B. AUDIT, C. VAILLANT, A. ARNEODO, Y. D’AUBENTON-CARAFA & C. THERMES, J. Mol. Biol. 316, 903 (2002)

Long-range correlations in genomic DNA: a signature of the nucleosomal structure.
B. AUDIT, C. THERMES, C. VAILLANT, Y. D’AUBENTON-CARAFA, J.F. MUZY & A. ARNEODO, Phys. Rev. Lett. 86, 2471 (2001)

Nucleotide composition effects on the long-range correlations in human genes.
A. ARNEODO, Y. D’AUBENTON-CARAFA, B. AUDIT, E. BACRY, J.F. MUZY & C. THERMES, Eur. Phys. J. B 1, 259 (1998)

Wavelet based fractal analysis of DNA sequences.
A. ARNEODO, Y. D’AUBENTON-CARAFA, E. BACRY, P.V. GRAVES, J.F. MUZY & C. THERMES, Physica D 96, 291 (1996)

Characterizing long-range correlations in DNA sequences from wavelet analysis.
A. ARNEODO, E. BACRY, P.V. GRAVES & J.F. MUZY, Phys. Rev. Lett. 74, 3293 (1995)

Page 61: DNA in chromatin