A subsequent study of visualized DNA sequence comparison based on NKS

Dawei Li, Ph.D.

The Rockefeller University

E-mail: [email protected] or dli@rockefeller.edu



Basic terms about DNA sequence

1. DNA is made up of nucleotides.

2. Nucleotides include ‘A’ ‘G’ ‘C’ ‘T’.

3. For the same nucleotide composition, different sequences give the DNA helix markedly different stability. For example, -GC- is much more stable than -CG-.

4. There may be some interaction between the nucleotides.

Experimental approaches to study interaction between nucleotides

Atomic force microscope (AFM): a very high-resolution type of scanning probe microscope, with demonstrated resolution of fractions of a nanometer.

X-ray diffraction (XRD): techniques based on the elastic scattering of X-rays from structures that have long-range order.

Neutron scattering: the deflection of neutrons is used as a scientific probe.

Nuclear magnetic resonance (NMR): a physical phenomenon based upon the quantum mechanical magnetic properties of an atom's nucleus.

Infrared spectroscopy (IR spectroscopy): the subset of spectroscopy that deals with the IR region of the EM spectrum.

Questions

Four DNA sequences:

1. CTCGGGTTATCGGCGTGGTCCGGCCGAGGGCGGCATTCCAGAAGAGGGACCCTCACGCCACCA

2. CCAGAGCGTCGCCGACCCTCTAATTGGTCTCCCCAGAAGAGGCTGAGAAGAAGGCCGAAACAG

3. AGAGTCCCAGGACACACTGTAGAAGATCAAAGCAGAAGAAGAGGGAAGAGTGGCTGAGGGAC

4. TCAGCCCATCTGCCATCCCCAAAGATAGAAGACACCCCCTTGGTTGCCCTCTAGAAGATCCAGT

Can you figure out which organisms the sequences belong to?

Is there any “inner difference” between the sequences? Can you figure out what the differences and interactions are?

These questions have not been answered satisfactorily by existing approaches based on traditional mathematical rules.

Here the ‘Wolfram approach’ can help.

It may shape the current theory of DNA sequence analysis.

Wolfram approach

1) The Wolfram approach is based on the concept that simple rules are able to produce highly complicated behaviors.

2) It focuses on the power of interaction and regulation between adjacent nucleotides.

3) It can show alterations of DNA nucleotides dynamically including transposition, insertion, deletion, and duplication.

The Wolfram approach, which was described in “A New Kind of Science” in 2002, has attracted biologists’ attention.

The Wolfram approach provides a nontraditional, visual model for DNA sequence comparison. Compared with traditional approaches, it makes it possible to study some biological issues that have never been successfully clarified by traditional mathematical methods. However, studies of DNA comparison with the Wolfram approach are still very few.

About our study

With the Wolfram approach, we

1. studied some simple rules;

2. analyzed some DNA sequences of different viruses with a special rule;

3. studied the images visually.

Hypotheses

Our studies were based on four hypotheses:

1) A DNA sequence is not random; it follows a rule.

2) There is a mode of nucleotide organization, not yet determined, that each DNA sequence follows.

3) Simple rules can also produce complex behaviors in living organisms.

4) The Wolfram approach can reflect the rule rooted in a DNA sequence.

Rule in the hypotheses

The eight arrangements can produce very complicated behaviors.
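The eight arrangements are the eight possible black/white patterns of a cell and its two neighbors; a rule assigns a new color to each of them, which gives 2^8 = 256 possible rules. The following is a minimal sketch of that mechanism, not the program used in the study. The two-bit nucleotide encoding, the white boundary cells, and the choice of rule 110 in the demo are all assumptions made for illustration; the slides do not state which encoding or rule number was used.

```python
def rule_table(rule_number):
    """Map each of the eight 3-cell arrangements to the next colour of the centre cell."""
    if not 0 <= rule_number <= 255:
        raise ValueError("an elementary rule number must lie between 0 and 255")
    table = {}
    for index in range(8):
        # index 0 is (0, 0, 0) and index 7 is (1, 1, 1): left, centre, right
        arrangement = ((index >> 2) & 1, (index >> 1) & 1, index & 1)
        table[arrangement] = (rule_number >> index) & 1
    return table


def step(cells, table):
    """Apply the rule once. Cells beyond either end count as white (assumption)."""
    padded = [0] + list(cells) + [0]
    return [table[(padded[i], padded[i + 1], padded[i + 2])]
            for i in range(len(cells))]


# HYPOTHETICAL two-bit encoding; the slides only say a binary system was used.
ENCODING = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode(sequence):
    """Turn a nucleotide string into a flat list of 0/1 cells."""
    bits = []
    for base in sequence.upper():
        bits.extend(ENCODING[base])
    return bits


def evolve(cells, rule_number, steps):
    """Return the initial row followed by `steps` successive rows."""
    table = rule_table(rule_number)
    rows = [list(cells)]
    for _ in range(steps):
        rows.append(step(rows[-1], table))
    return rows


def render(rows):
    """Draw black cells as '#' and white cells as '.'."""
    return "\n".join("".join("#" if c else "." for c in row) for row in rows)


if __name__ == "__main__":
    # First 18 bases of sequence 1 from the Questions slide; rule 110 is a placeholder.
    start = encode("CTCGGGTTATCGGCGTGG")
    print(render(evolve(start, 110, 16)))
```

Each printed row is one time step, so the sequence sits on the top line and the image grows downward, as in the figures described in the slides.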

Nested structure

The nested structure defined by Wolfram is generated from one single parent cell.
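A well-known textbook case of a nested pattern grown from a single black cell is elementary rule 90, in which each cell becomes the exclusive-or of its two neighbors. The sketch below uses rule 90 only because its nesting is easy to see; it is an assumption for illustration and is not claimed to be the rule applied to the viral genomes.

```python
def rule90_step(cells):
    """One step of rule 90: new cell = left neighbour XOR right neighbour."""
    padded = [0] + cells + [0]  # cells beyond the ends are white
    return [padded[i] ^ padded[i + 2] for i in range(len(cells))]


def grow_from_single_cell(steps):
    """Start from one black cell in the middle of a white row and evolve."""
    width = 2 * steps + 1  # wide enough that the pattern never hits the edge
    row = [0] * width
    row[steps] = 1
    rows = [row]
    for _ in range(steps):
        row = rule90_step(row)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in grow_from_single_cell(15):
        print("".join("#" if c else " " for c in row))
```

The printed triangle contains smaller copies of itself at every scale, which is the sense of "nested" used here.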

A very simple example: snowflake

The sequences of SARS viruses

SARS BJ01, partial genome; SARS BJ02, partial genome; SARS BJ03, partial genome; SARS BJ04, partial genome; SARS CUHK-W1, complete genome; SARS GZ01, partial genome; SARS HKU-39849, complete genome; SARS TOR2, complete genome; SARS Urbani, complete genome; SARS coronavirus CUHK-Su10, complete genome; SARS coronavirus isolate SIN2774 complete genome; SARS coronavirus TW1, complete genome; SARS coronavirus, complete genome.

Machine

Computer: An SGI Origin 3000 (Silicon Graphics, Inc.; 64 500 MHz IP35 processors) was used throughout our study. Each sequence was run using the same programs.

Images: More than 3,000 images (200,000 Mega) were generated.

The results of 13 SARS viruses

The images were arranged according to the color order in the chromatogram (Ref.1).

The SARS viruses behaved quite differently from other viruses. There was a very large nested structure across the first 10 kb region.

By comparison, we found that the nested structures were mainly located in the regions of replicases 1A and 1B. The replicase 1A protein gene may control the activities of the replication complex of SARS viruses.

Comparisons between SARS-CUHK and SARS-GZ

Possible origin of SARS virus (Ref.1).

Comparison of images among five different viruses

(a) and (b): SARS virus and equine rhinovirus (ER), respectively, showing the nested structures. (c) Another virus in which only two small nested structures were found. (d) A typical behavior of a common virus. (e) The behavior of HIV. Note: images in (a), (c), and (d) are reduced 15,000-fold, and those in (b) and (e) 1,600-fold (Ref.1).

The whole genomes of equine rhinovirus (ER) and SARS virus shared similar nested structures.

SARS virus and human coronavirus 229E were very different in behavior.

No nested structure was found in HIV.

To study whether the nested structures exist in other organisms, we analyzed ten other virus genomes with the Wolfram approach.

The ten types of viruses are as follows: Avian infectious bronchitis virus (and avian infectious bronchitis virus messenger ribonucleic acid (mRNA)), Bovine coronavirus, Dengue virus type 4 strain 814669, Human rhinovirus 1B, Japanese encephalitis virus strain K94P05, Murine hepatitis virus, Pestivirus type 2, Porcine epidemic diarrhea virus, Porcine transmissible gastroenteritis virus minigenome, West Nile virus.

The results showed that all the viruses can be classified into two groups by their behaviors: Group 1 with left bottom growth of white lines; Group 2 with right bottom growth of black lines. No nested structure was found.

We also studied the behavior of mRNA sequence of avian infectious bronchitis virus.

The current results may suggest that:

1) the region of the nested structure may be involved in the reproduction of the virus;

2) the coding sequence of the virus may share some kind of similar complex gene regulation cycle with SARS viruses. The nested structure may contain some special bio-information.

Results in summary

1. SARS viruses showed nested structure behaviors. The results suggest that the genome sequences have a specific mode of nucleotide organization.

2. HIV showed a different mode of nucleotide organization.

3. The unique characteristics found in the DNA sequence of SARS viruses and the mRNA sequence of avian infectious bronchitis virus suggest the importance of the nested structure behaviors.

Discussion

Advantages: The Wolfram approach has some advantages:

1. It can magnify the tiny changes in whole genome sequence for both overall and detailed analyses.

2. It can also be used in a single nucleotide scale, such as DNA mutation and polymorphisms (SNPs and microsatellites).

3. It pays more attention to the network of power and regulation interactions among adjacent nucleotides.

4. It is appropriate not only for DNA/RNA sequences but also for protein sequences.

Disadvantages: There are some disadvantages in our study:

Only one of the 256 rules was adopted.

For encoding the nucleotides, a quaternary system should be better than a binary system; however, a quaternary system will result in many more rules (4^64 for the same three-cell neighborhood, against 2^8 = 256 for the binary system).

A typical feature of the Wolfram approach: Each cell of the DNA sequence interacts with its adjacent cells.

Scores of power: The power from the two sides is not always equal. It is scored between 1 and 0, which represent ‘for’ and ‘against’, respectively.

Four behaviors

The behaviors of the DNA sequences can be classified into four categories:

1. pure repetition with left growth, as most common viruses showed;

2. pure repetition with right growth, as HIV showed;

3. nested structure, as SARS viruses showed;

4. uniformly white or uniformly black.

Nested structure

1. The nested structure may result from the interaction of aggregation between black and white cells.

2. It may represent a regulation cycle. A black line may signify the beginning of a protein production cycle, and a white line may signify the end by closing off the triangle.

Interaction network of power

1. The interaction between nucleotides includes power and anti-power. Each nucleotide receives power from adjacent nucleotides and exerts power as well.

2. The balance is sensitive and can easily be broken by sequence alteration. A single mutation can cause death, perhaps because the original nucleotide has a key role in the whole genome. The same balance can also explain why some mutations have no noticeable effect.

Mutation & SNP

A mutation is a change to the nucleotide sequence of DNA or RNA.

A Single Nucleotide Polymorphism (SNP) is a DNA sequence variation occurring when a single nucleotide in the genome differs between members of a species.

An example for SNP model
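One way an SNP comparison of this kind could be sketched is to evolve a reference sequence and a single-nucleotide variant under the same rule and count, row by row, how many cells differ. This is an illustrative sketch only: the two-bit encoding, the white boundary cells, the function name `difference_profile`, the toy sequences, and the rule number in the demo are assumptions and are not taken from the study.

```python
# HYPOTHETICAL two-bit encoding; the slides do not specify one.
ENCODING = {"A": "00", "C": "01", "G": "10", "T": "11"}


def encode(sequence):
    """Turn a nucleotide string into a flat list of 0/1 cells."""
    return [int(bit) for base in sequence.upper() for bit in ENCODING[base]]


def step(cells, rule_number):
    """One step of an elementary rule; cells beyond the ends are white."""
    padded = [0] + cells + [0]
    return [
        (rule_number >> (4 * padded[i] + 2 * padded[i + 1] + padded[i + 2])) & 1
        for i in range(len(cells))
    ]


def evolve(cells, rule_number, steps):
    """Return the initial row followed by `steps` successive rows."""
    rows = [cells]
    for _ in range(steps):
        rows.append(step(rows[-1], rule_number))
    return rows


def difference_profile(reference, variant, rule_number, steps):
    """Count the differing cells in each row of the two evolutions."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not indels")
    ref_rows = evolve(encode(reference), rule_number, steps)
    var_rows = evolve(encode(variant), rule_number, steps)
    return [
        sum(a != b for a, b in zip(ref_row, var_row))
        for ref_row, var_row in zip(ref_rows, var_rows)
    ]


if __name__ == "__main__":
    reference = "CTCGGGTTATCGGCGTGG"
    variant = "CTCGGGTTCTCGGCGTGG"  # toy A-to-C substitution at the ninth base
    print(difference_profile(reference, variant, 110, 12))
```

A profile that stays small means the substitution is absorbed by the rule, while a profile that grows means the change spreads through the image; which of the two happens depends on the rule and on the surrounding sequence.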

In conclusion

The traditional intuition is that the behavior should be simple if the rule is simple. This is not true based on the data demonstrated by both Wolfram’s work and our study. The simple rule can actually capture the essential mechanisms responsible for complex phenomena in living organisms.

We applied the Wolfram approach to DNA sequence analysis. Our results support the view that the approach is appropriate for visualized sequence comparison and is a useful tool for categorizing sequences.

The results are preliminary but of interest for subsequent studies. Further systematic investigations are necessary, and the results need to be confirmed by experimental work.

Reference

Ref.1: Li D, et al. Understanding SARS with Wolfram approach. Acta Biochimica et Biophysica Sinica. 2004; 36(1):1-10.

Acknowledgement: Lin He, Zhende Huang, Jurg Ott et al.

Thank you!