Operations on RNA Strings

Page 1: Operations on RNA Strings

Operations on RNA Strings

http://www.rnaparse.com

Page 2: Operations on RNA Strings

DNA is a linear string of four nucleotides (A, T, G, C) that stores information.

RNA is the workhorse that uses this information to do everything else ...

Page 3: Operations on RNA Strings

Roughly, RNA can be thought of as a single-stranded analog of DNA in which the bases ATCG become AUCG (U takes the place of T).

Shown above is the process of the transcription of DNA into single strands of RNA.
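As a minimal sketch of the ATCG to AUCG substitution described above: the function name `transcribe` is hypothetical, and the input is assumed to be the coding strand, so only T is replaced by U. Real transcription reads the template strand and builds its complement, which this sketch does not model.

```python
def transcribe(dna: str) -> str:
    """Return the RNA string for a DNA coding strand by replacing T with U.

    Assumption: `dna` is the coding strand, so no complementing is done.
    """
    dna = dna.upper()
    unknown = set(dna) - set("ATGC")
    if unknown:
        raise ValueError(f"not a DNA string, unexpected symbols: {sorted(unknown)}")
    return dna.replace("T", "U")


print(transcribe("ATGC"))  # AUGC
```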

Page 4: Operations on RNA Strings

RNA is a single, linear strand of nucleotides connected by a sugar-phosphate backbone. As shown below, the strand is read as GCAGACCAAUGCAUGCGAUGAAGUUGAUCAUA...

Page 5: Operations on RNA Strings

Interestingly, the string “GCAGACCAAUGCAUGCGAUGAAGUUGAUCAUA...” looks as if it has no structure until we see it represented as the stem and loop structure shown below. This is essentially due to only two rules of bonding between individual nucleotides.

As you can imagine, this allows an enormous variety of convoluted RNA shapes to form, depending on the sequence of nucleotides.

Page 6: Operations on RNA Strings

For purposes of discussion, RNA bonding takes place between certain nucleotides as follows (exceptions do exist):

A with U
G with C

How can such a simple scheme produce EVERYTHING that living things are?

Page 7: Operations on RNA Strings

Axiom #1

When talking about the function of RNA, shape rather than sequence means everything.

Consider the hypothetical strings AUGCGGCAU and GACGCCGUC forming the hypothetical loop and stem shapes shown below.

Page 8: Operations on RNA Strings

The sequences of the two shapes below read AUGCGGCAU and GACGCCGUC.

But you will notice that the shape, or secondary structure, does not differ between the two. Following the A:U, G:C rules of bonding, we have the same shape with different sequences.

In fact, we could think up a few thousand ways to create the same simple shape shown above using the same rule set and by choosing nucleotides that fit the rules:

AAAACUUUU works...
CUCUGAGAG works...
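A small sketch can check that the four example strings fit one shape. Since the figure is not reproduced here, the shape is an assumption inferred from the strings themselves: a four-pair stem closed by a single unpaired base. The names `is_hairpin` and `count_hairpins` are hypothetical.

```python
from itertools import product

# Strict bonding rules from the text: A with U, G with C.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_hairpin(seq: str, stem_len: int = 4) -> bool:
    """True if the first `stem_len` bases pair with the last `stem_len`
    bases read in reverse, leaving at least one unpaired loop base."""
    if len(seq) < 2 * stem_len + 1:
        return False
    return all((seq[i], seq[-1 - i]) in PAIRS for i in range(stem_len))


def count_hairpins(length: int = 9, stem_len: int = 4) -> int:
    """Count every string over AUGC of the given length that fits the shape."""
    return sum(is_hairpin("".join(p), stem_len) for p in product("AUGC", repeat=length))


for s in ["AUGCGGCAU", "GACGCCGUC", "AAAACUUUU", "CUCUGAGAG"]:
    print(s, is_hairpin(s))
```

Under these assumptions `count_hairpins()` returns 1024 (four choices for each of the four pairs, times four choices for the loop base). Adding the occasional G:U pair to `PAIRS` raises the figure to 5184, which is in line with "a few thousand".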

Page 9: Operations on RNA Strings

Before continuing I'd like to stress a few key points:

RNA is a linear string that has the ability to fold back on itself to form complex shapes that have bio-functional meaning.

RNA is composed of four nucleotides A U G C

RNA rules of bonding are A with U, G with C (sometimes U with G)

When talking about RNA structures, secondary shape is often more important to function than the composition of the linear sequence.

Page 10: Operations on RNA Strings

Possible operations on a string of the 4 elements {A, U, G, C}, where the individual properties are such that (A binds to U) and (G binds to C), and their relationship to formal language theory (see Noam Chomsky).

Page 11: Operations on RNA Strings

Searches for these RNAs as they exist in nature don't get extraordinarily complex until the string becomes folded back on itself in certain ways.

Linear, Type 3 (regular) language is easy: we just use simple regular expressions (e.g. find AUG[A or C] CCGAA)

Loops are a bit harder but can still be described by a context-free language

Pseudoknots are extremely hard to find and require a context-sensitive language. A pseudoknot is essentially an RNA loop whose bases have formed a stem with a region outside the loop itself.
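The linear (Type 3) case above can be sketched with Python's standard `re` module. The pattern is one reading of the example in the text, assuming the space in "AUG[A or C] CCGAA" is not part of the motif; `find_motif` is a hypothetical name.

```python
import re

# AUG, then either A or C, then CCGAA.
MOTIF = re.compile(r"AUG[AC]CCGAA")


def find_motif(rna: str) -> list[int]:
    """Return the start index of every non-overlapping match in `rna`."""
    return [m.start() for m in MOTIF.finditer(rna)]


print(find_motif("GGAUGACCGAAUUAUGCCCGAACC"))  # [2, 13]
```

A regular expression like this only looks at the linear sequence. It cannot require that two distant segments of arbitrary length pair with each other, which is why stems and pseudoknots need the stronger language classes named above.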

Page 12: Operations on RNA Strings

The RNA Pseudoknot

A string such as RNA can fold back on itself to form quite complex (non-knotted) structures that are exceedingly difficult to describe with programming languages.

Shown here is an RNA pseudoknot having the sequence AAAAGGGGUUUUCCCC (note the crossing dependencies between the two stems):

AAAAGGGGUUUUCCCC
(((([[[[))))]]]]

*Descriptive nomenclature*
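The bracket nomenclature above can be read mechanically. The sketch below is an illustration only, with hypothetical function names: it keeps one stack per bracket type to recover the base pairs, then tests whether any two pairs cross. Crossing pairs are what distinguish a pseudoknot from an ordinary nested stem and loop.

```python
def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Turn a bracket string into sorted (open_index, close_index) pairs.

    Assumption: only '(' ')' '[' ']' and '.' (unpaired) appear.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for i, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in closers:
            if not stacks[closers[ch]]:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.append((stacks[closers[ch]].pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
    if any(stacks.values()):
        raise ValueError("unclosed bracket")
    return sorted(pairs)


def has_crossing(pairs: list[tuple[int, int]]) -> bool:
    """True if some pair (i, j) and some pair (k, l) satisfy i < k < j < l."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)


seq = "AAAAGGGGUUUUCCCC"
pairs = parse_pairs("(((([[[[))))]]]]")
print([(seq[i], seq[j]) for i, j in pairs])
print(has_crossing(pairs))                       # True: pseudoknot
print(has_crossing(parse_pairs("((((.))))")))    # False: plain stem and loop
```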

Page 13: Operations on RNA Strings

www.RNAParse.com

We parse RNA strings for structure rather than searching for specific nucleotide strings by treating RNA as a language where the predicates are based on properties of individual nucleotides:

{A} concatenates to {U}

{U} concatenates to {A}

{G} concatenates to {C}

{C} concatenates to {G}

{G} concatenates to {U}

{U} concatenates to {G}

These predicates accept a stem of n-length, and loops of n-length are added as needed. Then predicates are arranged in the same order as the structure we wish to match (locate).

Predicate A (Left part of stem)
Loop Predicate (simple regular expression)
Predicate B (Left part of stem)
Loop Predicate (simple regular expression)
Predicate A' (Right part of stem)
Loop Predicate (simple regular expression)
Predicate B' (Right part of stem)

Or however we wish to arrange them to match some given structure ...
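To make the arrangement A, loop, B, loop, A', loop, B' concrete, here is a naive brute-force sketch. It is not the RNAParse implementation, which is not shown in the slides, and it makes no attempt at efficiency. All names and parameters (`stems_pair`, `find_pseudoknots`, `stem_len`, `max_loop`) are hypothetical, each loop is assumed to be any run of 0 to `max_loop` bases, and G:C pairing is assumed alongside A:U and G:U.

```python
# Assumed pairing predicate: A:U, G:C and the G:U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stems_pair(left: str, right: str) -> bool:
    """True if `left` pairs base by base with `right` read backwards."""
    return len(left) == len(right) and all(
        (a, b) in PAIRS for a, b in zip(left, reversed(right))
    )


def find_pseudoknots(seq: str, stem_len: int, max_loop: int):
    """Yield start indices (a, b, a2, b2) of the segments A, B, A', B'."""
    n = len(seq)
    loops = range(max_loop + 1)
    for a in range(n):
        for l1 in loops:
            for l2 in loops:
                for l3 in loops:
                    b = a + stem_len + l1
                    a2 = b + stem_len + l2
                    b2 = a2 + stem_len + l3
                    if b2 + stem_len > n:
                        continue  # pattern would run past the end
                    seg_a, seg_b = seq[a:a + stem_len], seq[b:b + stem_len]
                    seg_a2, seg_b2 = seq[a2:a2 + stem_len], seq[b2:b2 + stem_len]
                    if stems_pair(seg_a, seg_a2) and stems_pair(seg_b, seg_b2):
                        yield (a, b, a2, b2)


print(list(find_pseudoknots("AAAAGGGGUUUUCCCC", stem_len=4, max_loop=0)))
# [(0, 4, 8, 12)]
```

Reordering the segments in the inner check (for example A, loop, A' alone) matches a plain stem and loop instead of a pseudoknot.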

Page 14: Operations on RNA Strings

www.RNAParse.com

The method is elegantly simple and is computationally linear.

Results are parsed and fed into a database for further study. An important caveat in computational biology is that a computer-located structure does not always mean the structure exists in nature.

The RNAparse database (screen shot below) is free to use, and content is added as it is discovered. Users are free to log on as GUEST.

http://www.rnaparse.com/database/Main_Database_list.php

http://www.rnaparse.com/database/Sequence_to_Structure_Report_report.php

http://www.rnaparse.com/mirror/Main_DB_Compliment_Repeats_list.php

Page 15: Operations on RNA Strings

Many thanks to those who have contributed to make this research possible.

Primary contact James F. Lynn [email protected]