RNA secondary structure - Rensselaer Polytechnic Institute · When RNA secondary structure matters...

Preview:

Citation preview

RNA secondary structure

• Functions

• Representations

• PredictionsMany slides courtesy of M. Zuker, RPI Math

Sequence Analysis '18 -- Lecture 17

When RNA secondary structure matters

mRNA --> protein

ssRNA

protein

Strong secondary structure can block translation.

RBS

Registry of Standard Biological Partshttp://parts.igem.org/

UUUCU CUNNNNAAAGA GA NN

NNNAUGNNNN

5' 3'NNNN

fMet

ribosome binding site

16s

NNN

especially sensitive is the...

species specific!

Anderson RBS family -- bacterial.

primary-microRNA

microRNA duplex

dicer

microRNA

blocks translation, deadenylation, or

endonuclease digestion

passenger strand degraded

argonaut proteins

transcription, folding

microRNA (miR) Found in 3'UTR introns exons

nucl

eus

cyto

plas

m

Ambiguous bases

Memorize the...

RNA secondary structure is base pairingi•j

Rul

es fo

r nor

mal

SS

Not just Watson-Crick..

Different ways of base-pairing allow RNA to adopt duplex structures beyond A and B

helix.

If any base-pair is possible, how do we predict pairings?

Structure types within RNA sec struct

E

H M

M I B

I

H

I

I

H

M

I

H

HH I

B

2D plot

BHH

H

H

H

H

H

H

H

H

H

E

M

B

I I

I

I

I

I

I

I

M

M

I

note: No bp lines cross.

Converting from 2D plot sec struct to circle/tree.

CCC

CUCUCC

AG G GGUCAU

CGGA

Circle plot=====>

Pseudoknots

• Mutual information (MI) - Requires a deep multiple sequence alignment - Can find non-canonical base-pairs.

Comparative methods, phylogenetics

Free energy calculations

• Dot plot - Easy. - Can be done on a single sequence. - Cannot find non-canonical base pairs.

Prediction of RNA secondary structure

Comparative modelingassume conserved structure between homologs

RNA structure by energy minimization

where, e(i,j) = 0 if j-i < 4

Forward summation of energy matrix E:

Assumes energy is the sum of base pairs.

add to loopadd to loop

start a helix, or add a base pair join helices in

multi-loop

too close, can't pair

Find k such that Ei,j = E i,k + Ek+1,j

Push (i,k) and (k+1,j) onto Stack B.Stop with error if no such k exists.

Stack A becomes the answer: a list of base pairs Stack B is a list of unfinished segments

energy matrix

1. energy matrix is initilized starting from diagonal..

2. base-pairing is found by tracing back from (1,n)

RNA structure by energy minimization

RNA structure by Dot Plot

RNA structure by Dot Plot

1) Run BLAST search to get homologs.2) Prune sequences to remove redundancy.*3) Prune columns to remove uninformative data. (conserved

positions tell you nothing)4) Calculate mutual information (Mi,j) for all pairs of

positions (i,j).

position

position

posi

tion

How-to:RNA structure by Mutual Information

sum over all event pair types

frequency of a pair of events observed together= N(a,b events)/N(total events)

expected frequency of a,b events together is the product of the frquencies of

the events separately

Mutual information, in general

in bits, because we used log2

A measure of the surprisingness of a pair of events.

M = Σ f(a,b) log2( )f(a,b)f(a)f(b)a,b ∈{events}

sum over all base-pair types

frequency of base-pair (B1,B2) at positions (i,j) =N(B1,B2)/N(total sequences)

expected frequency of base B1, B2

Ex11: Mutual information for base-pairsA measure of the surprisingness of the evolution of two positions in the sequence.

i j

pair of sequence positions (i,j)

M(i,j) = Σ fi,j(B1,B2) log2( )fi,j(B1,B2)fi(B1)fj(B2)

B1,B2 ∈{A,C,G,T}

Exercise: Calculate M(i,j)=___________

123456789

101112

species

position

due Nov 2

Can you find pairs of positions with high MI?

W-C non-canonical

Mutual information matrix for 20 aligned sequences

very noisy.

Take home message: You need lots of sequence to do mutual information analysis.

And this is for RNA where the signal is strong. Try protein. You'll need thousands of

sequences....

Same RNA, Mi,j for 302 aligned sequences

cleaner

Helices now appear as straight lines of dots, after re-numbering to remove gaps in

the MSA

Corresponding structure.

H

I

M

I

H

• Mutual information (MI) - Requires a deep multiple sequence alignment - Can find non-canonical base-pairs.

Comparative methods, phylogenetics

Free energy calculations

• Dot plot - Easy. - Can be done on a single sequence. - Cannot find non-canonical base pairs.

Prediction of RNA secondary structure

1. What are the IUPAC codes?2. What are the different types of RNA structure?3. What is mutual information? How is it calculated?4. What are the sources of energy for RNA structure?5. What algorithm is used to calculate the energy over

all RNA structures?6. What is expressed in a dot plot?7. What is expressed in a circle plot?8. What is a pseudoknot?9. What is a non-canonical basepair?10. Can you convert a dotplot into a graph?11. Can you convert a graph into a circle plot?12. Can you see a pseudoknot in a graph, circle, dotplot?

Review questions?

Recommended