BCB 444/544 F07 ISU Terribilini #24 - RNA Secondary Structure Prediction 10/17/07
BCB 444/544
Lecture 24
Protein Tertiary Structure Prediction
#24_Oct17
Required Reading (before lecture)
Mon Oct 15 - Lecture 23
Protein Tertiary Structure Prediction
• Chp 15 - pp 214-230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini)
RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231-242
Fri Oct 19 - Lecture 25
Gene Prediction
• Chp 8 - pp 97-112
New Reading & Homework Assignment
ALL: Homework #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91. http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures.
• Your assignment is to write a summary of this paper; for details, see HW#4 (posted online & sent by email on Sat Oct 13).
Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB • Sachdev Sidhu (Genentech): Phage peptide and antibody libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI • Lyric Bartholomay (Ent, ISU): TBA
Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP
Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
• Homology Modeling (easiest!)
• Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)
Steps in Threading
1. Align target sequence with template structures in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit" between target sequence & template structure
3. Rank models based on energy scores
[Figure: target sequence ALKKGF…HFDTSE threaded onto structure templates]
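The three threading steps can be sketched as a toy Python program. This is an illustration under assumptions not taken from the lecture. Each template is a hypothetical list of residue contacts (i, j). The target is placed on the template without gaps. The fit is scored with an HP-style contact energy of -1 per hydrophobic-hydrophobic contact; that sign convention is assumed here so that a lower energy means a better fit.

```python
# Toy threading sketch (hypothetical data and scoring, not the Ho et al. method):
# 1) place the target on each template (ungapped, starting at template position 0)
# 2) score the fit with an HP-style contact energy
# 3) rank templates from best (lowest energy) to worst

HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic residue set for this toy example


def hp_energy(sequence, contacts):
    """Return -1 for every template contact (i, j) whose target residues are both hydrophobic."""
    energy = 0
    for i, j in contacts:
        if i < len(sequence) and j < len(sequence):
            if sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC:
                energy -= 1
    return energy


def rank_templates(sequence, library):
    """Score the target against every template and sort by energy (lowest first)."""
    scored = [(name, hp_energy(sequence, contacts)) for name, contacts in library.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    library = {"tmplA": [(0, 3), (0, 4)], "tmplB": [(1, 4), (2, 5)]}  # hypothetical templates
    print(rank_templates("VKDLIE", library))  # [('tmplA', -2), ('tmplB', 0)]
```

Real threading programs replace the ungapped placement with a gapped alignment and use a fold library drawn from the PDB.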
A Local Example: Rapid Threading Approach for Protein Structure Prediction
Kai-Ming Ho (Physics), Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang, Drena Dobbs (GDCB), Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697
Simplify: Template structure representation
Template structure → N × N contact matrix C, for residues i, j = 1, …, N:
$C_{ij} = 1$ if $r_{ij} < 6.5$ Å (contact)
$C_{ij} = 0$ otherwise; a neighbor in sequence counts as a non-contact
Yungok Ihm
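Below is a minimal sketch of building such a contact matrix from per-residue 3D coordinates. It makes three assumptions. There is one coordinate per residue; the slide does not say which atom, so a Cα atom is one possibility. The distance cutoff is 6.5 Å, an assumed value. Only immediate sequence neighbors (|i - j| = 1) are excluded. The function name and parameters are hypothetical.

```python
import math


def contact_matrix(coords, cutoff=6.5, min_separation=2):
    """Return an N x N 0/1 contact matrix C for per-residue coordinates.

    C[i][j] = 1 when the distance r_ij is below `cutoff` (angstroms) and the
    residues are at least `min_separation` apart in sequence; otherwise 0, so
    immediate sequence neighbors never count as contacts.
    """
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                c[i][j] = c[j][i] = 1  # contacts are symmetric
    return c


if __name__ == "__main__":
    # Hypothetical 4-residue fragment; coordinates in angstroms.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
    for row in contact_matrix(coords):
        print(row)
```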
Simplify: Energy Function
• Interaction "counts" only if two hydrophobic amino acid residues are in contact
• At the residue level, pairwise hydrophobic interaction is dominant:
$E = \sum_{i<j} C_{ij} U_{ij}$
$C_{ij}$: contact matrix
$U_{ij} = U(\text{residue } i, \text{residue } j)$
MJ: $U_{ij}$ taken from the full pairwise table
LTW: $U_{ij} = Q_i Q_j$
HP: $U_{ij} \in \{1, 0\}$
Yungok Ihm
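The sum E = Σ C_ij U_ij can be sketched with the pair potential passed in as a function. That way the same loop covers the HP, LTW, and MJ variants; an MJ version would look up a 20 × 20 table. This is an illustrative sketch only. The hydrophobic residue set and the per-residue Q values are hypothetical placeholders, not the published MJ or LTW parameters. The HP potential follows the slide's convention U ∈ {1, 0}.

```python
HYDROPHOBIC = set("AVILMFWC")  # hypothetical hydrophobic set for the HP model


def contact_energy(sequence, contacts, pair_potential):
    """E = sum over i < j of C_ij * U(residue_i, residue_j)."""
    n = len(sequence)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if contacts[i][j]:
                total += pair_potential(sequence[i], sequence[j])
    return total


def hp_potential(a, b):
    """HP model: U = 1 for a hydrophobic-hydrophobic contact, otherwise 0."""
    return 1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def make_ltw_potential(q):
    """LTW-style factorized potential U_ij = Q_i * Q_j from a per-residue table q."""
    return lambda a, b: q[a] * q[b]


if __name__ == "__main__":
    contacts = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # one contact, between residues 0 and 2
    q_hypothetical = {"V": 0.5, "K": -1.0, "L": 3.0}  # made-up values, not LTW's
    print(contact_energy("VKL", contacts, hp_potential))  # 1.0
    print(contact_energy("VKL", contacts, make_ltw_potential(q_hypothetical)))  # 1.5
```

The factorized LTW form needs only one number per amino acid type (20 values), instead of one per pair of types.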
Energy calculation: Contact energy
• Miyazawa-Jernigan (MJ) matrix: statistical potential, 210 parameters (one per unordered pair of the 20 amino acid types)
• Li-Tang-Wingreen (LTW): 20 parameters (one value $q_i$ per amino acid type) that approximate the MJ matrix
• Contact energy: $E_c = \sum_{i<j} Q_i C_{ij} Q_j$, where $C$ is the contact matrix and $Q_i$ is derived from $q_i$
• $Q_i$ ~ solubility (~ hydrophobicity)
[Figure: excerpt of the contact-energy table for residues C, M, F, I, L, V, W]
Yungok Ihm
Summary of Ho Threading Procedure
• Template structure → contact matrix C (N × N): $C_{ij} = 1$ if $r_{ij} < 6.5$ Å; $C_{ij} = 0$ otherwise (including a neighbor in sequence)
• Sequence (e.g., AVFMRIHNDIVYNDIANTTQ) → sequence vector $S = (Q_A, Q_V, Q_F, \ldots)$, one $Q$ value per residue
• Scoring function: contact energy $E_c = \sum_{i<j} Q_i C_{ij} Q_j$
Yungok Ihm
Can complexity be further reduced? Consider simplifying the structure representation, too:
• Sequence – Structure (1D – 3D problem)
• Sequence – Contact Matrix (1D – 2D problem)
• Sequence – 1D Profile (1D – 1D problem)
[Figure: example target sequence ALKKGF…HFDTSE]
Haibo Cao
Represent the contact matrix by its dominant eigenvector (1D profile)
• The first eigenvector (with the highest eigenvalue) dominates the overlap between sequence and structure
• Eigenvectors of rank > 4 (smaller eigenvalues) are "sequence blind"
Haibo Cao
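Power iteration is one simple way to get the dominant eigenvector of a symmetric contact matrix. The sketch below iterates with C + I instead of C; this shift is a choice made for the sketch. It leaves the eigenvectors unchanged. It also keeps the largest eigenvalue dominant in magnitude when C has an eigenvalue of equal size and opposite sign. The function name is hypothetical. In practice, a linear-algebra library routine would more likely be used.

```python
import math


def dominant_eigenvector(c, iterations=1000, tol=1e-12):
    """Approximate the unit eigenvector of symmetric matrix c with the largest eigenvalue.

    Uses power iteration on (C + I); the shift does not change the eigenvectors
    but makes the largest eigenvalue of a non-negative symmetric C also the
    largest in magnitude.
    """
    n = len(c)
    v = [1.0 / math.sqrt(n)] * n  # uniform, normalized starting vector
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


if __name__ == "__main__":
    # Small symmetric 0/1 matrix in which residue 1 contacts residues 0 and 2.
    c = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print([round(x, 4) for x in dominant_eigenvector(c)])  # [0.5, 0.7071, 0.5]
```

Each entry of the resulting profile P indicates how strongly the corresponding template position takes part in the dominant contact pattern.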
Threading Alignment Step - now fast!
Align target sequence vector (1D) with eigenvector profile of template structure (1D)
1D profile: $P = V_1$ (dominant eigenvector of the contact matrix C)
Maximize the overlap $S \cdot P$ between the sequence (S) and the profile (P), allowing gaps
Calculate the contact energy $E_c$ using the alignment
New profile: $P' = CP$
Cao et al Polymer 45 (2004)
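The gapped alignment of the sequence vector S to the profile P can be sketched as a Needleman-Wunsch-style dynamic program. The match score is the product s_i · p_j, so the total is the overlap S · P minus gap costs. A single flat gap penalty is a simplifying assumption here; the alignment parameters below use structure-dependent penalties. The function name is hypothetical.

```python
def align_to_profile(s, p, gap=1.0):
    """Globally align sequence vector s to profile p.

    Maximizes sum(s[i] * p[j]) over aligned pairs minus `gap` per unaligned
    element. Returns (best_score, [(i, j), ...]) with 0-based aligned indices.
    """
    n, m = len(s), len(p)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = -gap * i, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + s[i - 1] * p[j - 1], "diag"),  # align i-1 with j-1
                (score[i - 1][j] - gap, "up"),    # sequence element i-1 left unaligned
                (score[i][j - 1] - gap, "left"),  # profile position j-1 left unaligned
            ]
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])
    pairs, i, j = [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right cell
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]


if __name__ == "__main__":
    best, pairs = align_to_profile([1.0, -1.0, 1.0], [1.0, 1.0], gap=0.5)
    print(best, pairs)  # 1.5 [(0, 0), (2, 1)]
```

Because both S and P are 1D, each cell of the table costs a single multiplication. This is why the eigenvector simplification makes the alignment step fast.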
Parameters for alignment?
• Gap penalty: insertion/deletion in helices or strands is strongly penalized; smaller penalties apply to indels in loops
Gap penalties apply to the alignment score only, not to the energy calculation
• Size penalty: if a target residue and the aligned template residue differ in radius by > 0.5 Å and the residue is involved in > 2 contacts, the alignment is penalized
Size penalties apply to the alignment score only, not to the energy calculation
[Figure: target sequence ALKKGFG…HFDTSE aligned to template loop and helix regions]
Yungok Ihm
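The two penalty rules can be written as small helper functions, as in this hypothetical sketch. The 0.5 Å and 2-contact thresholds come from the slide. The penalty magnitudes and the 'H'/'E'/'C' secondary-structure labels are assumptions for illustration. Both values would be subtracted from the alignment score only, never added to the contact energy.

```python
def gap_penalty(template_ss, helix_strand_penalty=10.0, loop_penalty=1.0):
    """Cost of an insertion/deletion at a template position.

    template_ss is 'H' (helix), 'E' (strand) or 'C' (loop/coil). The penalty
    values are hypothetical; only their ordering (helix/strand >> loop) follows the slide.
    """
    return helix_strand_penalty if template_ss in ("H", "E") else loop_penalty


def size_penalty(target_radius, template_radius, n_contacts, penalty=1.0,
                 max_radius_diff=0.5, max_contacts=2):
    """Penalize aligning residues whose radii differ by > 0.5 A at a position with > 2 contacts."""
    if abs(target_radius - template_radius) > max_radius_diff and n_contacts > max_contacts:
        return penalty
    return 0.0


if __name__ == "__main__":
    print(gap_penalty("H"), gap_penalty("C"))     # 10.0 1.0
    print(size_penalty(3.0, 2.0, n_contacts=3))  # 1.0
    print(size_penalty(3.0, 2.0, n_contacts=2))  # 0.0
```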
How to incorporate secondary structure?
• Predict secondary structure of the target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
N+ = total number of matches between the predicted secondary structure of the target & the actual secondary structure of the template
N- = total number of mismatches
Ns = total number of residues selected in alignment
“Global fitness” : f = 1 + (N+ - N-) / Ns
Emod = f * Ethreading
Yungok Ihm
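The global-fitness correction can be computed directly from aligned secondary-structure labels. In this sketch, the two input strings are assumed to hold the predicted target labels and the template labels for the Ns residues selected in the alignment, position by position. The function names are hypothetical.

```python
def global_fitness(predicted_ss, template_ss):
    """f = 1 + (N+ - N-) / Ns over the aligned residue pairs."""
    if len(predicted_ss) != len(template_ss) or not predicted_ss:
        raise ValueError("need two equal-length, non-empty label strings")
    n_plus = sum(1 for a, b in zip(predicted_ss, template_ss) if a == b)
    n_minus = len(predicted_ss) - n_plus
    return 1 + (n_plus - n_minus) / len(predicted_ss)


def modified_energy(e_threading, predicted_ss, template_ss):
    """Emod = f * Ethreading."""
    return global_fitness(predicted_ss, template_ss) * e_threading


if __name__ == "__main__":
    # 3 matches, 1 mismatch: f = 1 + (3 - 1) / 4 = 1.5
    print(global_fitness("HHHC", "HHEC"))          # 1.5
    print(modified_energy(-10.0, "HHHC", "HHEC"))  # -15.0
```

f ranges from 0 (every position mismatched) to 2 (every position matched). When threading energies are negative, better secondary-structure agreement therefore makes Emod more negative.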
How much better is this "fit" than random?
Emod: E score modified to reflect fit with the predicted secondary (2°) structure
Eshuffled: average E score for the same sequence shuffled (randomized) many times and threaded onto the same structure
Erelative = Emod - Eshuffled
Yungok Ihm
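The shuffling test can be sketched as follows. Shuffle the target sequence many times, score each shuffled copy against the same structure, and subtract the average from Emod. The energy function is passed in as a parameter because any of the threading scores could be used. The number of shuffles and the fixed random seed are arbitrary choices made for reproducibility, and the function name is hypothetical.

```python
import random


def relative_energy(sequence, e_mod, energy_of, n_shuffles=100, seed=0):
    """Erelative = Emod - Eshuffled, where Eshuffled is the mean energy of shuffled copies."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    residues = list(sequence)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        total += energy_of("".join(residues))
    e_shuffled = total / n_shuffles
    return e_mod - e_shuffled


def toy_energy(seq):
    """Hypothetical energy that depends only on composition (-1 per 'A')."""
    return -float(seq.count("A"))


if __name__ == "__main__":
    # Every shuffle of "AAB" scores -2.0 under toy_energy, so Erelative = -5.0 - (-2.0).
    print(relative_energy("AAB", -5.0, toy_energy))  # -3.0
```

Assuming lower energy means a better fit, a strongly negative Erelative indicates a fit much better than expected for a random sequence of the same composition.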
Performance Evaluation? "Blind Test"
CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure (before experimental results published)
Typical Results: (well, actually, our BEST Results):
HO = #1-Ranked CASP5 Prediction for this Target
• Target 174
• PDB ID = 1MG7
[Figure: actual structure vs. predicted structures T174_1 and T174_2]
Cao, Ihm, Wang, Dobbs, Ho
FR = Fold Recognition (targets manually assessed by Nick Grishin)
-----------------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
   1    24.26   9.00  12.00     9    12  Ginalski
   2    21.64   7.00  12.00     7    12  Skolnick Kolinski
   3    19.55   8.00  12.50     9    14  Baker
   4    16.88   6.00  10.00     6    10  BIOINFO.PL
   5    15.25   7.00   7.00     7     7  Shortle
   6    14.56   6.50  11.50     7    13  BAKER-ROBETTA
   7    13.49   4.00  11.00     4    11  Brooks
   8    11.34   3.00   6.00     3     6  Ho-Kai-Ming
   9    10.45   3.00   5.50     3     6  Jones-NewFold
-----------------------------------------------------------------
NgNW = number of good predictions without weighting for multiple models
NpNW = number of total predictions without weighting for multiple models
Overall Performance in CASP5 Contest
~8th out of 180 (M. Levitt, Stanford)
CASP - Check it out!
Critical Assessment of Protein Structure Prediction http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest - 2006: http://www.predictioncenter.org/casp7/Casp7.html
• Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them
• Related contests & resources:
• Protein Function Prediction (part of CASP)
• CAPRI = Critical Assessment of Predicted Interactions
• New: CASPM = CASP for M = Mutant proteins
• Predict effects of small (point) mutations, e.g., SNPs
Another Convenient List of Links for Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software
Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13
Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification
Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures (see Xiong textbook for details)
1. Intermolecular
2. Intramolecular
3. Combined
But this is a very active research area, with many recent new methods
3 Popular Methods:
1. DALI = Distance Matrix Alignment of Structures (Holm)
• FSSP Database
2. SSAP = Sequential Structure Alignment Program (Orengo)
• CATH Database
3. CE = Combinatorial Extension (Bourne)
• VAST at NCBI
URLs:
http://en.wikipedia.org/wiki/Structural_alignment_software
Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• RNA Function
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation
RNA Function
• Storage/transfer of genetic information
• Newly discovered regulatory functions, especially RNAi pathways
• Catalytic
RNA types & functions
Type of RNA - Primary function(s)
mRNA - messenger: translation (protein synthesis); regulatory
rRNA - ribosomal: translation (protein synthesis) <catalytic>
tRNA - transfer: translation (protein synthesis)
hnRNA - heterogeneous nuclear: precursors & intermediates of mature mRNAs & other RNAs
scRNA - small cytoplasmic: signal recognition particle (SRP); tRNA processing <catalytic>
snRNA - small nuclear: mRNA processing, poly A addition <catalytic>
snoRNA - small nucleolar: rRNA processing/maturation/methylation
regulatory RNAs (siRNA, miRNA, etc.): regulation of transcription and translation, other??
RNA Structure
• RNA forms complex 3D structures
• Mainly single stranded
• The single RNA strand can self-hybridize to form base-paired regions
Levels of RNA Structure
• Like proteins, RNA has primary, secondary, and tertiary structures
• Primary structure - base sequence
• Secondary structure - single stranded or base paired
• Tertiary structure - 3D structure
Rob Knight, Univ Colorado
RNA Structure Prediction
• RNA tertiary structure is very difficult to predict
• Focus on predicting RNA secondary structure
• Given an RNA sequence, predict the secondary structure of the molecule
• Almost all methods ignore higher-order secondary structures such as pseudoknots
Base Pairing in RNA
G-C, A-U, G-U ("wobble") & variants
http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs
See: IMB Image Library of Biological Molecules
Common structural motifs in RNA
• Helices
• Loops: hairpin, interior, bulge, multibranch
• Pseudoknots
Fig 6.2, Baxevanis & Ouellette 2005
RNA Secondary Structure Prediction Methods
• Two main types of methods
• Ab initio - based on calculating the most energetically favorable secondary structure
• Comparative approach - based on evolutionary comparison of multiple related RNA sequences
Ab Initio Prediction
• Only requires a single RNA sequence
• Calculates the minimum free energy structure
• Base pairing lowers the free energy of the structure, so methods attempt to find the secondary structure with maximal base pairing
Ab Initio Prediction
• Free energy is calculated based on parameters determined in the wet lab
• Known energy associated with each type of base pair
• Base pair formation is not independent - multiple base pairs adjacent to each other are more favorable than individual base pairs - cooperative
• Bulges and loops adjacent to base pairs have a free energy penalty
Ab Initio Energy Calculation Method
• Search for all possible base-pairing patterns
• Calculate the total energy of the structure based on all stabilizing and destabilizing forces
Fig 6.3, Baxevanis & Ouellette 2005
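The sketch below makes "total energy from stabilizing and destabilizing forces" concrete by scoring a given structure. The structure is assumed to be written in dot-bracket notation: matching parentheses mark base pairs, and dots mark unpaired bases. All energy values are invented placeholders, not measured nearest-neighbor parameters. Each pair stacked directly on an adjacent inner pair earns a bonus, reflecting cooperativity. Each hairpin loop pays a penalty.

```python
# Hypothetical energy values in kcal/mol; real programs use measured parameter tables.
STACK_BONUS = {"GC": -3.0, "CG": -3.0, "AU": -2.0, "UA": -2.0, "GU": -1.0, "UG": -1.0}
HAIRPIN_PENALTY = 4.0


def pairs_from_dot_bracket(structure):
    """Convert '((..))' style notation into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def structure_energy(seq, structure):
    """Toy free energy: stacking bonuses (stabilizing) plus hairpin penalties (destabilizing)."""
    pairs = pairs_from_dot_bracket(structure)
    paired = {k for pair in pairs for k in pair}
    energy = 0.0
    for i, j in pairs:
        if (i + 1, j - 1) in pairs:
            # cooperative term: this pair is stacked directly on the next inner pair
            energy += STACK_BONUS[seq[i] + seq[j]]
        elif not any(k in paired for k in range(i + 1, j)):
            # nothing paired inside (i, j): this pair closes a hairpin loop
            energy += HAIRPIN_PENALTY
    return energy


if __name__ == "__main__":
    print(structure_energy("GGGAAAUCC", "(((...)))"))  # -3.0 - 3.0 + 4.0 = -2.0
```

Lone pairs and pairs that close interior loops contribute nothing in this toy model. Real parameter sets also score those loop types.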
Dot Matrices
• Can be used to find all possible base pair patterns
• Compare the input sequence to itself and put a dot wherever two bases are complementary
R Knight 2005
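Below is a minimal sketch of such a dot matrix. It assumes that the only allowed pairs are the Watson-Crick pairs plus the G-U wobble listed under base pairing. The function names are hypothetical.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def dot_matrix(seq):
    """Return an N x N 0/1 matrix with 1 wherever bases i and j could pair."""
    return [[1 if (a, b) in PAIRS else 0 for b in seq] for a in seq]


def render(seq, matrix):
    """Text plot: '*' marks a possible base pair, '.' marks none."""
    lines = ["  " + " ".join(seq)]
    for base, row in zip(seq, matrix):
        lines.append(base + " " + " ".join("*" if x else "." for x in row))
    return "\n".join(lines)


if __name__ == "__main__":
    seq = "GGGAAAUCC"
    print(render(seq, dot_matrix(seq)))
```

In the plot, a run of dots along an anti-diagonal (i increasing while j decreases) marks a stretch that could form a helix.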
Dynamic Programming
• Finding the best possible secondary structure is difficult - lots of possibilities
• Compare RNA sequence with itself
• Apply scoring scheme based on energy parameters for base pairs, cooperativity, and penalties for destabilizing forces
• Find path that represents the most energetically favorable secondary structure
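The Nussinov algorithm, sketched below, is the classic textbook illustration of this idea. It is a simplification of what the slide describes. It maximizes the number of base pairs instead of minimizing free energy with experimentally determined parameters, as Mfold does. The minimum hairpin-loop length of 3 is an assumed setting, and the function names are hypothetical.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Fill best[i][j] = maximum number of nested base pairs in seq[i..j]."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score
    return best


def traceback(seq, best, min_loop=3):
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))  # j is unpaired in this optimum
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + best[k + 1][j - 1] + 1 == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


if __name__ == "__main__":
    seq = "GGGAAAUCC"
    table = nussinov(seq)
    print(table[0][-1], traceback(seq, table))  # 3 (((...)))
```

Energy-based programs keep the same table-filling pattern but replace "+1 per pair" with stacking and loop free energies.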
Problem
• DP returns the SINGLE best structure
• There may be many structures with similar energies
• Also, your predicted secondary structure is only as good as the energy parameters used
• Solution - return multiple structures with near-optimal energies
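The brute-force sketch below illustrates returning several near-optimal answers rather than one. It enumerates every nested structure of a short sequence and keeps those within delta of the best score. For simplicity, it scores by base-pair count rather than free energy. Exhaustive enumeration is only feasible for very short sequences, and real programs use energy-based dynamic programming instead. The function names are hypothetical.

```python
from functools import lru_cache

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def all_structures(seq, min_loop=3):
    """Return every nested secondary structure of seq as a frozenset of (i, j) pairs."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i >= j:
            return (frozenset(),)
        results = list(solve(i + 1, j))  # base i unpaired
        for k in range(i + min_loop + 1, j + 1):  # base i paired with base k
            if (seq[i], seq[k]) in PAIRS:
                for inner in solve(i + 1, k - 1):
                    for outer in solve(k + 1, j):
                        results.append(inner | outer | {(i, k)})
        return tuple(results)

    return solve(0, len(seq) - 1)


def near_optimal(seq, delta=1, min_loop=3):
    """Dot-bracket strings whose pair count is within `delta` of the maximum."""
    structures = all_structures(seq, min_loop)
    best = max(len(s) for s in structures)
    keep = []
    for pairs in structures:
        if len(pairs) >= best - delta:
            chars = ["."] * len(seq)
            for i, j in pairs:
                chars[i], chars[j] = "(", ")"
            keep.append("".join(chars))
    return sorted(keep)


if __name__ == "__main__":
    for s in near_optimal("GGGAAAUCC", delta=1):
        print(s)
```

With delta=0, only the single best structure is returned. Raising delta reveals the alternative structures that a single-answer DP would hide.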
Popular Ab Initio Prediction Programs
• Mfold
  • Combines DP with thermodynamic calculations
  • Fairly accurate for short sequences, less accurate as sequence length increases
• RNAfold
  • Returns multiple structures near the optimal structure
  • Computes a larger number of potential secondary structures than Mfold, so it uses a simplified energy function
Comparative Approach
• Uses multiple sequence alignment
• Assumes related sequences fold into the same secondary structure
Covariation
• RNA functional motifs are conserved
• To maintain RNA structure during evolution, a mutation in a base-paired residue must be compensated for by a mutation in the base that it pairs with
• Comparative methods search for covariation patterns in MSA
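Mutual information (MI) is one common way to quantify covariation between two alignment columns, as sketched below. MI is high when the bases in one column predict the bases in the other, as with compensatory changes from G-C to A-U. It is shown only as an illustration. The slides do not say which covariation measure any particular program uses, MI alone does not check that the covarying bases are complementary, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def column(alignment, k):
    """Bases in column k of an alignment given as equal-length strings."""
    return [seq[k] for seq in alignment]


def mutual_information(col_i, col_j):
    """MI (in bits) between two alignment columns."""
    n = len(col_i)
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), count in freq_ij.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((freq_i[x] / n) * (freq_j[y] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical alignment: columns 0 and 3 covary (G-C, C-G, A-U, U-A); columns 1 and 2 are conserved.
    alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
    print(mutual_information(column(alignment, 0), column(alignment, 3)))  # 2.0
    print(mutual_information(column(alignment, 1), column(alignment, 2)))  # 0.0
```

Conserved columns score zero even though they may well be paired. Conservation alone gives no covariation signal.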
Consensus Structures
• Predict secondary structure of each individual sequence
• Compare all structures and see if there is a most common structure
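The consensus step reduces to counting identical predictions. Below is a minimal sketch. It assumes each predicted structure has already been written as a dot-bracket string of the same length, for example after mapping the predictions onto alignment columns. The function name is hypothetical.

```python
from collections import Counter


def most_common_structure(structures):
    """Return the most frequent structure and the fraction of inputs that share it."""
    if not structures:
        raise ValueError("no structures given")
    structure, count = Counter(structures).most_common(1)[0]
    return structure, count / len(structures)


if __name__ == "__main__":
    predictions = ["((...))", "((...))", "(.....)"]  # hypothetical per-sequence predictions
    print(most_common_structure(predictions))
```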
Popular Comparative Prediction Programs
• Two types
  • Require user to provide MSA
  • No MSA required
RNAalifold
• Requires user to provide the MSA
• Creates a scoring matrix combining minimum free energy and covariation information
• DP is used to select the minimum free energy structure
Foldalign
• User provides a pair of unaligned RNA sequences
• Foldalign constructs alignment then computes a commonly conserved structure
• Suitable only for short sequences
Dynalign
• User provides two input sequences
• Dynalign calculates possible secondary structures using an algorithm similar to Mfold
• Dynalign compares multiple structures from both sequences to find a common structure
Performance Evaluation
• Ab initio methods achieve correlation coefficient of 20-60%
• Comparative approaches achieve correlation coefficient of 20-80%
• Programs that require user to supply MSA are more accurate
• Comparative programs are consistently more accurate than ab initio programs
• Base-pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high resolution crystal structures!
- Gutell, Pace
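These accuracy figures compare predicted base pairs with a reference structure. The sketch below computes sensitivity, positive predictive value (PPV), and their geometric mean. The geometric mean is one common approximation of the correlation coefficient for base-pair predictions. Treat that definition as an assumption, since the slide does not give the formula. The function name is hypothetical.

```python
import math


def pair_accuracy(predicted, reference):
    """Return (sensitivity, PPV, correlation approximation) for sets of (i, j) pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv, math.sqrt(sensitivity * ppv)


if __name__ == "__main__":
    reference = {(0, 8), (1, 7), (2, 6)}
    predicted = {(0, 8), (1, 7), (3, 5)}  # two of three pairs correct
    sens, ppv, cc = pair_accuracy(predicted, reference)
    print(f"sensitivity={sens:.2f} PPV={ppv:.2f} cc={cc:.2f}")  # sensitivity=0.67 PPV=0.67 cc=0.67
```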
Recommended