Mathematical Challenges in Protein Motif Recognition
Bonnie Berger
MIT
Approaches to Structural Motif Recognition
Alignments
Multiple alignments & HMMs
Threading
Profile methods (1D, 3D)
* Statistical methods
Structural Motif Recognition
1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix).
2) Devise a method to determine if an unknown sequence folds as the motif or not.
3) Verify predictions in the lab.
Our Coiled-Coil Programs
PairCoil [Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995]
• predicts 2-stranded coiled coils
• http://theory.lcs.mit.edu/paircoil
MultiCoil [Wolf, Kim, Berger, 1997]
• predicts 3-stranded coiled coils
• http://theory.lcs.mit.edu/multicoil
LearnCoil-Histidine Kinase [Singh, Berger, Kim, Berger, Cochran, 1998]
• predicts coiled coils in histidine kinase linker domains
• http://theory.lcs.mit.edu/learncoil
LearnCoil-VMF [Singh, Berger, Kim, 1999]
• predicts coiled coils in viral membrane fusion proteins
• http://theory.lcs.mit.edu/learncoil-vmf
Long Distance Correlations
In beta structures, amino acids that are close together in the folded 3D structure may be far apart in the linear sequence.
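To make the idea concrete, the sketch below lists residue pairs that lie close together in 3D but far apart in sequence, given one coordinate per residue. The 6 Å distance cutoff, the 10-residue minimum sequence separation, and the toy solenoid coordinates (20 residues per turn, 4.8 Å rise per turn) are illustrative assumptions, not parameters from the talk.

```python
import math

def long_range_contacts(coords, max_dist=6.0, min_seq_sep=10):
    """Return pairs (i, j), i < j, that are within max_dist angstroms in 3D
    but at least min_seq_sep positions apart in the linear sequence.
    Both thresholds are illustrative assumptions."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_seq_sep, len(coords)):
            if math.dist(coords[i], coords[j]) <= max_dist:
                contacts.append((i, j))
    return contacts

# Toy solenoid (hypothetical geometry): 20 residues per turn on a 10 A radius,
# rising 4.8 A per turn, so residue i + 20 sits directly above residue i.
coords = [(10 * math.cos(2 * math.pi * i / 20),
           10 * math.sin(2 * math.pi * i / 20),
           4.8 * i / 20) for i in range(60)]
contacts = long_range_contacts(coords)
```

In this toy chain every contact found links residues roughly one turn (19 to 21 positions) apart in sequence, even though they are only about 5 Å apart in space.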
Biological Importance of Beta Helices
Surface proteins in human infectious disease:
• virulence factors (plants, too)
• adhesins
• toxins
• allergens
Amyloid fibrils (e.g., Alzheimer’s, Creutzfeldt-Jakob (Mad Cow) disease)
Potential new materials
What is Known
Solved beta-helix structures:
12 structures in PDB in 7 different SCOP families
Related work:
• 1D profile of pectate lyase (Heffron et al. ’98)
• HMM (e.g., HMMER)
• Threading (e.g., 3D-PSSM)
Key Databases
Solved structures:
Protein Data Bank (PDB) (100’s of non-redundant structures)[www.rcsb.org/pdb/]
Sequence databases:
Genbank (100’s of thousands of protein sequences)[www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html]
SWISSPROT (10’s of thousands of protein sequences)[www.ebi.ac.uk/swissprot]
BetaWrap performance:
• On PDB: no false positives & no false negatives.
• Recognizes beta helices in PDB across SCOP families in cross-validation.
• Recognizes many new potential beta helices.
• Runs in linear time (~5 min. on SWISS-PROT).
[Bradley, Cowen, Menke, King, Berger: RECOMB 2001]
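The cross-validation claim can be read as a leave-one-family-out protocol: each SCOP family is held out in turn, the method is trained on the remaining families, and the held-out structures are scored. The sketch below only builds those splits under that assumed reading; the protein IDs and family labels are placeholders, and the training and scoring steps themselves are not shown.

```python
from collections import defaultdict

def leave_one_family_out(proteins):
    """Yield (held_out_family, train_ids, test_ids), one split per family.
    proteins is a list of (protein_id, family) pairs."""
    by_family = defaultdict(list)
    for pid, family in proteins:
        by_family[family].append(pid)
    for held_out in sorted(by_family):
        # Train on every family except the held-out one.
        train = [pid for fam, ids in sorted(by_family.items())
                 if fam != held_out for pid in ids]
        yield held_out, train, list(by_family[held_out])

# Placeholder data: protein IDs and family names are hypothetical.
proteins = [("p1", "famA"), ("p2", "famA"), ("p3", "famB"), ("p4", "famC")]
splits = list(leave_one_family_out(proteins))
```

Holding out a whole family, rather than single proteins, tests whether recognition carries over to structures unlike anything in the training set.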
BetaWrap Program
Histogram of protein scores for:
• beta helices not in the database (12 proteins)
• non-beta helices in PDB (1346 proteins)
Single Rung of a Beta Helix
3D Pairwise Correlations
Stacking residues in adjacent beta strands exhibit strong correlations.
Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking).
[Figure: one rung of a beta helix with strands B1, B2, B3 and turn T2]
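One common way to quantify such a correlation is a log-odds score that compares how often two residue types are observed stacked with how often they would co-occur by chance, log(P(a, b) / (P(a) P(b))). The sketch below estimates these scores from a list of stacked pairs; the log-odds form, the pseudocount, the symmetric counting, and the toy pairs are assumptions for illustration, not the exact statistics used by BetaWrap.

```python
import math
from collections import Counter

def pair_log_odds(stacked_pairs, alphabet, pseudocount=1.0):
    """Estimate log(P(a, b) / (P(a) * P(b))) for residue types a, b observed
    stacked on adjacent beta strands. Each pair is counted in both orders,
    so the resulting table is symmetric."""
    counts = Counter()
    for a, b in stacked_pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    # Add a pseudocount to every cell so unseen pairs get a finite score.
    total = sum(counts.values()) + pseudocount * len(alphabet) ** 2
    joint = {(a, b): (counts[(a, b)] + pseudocount) / total
             for a in alphabet for b in alphabet}
    marginal = {a: sum(joint[(a, b)] for b in alphabet) for a in alphabet}
    return {(a, b): math.log(joint[(a, b)] / (marginal[a] * marginal[b]))
            for a in alphabet for b in alphabet}

# Toy data (hypothetical): V over V, I over V, and N over N (asparagine ladder).
pairs = [("V", "V"), ("V", "V"), ("I", "V"), ("N", "N"), ("N", "N")]
scores = pair_log_odds(pairs, alphabet="VIN")
```

Positive scores mark pairs that stack more often than chance (here N over N); negative scores mark under-represented pairs (here N over V).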
Question: how can we find these correlations when the interacting residues are a variable distance apart in the sequence?
[Tailspike, 63-residue turn]
Finding Candidate Wraps
• Assume we know the correct location of a single T2 turn (which fixes B2 and B3).
• Generate the 5 best-scoring candidates for the next rung.
[Figure: fixed strands B2 and B3 with turn T2, and a candidate next rung]
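The candidate-generation step can be sketched under a simplified model in which a rung is identified by a single start position and the next rung begins a variable number of residues later. The length bounds (18 to 40 residues), the function names, and the scoring callback are hypothetical; only the rule of keeping the 5 best-scoring candidates comes from the slide.

```python
import heapq

def best_next_rungs(seq, rung_start, score_fn, min_len=18, max_len=40, k=5):
    """Return the k best (score, next_start) candidates for the next rung.
    The next rung is assumed to start min_len..max_len residues after
    rung_start; score_fn(seq, rung_start, next_start) is a caller-supplied
    (hypothetical) rung-to-rung score."""
    candidates = []
    for length in range(min_len, max_len + 1):
        next_start = rung_start + length
        if next_start >= len(seq):
            break  # candidate would run off the end of the sequence
        candidates.append((score_fn(seq, rung_start, next_start), next_start))
    return heapq.nlargest(k, candidates)

def toy_score(seq, a, b):
    # Hypothetical score that simply prefers a 22-residue spacing.
    return -abs((b - a) - 22)

top5 = best_next_rungs("A" * 100, 0, toy_score)
```

With this toy score the five surviving candidates are the spacings 20 through 24, with 22 ranked first.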
Scoring Candidate Wraps (rung-to-rung)
Similar to the probabilistic framework, plus:
• Pairwise probabilities taken from amphipathic beta (not beta-helix) structures in PDB.
• Additional stacking bonuses on internal pairs.
• Incorporates distribution on turn lengths.
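A rung-to-rung score with these three ingredients might look like the sketch below: a pairwise log-odds term for each stacked pair, an extra bonus for internal (inward-facing) pairs, and a log-probability from a turn-length distribution. The table values, the choice of V, I, and L as the residues that earn the bonus, the 0.5 bonus size, and the simplification of treating everything between two stacked strands as one variable-length turn are all assumptions for illustration.

```python
import math

ALIPHATIC = set("VIL")  # assumption: residue types that earn the stacking bonus

def rung_to_rung_score(seq, start, next_start, strand_len, pair_score,
                       turn_len_prob, internal_offsets, bonus=0.5):
    """Hypothetical score for stacking strand seq[start:start+strand_len]
    on strand seq[next_start:next_start+strand_len] one rung up."""
    score = 0.0
    for i in range(strand_len):
        a, b = seq[start + i], seq[next_start + i]
        # Pairwise term: log-odds for residue a stacked under residue b.
        score += pair_score.get((a, b), 0.0)
        # Extra bonus when an internal pair stacks two aliphatic residues.
        if i in internal_offsets and a in ALIPHATIC and b in ALIPHATIC:
            score += bonus
    # Turn-length term: residues between the end of this strand and the next one.
    turn_len = next_start - start - strand_len
    score += math.log(turn_len_prob.get(turn_len, 1e-6))
    return score

# Toy example: strand "VSV", a 5-residue turn "GDNKG", then strand "VTV".
seq = "VSVGDNKGVTV"
s = rung_to_rung_score(seq, 0, 8, 3,
                       pair_score={("V", "V"): 1.0, ("S", "T"): 0.2},
                       turn_len_prob={5: 0.5},
                       internal_offsets={0, 2})
```

Here the two internal V-over-V pairs each collect the pair score and the bonus, the S-over-T pair adds 0.2, and the 5-residue turn contributes log 0.5.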
Scoring Candidate Wraps (5 rungs)
• Iterate out to 5 rungs, generating candidate wraps.
• Score each wrap:
- sum the rung-to-rung scores
- B1 correlations filter
- screen for alpha-helical content
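The 5-rung procedure can be sketched as a small tree search: each partial wrap is extended by its 5 best next rungs (at most 5^4 = 625 complete wraps), and each wrap is scored by summing its rung-to-rung scores. The B1-correlation filter and the alpha-helix screen are passed in as caller-supplied predicates because their details are not given here; the function names, length bounds, and toy inputs are assumptions.

```python
import heapq

def best_wrap(seq, first_start, rung_score, b1_filter, helix_screen,
              n_rungs=5, k=5, min_len=18, max_len=40):
    """Hypothetical wrap search. Each partial wrap (a list of rung start
    positions) is extended by its k best-scoring next rungs until it has
    n_rungs rungs; a wrap's score is the sum of its rung-to-rung scores.
    Returns the best (score, starts) passing both predicates, or None."""
    wraps = [(0.0, [first_start])]
    for _ in range(n_rungs - 1):
        extended = []
        for total, starts in wraps:
            last = starts[-1]
            options = [(rung_score(seq, last, last + n), last + n)
                       for n in range(min_len, max_len + 1)
                       if last + n < len(seq)]
            for score, nxt in heapq.nlargest(k, options):
                extended.append((total + score, starts + [nxt]))
        wraps = extended
    # Apply the (caller-supplied) B1-correlation filter and alpha-helix screen.
    kept = [w for w in wraps
            if b1_filter(seq, w[1]) and helix_screen(seq, w[1])]
    return max(kept, default=None)

def toy_score(seq, a, b):
    # Hypothetical rung-to-rung score preferring a 22-residue spacing.
    return -abs((b - a) - 22)

def accept(seq, starts):
    # Hypothetical predicate that lets every wrap through.
    return True

seq = "A" * 200
best = best_wrap(seq, 0, toy_score, accept, accept)
```

With the toy score the best wrap places rungs every 22 residues, (0.0, [0, 22, 44, 66, 88]); a filter that rejects that wrap forces the search onto a lower-scoring alternative.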
Potential Beta Helices
Toxins:
• Vacuolating cytotoxin from the human gastric pathogen H. pylori
• Toxin B from the enterohemorrhagic E. coli strain O157:H7
Allergens:
• Antigen AMB A II, major allergen from A. artemisiifolia (ragweed)
• Major pollen allergen CRY J II from C. japonica (Japanese cedar)
Adhesins:
• AIDA-I, involved in diffuse adherence of diarrheagenic E. coli
Other cell surface proteins:
• Outer membrane protein B from Rickettsia japonica
• Putative outer membrane protein F from Chlamydia trachomatis
• Toxin-like outer membrane protein from Helicobacter pylori
The Problem
Given an amino acid residue subsequence, does it fold as a coiled coil? As a beta helix?
Answering this directly is very difficult:
• peptide synthesis (1-2 months)
• X-ray crystallization, NMR (>1 year)
• molecular dynamics
Our goal: predict folded structure based on a template of positive examples.
Collaborators
Math / CS
Mona Singh
Ethan Wolf
Phil Bradley
Lenore Cowen
Matt Menke
David Wilson
Theo Tonchev
Biologists
Peter S. Kim
Jonathan King
Andrea Cochran
James Berger
Mari Milla