Mathematical Challenges in Protein Motif Recognition Bonnie Berger MIT

Approaches to Structural Motif Recognition

Alignments

Multiple alignments & HMMs

Threading

Profile methods (1D, 3D)

* Statistical methods

Structural Motif Recognition

1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix).

2) Devise a method to determine if an unknown sequence folds as the motif or not.

3) Verification in lab.

Our Coiled-Coil Programs

PairCoil [Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995]

• predicts 2-stranded CCs

• http://theory.lcs.mit.edu/paircoil

MultiCoil [Wolf, Kim, Berger, 1997]

• predicts 3-stranded CCs

• http://theory.lcs.mit.edu/multicoil

LearnCoil-Histidine Kinase [Singh, Berger, Kim, Berger, Cochran, 1998]

• predicts CCs in histidine kinase linker domains

• http://theory.lcs.mit.edu/learncoil

LearnCoil-VMF [Singh, Berger, Kim, 1999]

• predicts CCs in viral membrane fusion proteins

• http://theory.lcs.mit.edu/learncoil-vmf

Long Distance Correlations

In beta structures, amino acids close in the folded 3D structure may be far away in the linear sequence

Biological Importance of Beta Helices

Surface proteins in human infectious disease:

• virulence factors (plants, too)

• adhesins

• toxins

• allergens

Amyloid fibrils (e.g., Alzheimer’s, Creutzfeldt-Jakob (Mad Cow) disease)

Potential new materials

What is Known

Solved beta-helix structures:

12 structures in PDB in 7 different SCOP families

Related work:

• 1D profile of pectate lyase (Heffron et al. '98)

• HMM (e.g., HMMER)

• Threading (e.g., 3D-PSSM)

Key Databases

Solved structures:

Protein Data Bank (PDB) (100’s of non-redundant structures) [www.rcsb.org/pdb/]

Sequence databases:

Genbank (100’s of thousands of protein sequences) [www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html]

SWISSPROT (10’s of thousands of protein sequences) [www.ebi.ac.uk/swissprot]

Performance:

• On PDB: no false positives & no false negatives.

• Recognizes beta helices in PDB across SCOP families in cross-validation.

• Recognizes many new potential beta helices.

• Runs in linear time (~5 min. on SWISS-PROT).

[Bradley, Cowen, Menke, King, Berger: RECOMB 2001]

BetaWrap Program

Histogram of protein scores for:

• beta helices not in database (12 proteins)

• non-beta helices in PDB (1346 proteins)

Single Rung of a Beta Helix

3D Pairwise Correlations

Stacking residues in adjacent beta-strands exhibit strong correlations.

Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking).

[Figure labels: B1, B2, B3, T2]

Question: how can we find these correlations when the correlated residues are a variable distance apart in sequence?

[Tailspike, 63-residue turn]

Finding Candidate Wraps

• Assume we have the correct locations of a single T2 turn (fixed B2 & B3).

• Generate the 5 best-scoring candidates for the next rung.

[Figure labels: B2, B3, T2, candidate rung]
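The candidate-generation step above can be sketched in Python. This is only an illustration under stated assumptions, not the BetaWrap implementation: the function names, the strand length, the allowed gap range, and the toy hydrophobic-match score are all hypothetical stand-ins for the real pairwise statistics.

```python
import heapq

# Assumption: a crude hydrophobic alphabet standing in for real pairwise statistics.
HYDROPHOBIC = set("AVILMFWC")


def toy_stack_score(seq, i, j, strand_len):
    """Hypothetical score: count aligned positions where both residues are hydrophobic."""
    return sum(
        1
        for k in range(strand_len)
        if seq[i + k] in HYDROPHOBIC and seq[j + k] in HYDROPHOBIC
    )


def best_next_rungs(seq, b2_start, strand_len=5, min_gap=15, max_gap=30,
                    keep=5, score=toy_stack_score):
    """Return up to `keep` (score, start) pairs for the next rung, best first.

    The current rung's strand is fixed at `b2_start`; candidate strands for the
    next rung start between `min_gap` and `max_gap` residues downstream
    (both bounds are assumed values, chosen only for this example).
    """
    candidates = []
    for gap in range(min_gap, max_gap + 1):
        start = b2_start + gap
        if start + strand_len > len(seq):
            break  # candidate strand would run off the end of the sequence
        candidates.append((score(seq, b2_start, start, strand_len), start))
    return heapq.nlargest(keep, candidates)
```

For example, `best_next_rungs(seq, 0)` scans every allowed offset from position 0 and keeps the five highest-scoring start positions, mirroring the "5 best-scoring candidates" rule on the slide.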

Scoring Candidate Wraps (rung-to-rung)

Similar to probabilistic framework plus:

• Pairwise probabilities taken from amphipathic beta (not beta helix) structures in PDB.

• Additional stacking bonuses on internal pairs.

• Incorporates distribution on turn lengths.
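One way to combine the three ingredients listed above is an additive log-score. The sketch below is an assumption about the general shape of such a score, not the published scoring function: every table value, the background probability, the bonus size, and the function names are hypothetical, and "internal pairs" is read here as stacked positions pointing into the helix core.

```python
import math

# Hypothetical toy values; real tables would be estimated from beta structures in PDB.
PAIR_PROB = {("V", "V"): 0.08, ("I", "V"): 0.06, ("N", "N"): 0.05}
BACKGROUND = 0.0025                        # assumed null-model pair probability (1/400)
STACK_BONUS = 0.5                          # assumed bonus per internal stacked pair
TURN_LEN_PROB = {2: 0.5, 3: 0.2, 4: 0.1}   # assumed distribution on turn lengths
DEFAULT_TURN_PROB = 0.01                   # assumed probability for unlisted lengths


def pair_log_odds(a, b):
    """Log-odds of seeing residues a and b stacked, relative to the background."""
    p = PAIR_PROB.get((a, b), PAIR_PROB.get((b, a), BACKGROUND))
    return math.log(p / BACKGROUND)


def rung_to_rung_score(strand_a, strand_b, inward, turn_len):
    """Score two aligned strands from adjacent rungs.

    `inward` is the set of strand positions assumed to point into the core;
    those positions receive the extra stacking bonus.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    score = 0.0
    for k, (a, b) in enumerate(zip(strand_a, strand_b)):
        score += pair_log_odds(a, b)
        if k in inward:
            score += STACK_BONUS
    # Turn-length term: log-probability of the observed turn length.
    score += math.log(TURN_LEN_PROB.get(turn_len, DEFAULT_TURN_PROB))
    return score
```

Working in log space keeps the score additive, so rung-to-rung scores can later be summed across a whole wrap.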

Scoring Candidate Wraps (5 rungs)

• Iterate out to 5 rungs generating candidate wraps:

• Score each wrap:

- sum the rung-to-rung scores

- B1 correlations filter

- screen for alpha-helical content
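The iteration and summation steps above might look like the following sketch. It assumes that "5 rungs" counts the starting rung, that the top candidates are kept at every extension, and that the caller supplies the candidate generator and the rung-to-rung score; the B1 correlation filter and the alpha-helix screen are not modeled. All names are hypothetical.

```python
import heapq


def extend_wraps(first_rung, next_candidates, rung_score, n_rungs=5, keep=5):
    """Grow wraps rung by rung and return (total_score, wrap) pairs, best first.

    next_candidates(rung) -> iterable of candidate next rungs
    rung_score(prev, nxt) -> rung-to-rung score (higher is better)
    """
    wraps = [(0.0, [first_rung])]
    for _ in range(n_rungs - 1):
        extended = []
        for total, wrap in wraps:
            last = wrap[-1]
            scored = [(rung_score(last, nxt), nxt) for nxt in next_candidates(last)]
            # Keep only the best few extensions of this partial wrap.
            for s, nxt in heapq.nlargest(keep, scored, key=lambda t: t[0]):
                # Wrap score is the running sum of rung-to-rung scores.
                extended.append((total + s, wrap + [nxt]))
        wraps = extended
    return sorted(wraps, key=lambda t: t[0], reverse=True)
```

With `keep=5` and five rungs this enumerates at most 5^4 = 625 wraps per starting rung, which stays cheap enough to repeat along a sequence.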

Potential Beta Helices

Toxins:

• Vacuolating cytotoxin from the human gastric pathogen H. pylori

• Toxin B from the enterohemorrhagic E. coli strain O157:H7

Allergens:

• Antigen AMB A II, major allergen from A. artemisiifolia (ragweed)

• Major pollen allergen CRY J II, from C. japonica (Japanese cedar)

Adhesins:

• AIDA-I, involved in diffuse adherence of diarrheagenic E. coli

Other cell surface proteins:

• Outer membrane protein B from Rickettsia japonica

• Putative outer membrane protein F from Chlamydia trachomatis

• Toxin-like outer membrane protein from Helicobacter pylori

The Problem

Given an amino acid residue subsequence, does it fold as a coiled coil? A beta helix?

Very difficult:

• peptide synthesis (1-2 months)

• X-ray crystallography, NMR (>1 year)

• molecular dynamics

Our goal: predict folded structure based on a template of positive examples.

Collaborators

Math / CS

Mona Singh

Ethan Wolf

Phil Bradley

Lenore Cowen

Matt Menke

David Wilson

Theo Tonchev

Biologists

Peter S. Kim

Jonathan King

Andrea Cochran

James Berger

Mari Milla
