Automating Steps in Protein Structure Determination by NMR CS 296.4 April 13, 2009

Outline

Background

Steps in NMR protein structure determination
The ACE cycle (Assign-Calculate-Evaluate)
The assignment problem

Algorithms for automated NOE assignment

Semi-automated methods
More-automated methods

Conclusions

The Steps in Protein Structure Determination by NMR

1. Sample preparation
   (a) protein selection
   (b) gene engineering
   (c) protein expression
   (d) protein purification
   (e) buffer optimization
   (f) isotope labeling
2. Data collection
   (a) HSQC
   (b) amide H/D exchange
   (c) triple-resonance
3. Data evaluation
   (a) spectrum calculation
   (b) peak picking
4. Structure calculation
5. Structure refinement
6. Structure deposition (and maybe write a paper and graduate)

Automatable Steps in Protein Structure Determination by NMR

1. Sample preparation
2. Data collection
3. Data evaluation
4. Structure calculation
5. Structure refinement
6. Structure deposition

Fig. 2 (2003) Progress in NMR Spectroscopy, 43, 105, Guntert.
The Assign-Calculate-Evaluate cycle in automated NOE assignment and structure calculation.

Automating NOE Assignments and THE Assignment Problem

There are MANY assignment tasks

1. Resonance Assignment (interpreting data)
2. NOE Assignment (interpreting data)

and one major assignment problem:

ambiguous assignments

Due to the data collection problems of
1. Completeness (missing data points)
2. Uniqueness (unresolvable data points)

from Fig. 3 (2003) Progress in NMR Spectroscopy, 43, 105, Guntert.

Unambiguously assigning a NOESY cross peak
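The uniqueness problem can be illustrated with a minimal sketch: a cross peak is matched against the resonance assignments within a chemical shift tolerance, and every proton pair that fits is a possible assignment. The function name `candidate_assignments`, the proton labels, and all shift values are hypothetical and chosen only for illustration.

```python
from itertools import product

def candidate_assignments(peak, shifts, tol):
    """Return all proton pairs whose shifts match the peak within tol (ppm).

    peak   -- (w1, w2) cross peak position in ppm
    shifts -- dict mapping proton name -> assigned chemical shift in ppm
    tol    -- maximum allowed chemical shift error (assumed symmetric)
    """
    w1, w2 = peak
    first = [p for p, s in shifts.items() if abs(s - w1) <= tol]
    second = [p for p, s in shifts.items() if abs(s - w2) <= tol]
    # Every combination of matching protons is a possible assignment.
    return [(a, b) for a, b in product(first, second) if a != b]

# Made-up shifts: two alpha protons are nearly degenerate around 4.20 ppm.
shifts = {"HA_12": 4.21, "HA_30": 4.19, "HN_45": 8.05, "HB_7": 1.62}
peak = (4.20, 8.05)

pairs = candidate_assignments(peak, shifts, tol=0.02)
print(pairs)  # two candidates, so the peak is ambiguous
```

With a tolerance of 0.02 ppm the peak has two candidates and cannot be assigned from shifts alone. Tightening the tolerance to 0.005 ppm leaves no candidate at all, which shows how the tolerance trades ambiguity against missed assignments.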

Automated NMR Protein structure calculation
Peter Guntert (2003) Progress in NMR Spectroscopy, 43, 105-125

Algorithms for automated NOESY assignment

Semi-automated methods
1. ASNO: ASsign NOEs (1993)
2. SANE: Structure Assisted NOE Evaluation (2001)

More-automated methods
1. NOAH (1995)
2. ARIA: Ambiguous Restraints Iterative Assignments (1997)
3. AutoStructure (1999)
4. KNOWNOE: KNOWledge-based NOE assignments (2002)
5. CANDID (2002)

ASNO (1993) Guntert, Berndt, & Wuthrich

Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (of chemical shift pairs)
4. Set of estimated structures

User specifies
1. Max allowed chemical shift error
2. dmax = max interproton distance causing NOE
3. nmin = min # structures with d < dmax

Algorithm steps
1. each cross peak: find all possible assignments (1Hj, 1Hk)
2. each (1Hj, 1Hk): n = # of structures with d < dmax
3. Prune all (1Hj, 1Hk) with n < nmin

User intervention
1. Manually check and refine NOE assignments (1Hj, 1Hk)
2. Refine set of structures and rerun algorithm

Fig. 1 (1993) J Biomol NMR, 3, 601, Guntert, Berndt, & Wuthrich.
Demo: Dendrotoxin K, 7 kDa, 57 residues, backbone RMSD = 0.32 Å
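The ASNO pruning steps (count the structures with d < dmax, then drop candidates with n < nmin) can be sketched as follows. This is an illustrative reading of the slide, not the authors' code; the function name `asno_filter`, the data layout, and the toy coordinates are assumptions.

```python
import math

def asno_filter(candidates, structures, d_max, n_min):
    """Keep candidate assignments supported by at least n_min structures.

    candidates -- list of (proton_j, proton_k) name pairs for one cross peak
    structures -- list of dicts mapping proton name -> (x, y, z) in angstroms
    d_max      -- max interproton distance assumed to give rise to an NOE
    n_min      -- min number of structures in which d < d_max
    """
    kept = []
    for hj, hk in candidates:
        # n = number of structures in which the two protons are close enough
        n = sum(1 for s in structures if math.dist(s[hj], s[hk]) < d_max)
        if n >= n_min:
            kept.append((hj, hk))
    return kept

# Toy "structures" with made-up coordinates along the x axis.
structures = [
    {"HA_12": (0.0, 0.0, 0.0), "HA_30": (12.0, 0.0, 0.0), "HN_45": (3.0, 0.0, 0.0)},
    {"HA_12": (0.0, 0.0, 0.0), "HA_30": (11.0, 0.0, 0.0), "HN_45": (4.0, 0.0, 0.0)},
    {"HA_12": (0.0, 0.0, 0.0), "HA_30": (9.5, 0.0, 0.0), "HN_45": (6.0, 0.0, 0.0)},
]
candidates = [("HA_12", "HN_45"), ("HA_30", "HN_45")]

kept = asno_filter(candidates, structures, d_max=5.0, n_min=2)
print(kept)
```

In this toy case the first candidate is within 5 Å in two of the three structures and survives, while the second is within 5 Å in only one and is pruned. Raising nmin to 3 removes both, which is why the user is expected to check the result and rerun with refined structures.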

SANE (2001) Duggan, Legge, Dyson, & Wright

Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (of chemical shift pairs)

User specifies filters
1. Distance (set of estimated structures)
2. Chemical shift (max allowed error)
3. Secondary structure (unlikely NOE assignments)
4. Assignment (expected NOE assignments)
5. NOE contribution (same as in ARIA method)

Algorithm steps
1. each cross peak: find all possible assignments (1Hj, 1Hk)
2. Apply five filters to prune list of (1Hj, 1Hk)
3. Write unique or ambiguous distance restraints, or violations

User intervention
1. Violation analysis

Fig. 1 (2001) J Biomol NMR, 19, 321, Duggan, et al.
Demo: LFA-1 I-domain, 21.3 kDa, 183 residues, backbone RMSD = 0.29 Å
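The SANE steps (prune with a chain of filters, then write a unique restraint, an ambiguous restraint, or a violation) can be sketched as a small pipeline. The names `classify_peak`, `distance_filter`, and `secondary_structure_filter` are hypothetical, the distances are made up, and treating a peak with no surviving candidate as a violation is an assumption of this sketch rather than a statement about the SANE program.

```python
def classify_peak(candidates, filters):
    """Apply each filter in turn, then label the peak by how many candidates survive."""
    survivors = list(candidates)
    for keep in filters:
        survivors = [c for c in survivors if keep(c)]
    if not survivors:
        return "violation", survivors
    if len(survivors) == 1:
        return "unique", survivors
    return "ambiguous", survivors

# Made-up average distances (angstroms) taken from a set of estimated structures.
avg_distance = {
    ("HA_12", "HN_45"): 3.4,
    ("HA_30", "HN_45"): 4.6,
    ("HA_51", "HN_45"): 11.0,
}
# Hypothetical pair ruled out by a secondary structure filter.
unlikely = {("HA_30", "HN_45")}

def distance_filter(c):
    return avg_distance[c] <= 6.0

def secondary_structure_filter(c):
    return c not in unlikely

candidates = list(avg_distance)
label_one, _ = classify_peak(candidates, [distance_filter])
label_two, kept = classify_peak(candidates, [distance_filter, secondary_structure_filter])
print(label_one, label_two, kept)
```

With only the distance filter the peak stays ambiguous. Adding the second filter leaves a single candidate, so the peak would be written as a unique distance restraint.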

NOAH (1995) Mumenthaler & Braun

Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (of chemical shift pairs)
4. Scalar coupling constants (3JNH)

Algorithm calculates
1. Distance constraints from NOE assignments
2. Angle constraints from scalar couplings

Algorithm uses
1. Structure-based filter (recognizes correct constraints)
2. Chemical shift limit (max allowed error)
3. Error-tolerant target function in DIAMOD (1994) (minimizes effect of incorrect distance constraints from incorrect NOE assignments)

Fig. 1 (1995) J Mol Biol, 254, 465, Mumenthaler & Braun.
Demo: 3 proteins ranging from 57 to 74 residues

(1995) J Mol Biol, 254, 465, Mumenthaler & Braun.
DEN = 57, TEN = 74, REP = 69 residues

ARIA (1997) Nilges, et al.

Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (of chemical shift pairs)
4. Assignment cutoff, p, decreases for each cycle
5. (opt) preliminary structures, manual assignments
6. (opt) RDCs, scalar couplings, dihedral angles, S-S or H-bonds

Algorithm calculates in each cycle
1. Unique and partial NOE assignments
2. Unique and ambiguous distance restraints
3. Merges distance restraints with other input data
4. Bundle of refined structures (typically 20)

ARIA (1997) Nilges, et al.

An NOE cross peak with more than one possible assignment is treated as a weighted composite of all of them. Ambiguous distance restraints are introduced to incorporate the distance dk of each possible NOE assignment.

Ambiguous restraints

To reduce the number of assignment possibilities, each relative contribution Ck is calculated from dk and the average distance for all possible assignments from the lowest n of 20 conformers from the previous cycle. The largest Ck that add up to the cutoff value, p, for that cycle are kept; the rest are discarded.
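The pruning rule above can be sketched in a few lines. The sketch assumes the usual r^-6 form for ambiguous restraints, in which the effective distance is (sum of dk^-6)^(-1/6) and the contribution is Ck = dk^-6 divided by the sum over all candidates; the slide does not spell these formulas out. The function names and the example distances are hypothetical.

```python
def ambiguous_distance(distances):
    """Effective distance of an ambiguous restraint: (sum d_k^-6)^(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)

def prune_by_contribution(distances, p):
    """Keep the largest contributions C_k until their sum reaches the cutoff p.

    distances -- dict mapping assignment label -> average distance d_k (angstroms)
    p         -- assignment cutoff for the current cycle (0 < p <= 1)
    """
    total = sum(d ** -6 for d in distances.values())
    contrib = {k: d ** -6 / total for k, d in distances.items()}
    kept, running = [], 0.0
    # Walk the candidates from largest to smallest contribution.
    for k in sorted(contrib, key=contrib.get, reverse=True):
        kept.append(k)
        running += contrib[k]
        if running >= p:
            break
    return kept, contrib

# Made-up average distances for three candidate assignments of one peak.
d_avg = {"A": 3.0, "B": 4.0, "C": 8.0}

kept_loose, contrib = prune_by_contribution(d_avg, p=0.95)
kept_tight, _ = prune_by_contribution(d_avg, p=0.80)
print(kept_loose, kept_tight, round(ambiguous_distance(d_avg.values()), 2))
```

Because of the r^-6 weighting, the 3 Å candidate alone carries about 85% of the contribution, so a cutoff of 0.80 already reduces the peak to a unique assignment, while 0.95 keeps the two closest candidates. The effective distance is always slightly shorter than the shortest candidate distance.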

Fig. 1 (1997) J Mol Biol, 269, 408, Nilges, et al.
Demo: β-spectrin PH domain, 106 residues

Table 1 (1997) J Mol Biol, 269, 408, Nilges, et al.

β-spectrin PH domain, 106 residues

MAN: data derived from manual assignments
80 ms and 30 ms: data differ only in mixing times

AutoStructure (1999) Moseley & Montelione

Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (of chemical shift pairs)
4. Scalar couplings
5. Slow amide H/D exchange data
6. Preliminary structure
7. Preliminary H-bonded pairs

Algorithm calculates
1. Distance restraints
2. Dihedral angle restraints
3. H-bonding pairs
4. Refined structures

Fig. 1 (1999) Curr. Opin. Struct. Biol., 9, 635, Moseley & Montelione. (& Y.J. Huang PhD thesis)

basic fibroblast growth factor (127 residues)

(a) 10 NMR-derived structures
(b) manual and AutoStructure-derived structures (backbone RMSD between them = 0.7 Å)

KNOWNOE (2002) Gronwald, et al.

Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (of chemical shift pairs)
4. NOESY cross peak volume probability distribution
5. Preliminary structure

User specifies
1. Max allowed chemical shift error
2. Initial value of dmax = max interproton distance
3. Number, N, of current best structures

Algorithm, working together with CNS, iteratively will
1. build A-list of uniquely assigned NOE cross peaks
2. calculate P(Ak, a | Vo) for all other peaks
3. add to A-list all peaks with P(Ak, a | Vo) > cutoff (0.8-0.9)
4. use current A-list to calculate N structures

KNOWNOE (2002) Gronwald, et al.

The problem of ambiguous assignments is addressed with a Bayesian algorithm based on NOE cross peak volume probability distributions derived from 326 spectra.

P(Ak, a | Vo) = probability that more than fraction a of cross peak volume Vo is due to assignment k

If P(Ak, a | Vo) > cutoff value (typically 0.8 to 0.9) then consider that peak assigned to k for the next cycle.

These authors state that their algorithm is “Based on the observation that cross peak volume and correct cross peak assignment are not independent of each other”.
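The acceptance step of the iteration can be sketched as below. The sketch takes the probabilities P(Ak, a | Vo) as given, since computing them requires the volume probability distributions, and only shows how peaks above the cutoff join the A-list. The function name `update_a_list`, the peak labels, and the probability values are hypothetical.

```python
def update_a_list(a_list, probabilities, cutoff=0.9):
    """Add to the A-list every peak whose best assignment exceeds the cutoff.

    a_list        -- dict mapping peak id -> accepted assignment (not modified)
    probabilities -- dict mapping peak id -> {assignment: P(A_k, a | V_o)}
    cutoff        -- acceptance threshold (the slides quote 0.8 to 0.9)
    """
    updated = dict(a_list)
    for peak, probs in probabilities.items():
        if peak in updated:
            continue  # already assigned in an earlier cycle
        best = max(probs, key=probs.get)
        if probs[best] > cutoff:
            updated[peak] = best
    return updated

# Made-up probabilities for two still-ambiguous peaks.
a_list = {"peak_1": ("HA_12", "HN_45")}
probabilities = {
    "peak_2": {("HB_7", "HN_45"): 0.93, ("HB_9", "HN_45"): 0.05},
    "peak_3": {("HA_30", "HN_60"): 0.55, ("HA_31", "HN_60"): 0.40},
}

new_list = update_a_list(a_list, probabilities, cutoff=0.9)
print(sorted(new_list))
```

Here peak_2 is accepted because one assignment clearly dominates, while peak_3 stays ambiguous and is reconsidered in the next cycle with the newly calculated structures.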

Figures 3 & 4 (2002) J. Biomol. NMR, 23, 271, Gronwald, et al. Probability distributions of distance (left) and volume (right)

CANDID (2002) Herrmann, Guntert & Wuthrich

Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (of chemical shift pairs)
4. Previously assigned NOE distance constraints
5. (opt) other conformational constraints

User specifies
1. Max allowed chemical shift error
2. Cycle-dependent parameters (thresholds, cutoffs, etc.)

Algorithm uses
1. Structure-based filters (like NOAH)
2. Ambiguous distance constraints (like ARIA)
3. Network anchoring (new)
4. Constraint combination (new)

from (2002) J. Mol. Biol., 319, 209, Herrmann, Guntert, & Wuthrich.

Fig. 1 (2002) J. Mol. Biol., 319, 209, Herrmann, Guntert, & Wuthrich.

CANDID (2002) Herrmann, Guntert & Wuthrich

Ways to handle problems caused by the lack of a preliminary structure in the first cycle

1. Network anchoring “… evaluates the self-consistency of NOE assignments independent of knowledge of the 3D protein structure.”

“… a sensitive approach for detecting erroneous ‘lonely’ constraints …”

2. Constraint combination “… an extension of the concept of ambiguous NOE assignments.”

“… reduces the impact of unidentified artifact constraints in the input for the first structure calculation.”

Result: “The correct fold is obtained in cycle 1 of a de novo structure calculation.”

from (2002) J. Mol. Biol., 319, 209, Herrmann, Guntert, & Wuthrich.
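The idea behind constraint combination can be illustrated with a toy example: two unrelated constraints are merged into one ambiguous constraint, which is satisfied as long as at least one of its assignments is correct. The sketch assumes the combined upper bound is the larger of the two bounds and that ambiguous constraints are evaluated with an r^-6 sum; both are assumptions made for illustration, and the function names and numbers are hypothetical.

```python
def combine_constraints(c1, c2):
    """Merge two constraints into one ambiguous constraint (2 -> 1 combination)."""
    return {
        "assignments": sorted(set(c1["assignments"]) | set(c2["assignments"])),
        # Assumption: use the larger bound so the merged constraint is not too tight.
        "upper_bound": max(c1["upper_bound"], c2["upper_bound"]),
    }

def is_satisfied(constraint, distances):
    """An ambiguous constraint holds if its r^-6 summed distance is within the bound."""
    d_eff = sum(distances[a] ** -6 for a in constraint["assignments"]) ** (-1.0 / 6.0)
    return d_eff <= constraint["upper_bound"]

# One real constraint and one artifact, with made-up distances in the true structure.
real = {"assignments": [("HA_12", "HN_45")], "upper_bound": 4.0}
artifact = {"assignments": [("HB_7", "HN_60")], "upper_bound": 5.0}
true_distances = {("HA_12", "HN_45"): 3.5, ("HB_7", "HN_60"): 15.0}

combined = combine_constraints(real, artifact)
print(is_satisfied(artifact, true_distances))  # the artifact alone is violated
print(is_satisfied(combined, true_distances))  # the combined constraint is not
```

On its own the artifact would pull two distant protons together and distort the first structure. Once combined with a correct constraint it no longer conflicts with the true structure, at the cost of some lost information. If each original constraint is wrong independently with probability q, a combined constraint is wrong only when both are, with probability q squared.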

Questions?

Conclusions
