Integration of Fast Data Collection and Automated Probabilistic Assignment for
Protein NMR Spectroscopy
Arash Bahrami
Protein Structure Determination by NMR
• Sample Preparation
• Data collection
• Peak Picking
• Backbone resonance assignment
• Sidechain resonance assignment
• Secondary structure determination
• NOE data collection and assignment
• Structure calculation and refinement
• Individual software packages have been developed for each step, but no integrated tool is available for the whole process.
• Integration requires the individual components to interact.
• A probabilistic framework can provide robust interaction between components.
Automation in NMR
On average: 1-4 months and $80k per structure
Individual tools developed in CESG and NMRFAM
• PISTACHIO (Automated resonance assignment)
• PECAN (Secondary structure determination)
• MANI-LACS (Reference correction and outlier detection)
• HIFI-NMR (Fast and adaptive NMR data collection)
• HIFI-C (Adaptive determination of NMR couplings)
1. Hamid R. Eghbalnia, Arash Bahrami, Liya Wang, Amir Assadi, and John L. Markley (2005) J. Biomol. NMR, 32(3):219-233.
2. Hamid R. Eghbalnia, Liya Wang, Arash Bahrami, Amir Assadi, and John L. Markley (2005) J. Biomol. NMR, 32(1):71-81.
3. Liya Wang, Hamid R. Eghbalnia, Arash Bahrami, and John L. Markley (2005) J. Biomol. NMR, 32(1):13-22.
4. Hamid R. Eghbalnia, Arash Bahrami, Marco Tonelli, Klaus Hallenga, and John L. Markley (2005) J. Am. Chem. Soc., 127(36):12528-12536.
5. Gabriel Cornilescu, Arash Bahrami, Marco Tonelli, John L. Markley, and Hamid R. Eghbalnia (2007) J. Biomol. NMR, 38(4):341-351.
PISTACHIO
Native probabilistic PISTACHIO output
Residue Name P(H,N) H N CO CA CB P(H,N) H N P(H,N) H N P(H,N) H N P(no_assignment)
1 MET 0.000 0.000 0.00 0.00 55.29 34.51 0.000 0.000 0.00 0.000 0.000 0.00 0.000 0.000 0.00 0.000
2 ASN 0.730 9.899 125.16 0.00 52.03 40.68 0.210 8.765 123.2 0.000 0.000 0.00 0.000 0.000 0.00 0.060
3 THR 1.000 9.121 116.72 0.00 59.37 63.99 0.000 0.000 0.00 0.000 0.000 0.00 0.000 0.000 0.00 0.000
4 VAL 1.000 7.977 127.97 0.00 61.66 36.07 0.000 0.000 0.00 0.000 0.000 0.00 0.000 0.000 0.00 0.000
5 CYS 1.000 8.310 126.57 0.00 59.14 31.70 0.000 0.000 0.00 0.000 0.000 0.00 0.000 0.000 0.00 0.000
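The row layout above can be read programmatically. The following is a minimal Python sketch, assuming whitespace-separated fields in the column order of the header: residue number, residue name, a first P(H,N)/H/N candidate followed by CO, CA and CB, three further P(H,N)/H/N candidates, and P(no_assignment). The function and dictionary key names are illustrative and are not part of PISTACHIO.

```python
def parse_pistachio_row(line):
    """Parse one row of the native probabilistic output shown above.

    Assumed layout (whitespace separated, 18 fields):
    residue number, residue name, P(H,N) H N CO CA CB,
    three further P(H,N) H N triples, P(no_assignment).
    """
    fields = line.split()
    if len(fields) != 18:
        raise ValueError(f"expected 18 fields, got {len(fields)}")
    values = [float(v) for v in fields[2:]]
    # The carbon shifts follow the first (H, N) candidate only.
    triples = [values[0:3], values[6:9], values[9:12], values[12:15]]
    # Keep only candidates with non-zero probability.
    candidates = [{"p": p, "H": h, "N": n} for p, h, n in triples if p > 0.0]
    return {
        "residue": int(fields[0]),
        "name": fields[1],
        "CO": values[3],
        "CA": values[4],
        "CB": values[5],
        "candidates": candidates,
        "p_no_assignment": values[15],
    }


row = parse_pistachio_row(
    "2 ASN 0.730 9.899 125.16 0.00 52.03 40.68 0.210 8.765 123.2 "
    "0.000 0.000 0.00 0.000 0.000 0.00 0.060"
)
# Candidate probabilities plus P(no_assignment) should add up to 1.
total = sum(c["p"] for c in row["candidates"]) + row["p_no_assignment"]
print(row["name"], len(row["candidates"]), round(total, 3))
```

For residue 2 (ASN) this yields two candidate (H, N) pairs, with probabilities 0.730 and 0.210, and a 0.060 probability of no assignment.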
NMR-STAR format
1 1 MET CA C 55.291 1.000 0
2 1 MET CB C 34.509 1.000 0
3 2 ASN N N 125.160 1.000 0
4 2 ASN H H 9.899 1.000 0
5 2 ASN CA C 52.031 1.000 0
6 2 ASN CB C 40.684 1.000 0
7 3 THR N N 116.723 1.000 0
Overall view of the assignment probabilities
PISTACHIO is a probabilistic method for backbone and sidechain assignment. The input to PISTACHIO can be any subset of the following NMR experiments:
• HSQC
• HNCO
• CBCA(CO)NH
• HN(CA)CB
• C(CO)NH
• HBHA(CO)NH
• HN(CO)CA
• HN(CA)CO
• HN(CO)(CA)CB
• H(CCO)NH
• HCCH-TOCSY
• HNCACB
• HN(CO)CACB
• HNCA
PECAN
Helix
Extended
PECAN optimizes a combination of information sources to yield energetic descriptions of secondary structure and constructs a probabilistic description wherein each residue is assigned a probability of belonging to a designated state (e.g. helix, sheet, etc.). PECAN is available at: http://www.bija.nmrfam.wisc.edu/PECAN
LACS
MANI-LACS (Linear Analysis of Chemical Shifts for reference correction and outlier detection; ref. 3) detects potential outliers using linear analysis of chemical shifts. An outlier may be the result of a misassignment of chemical shifts. MANI-LACS reports probabilities for the presence of outliers. MANI-LACS is available at: http://www.bija.nmrfam.wisc.edu/MANI-LACS/
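To illustrate the general idea of flagging outliers through a linear fit, here is a minimal Python sketch. It is not the MANI-LACS algorithm: it assumes a single ordinary least-squares line through hypothetical (x, y) pairs and flags points whose residual exceeds a chosen multiple of the residual standard deviation. The data, the function name `flag_outliers` and the `n_sigma` threshold are all assumptions made for illustration.

```python
import math


def flag_outliers(xs, ys, n_sigma=2.0):
    """Fit y = slope*x + intercept by least squares and return the
    indices of points whose residual exceeds n_sigma standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    if sigma == 0.0:
        return []  # perfect fit, nothing to flag
    return [i for i, r in enumerate(residuals) if abs(r) > n_sigma * sigma]


# Hypothetical values: one point (index 6) lies far off the common trend.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 6.0, 2.0]
print(flag_outliers(xs, ys))
```

A hard threshold is used here only for brevity; MANI-LACS, as stated above, reports probabilities for the presence of outliers rather than a yes/no flag.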
2D planes of a 3D CBCA(CO)NH experiment collected on an 800 MHz Varian Inova spectrometer
HIFI-NMR: High-Resolution Iterative Frequency Identification for NMR
Tilted-plane reduced dimensionality data collection that employs on-the-fly peak identification, spectral modeling, and selection of the next data plane to be collected.
Simplified Description of the HIFI NMR Approach
• Collect the orthogonal planes (0° and 90°).
• Predicted chemical shift distribution: assign a probability, p, of a peak being in a given voxel (shown as a probability color map).
• Find a tilt angle that maximizes a dispersion function, f(p). The dispersion function, f(p), measures the dispersion of the putative peaks on the selected tilted plane.
• Collect the tilted plane (X°).
• Has the last tilted plane added new information? YES: collect another tilted plane. NO: report the peak list.
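The tilt-angle selection step can be sketched as follows. This is only an illustration under stated assumptions: each putative peak is reduced to two indirect-dimension coordinates (x, y), its position on a plane tilted by an angle θ is taken as x·cos θ + y·sin θ, and the dispersion function is replaced by a simple stand-in (the smallest gap between neighboring projected peaks). The f(p) used by HIFI-NMR works on voxel probabilities and is not specified in these slides; the function names and peak coordinates below are hypothetical.

```python
import math


def project(peak, angle_deg):
    """Position of an (x, y) peak on a plane tilted by angle_deg
    (assumed projection: x*cos(angle) + y*sin(angle))."""
    a = math.radians(angle_deg)
    return peak[0] * math.cos(a) + peak[1] * math.sin(a)


def dispersion(peaks, angle_deg):
    """Stand-in for f(p): smallest gap between neighboring projected peaks.
    A larger value means the peaks overlap less on this plane."""
    positions = sorted(project(p, angle_deg) for p in peaks)
    return min(b - a for a, b in zip(positions, positions[1:]))


def best_tilt_angle(peaks, candidate_angles):
    """Return the candidate angle with the largest dispersion."""
    return max(candidate_angles, key=lambda ang: dispersion(peaks, ang))


# Hypothetical putative peaks (arbitrary units in two indirect dimensions).
peaks = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
angle = best_tilt_angle(peaks, range(0, 91, 5))
print(angle)
```

In this toy case two peaks share the same x coordinate, so the 0° plane cannot separate them, while a plane dominated by the y dimension can. The search therefore selects the largest candidate angle.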
HIFI application to automated backbone assignments
Protein       Size       HIFI data collection time   PINE assignment time   Assignment accuracy
WT Brazzein   53 a.a.    12 h                        5 min                  98%
Ubiquitin     76 a.a.    14 h                        5 min                  98%
Flavodoxin    176 a.a.   48 h                        2 h                    85%
HIFI-C: A Fast and Robust Method for Determining NMR Couplings from Adaptive 3D to 2D Projections
Correlation and RMSD comparison of couplings collected by HIFI-C and 3D. Agreement between the two was within experimental error.
(A) GB3 protein (R = 99.8%, rmsd = 0.03 Hz). The total data collection times were 1.7 h for HIFI-C and 7.9 h for 3D.
(B) PRP24-12 protein (R = 94.0%, rmsd = 0.25 Hz). The total data collection times were 14.6 h for HIFI-C and 44.1 h for 3D.
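For reference, the two agreement measures quoted above can be computed as in the following Python sketch. The coupling values are hypothetical placeholders, not data from the GB3 or PRP24-12 measurements, and the function name is illustrative. The sketch returns the Pearson correlation coefficient as a fraction between -1 and 1, whereas the slide quotes R as a percentage.

```python
import math


def correlation_and_rmsd(a, b):
    """Return (Pearson correlation, root-mean-square deviation) of two
    equally long lists of coupling values."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    rmsd = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / n)
    return r, rmsd


# Hypothetical couplings in Hz from two acquisition schemes.
couplings_fast = [10.1, -5.2, 7.8, 0.4]
couplings_3d = [10.0, -5.0, 8.0, 0.5]
r, rmsd = correlation_and_rmsd(couplings_fast, couplings_3d)
print(f"R = {100 * r:.1f}%, rmsd = {rmsd:.2f} Hz")
```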
Back to Automation Steps in NMR Proteomics
• HIFI-NMR
• PISTACHIO
• PECAN
• MANI-LACS
• HIFI-C
Redesign the Individual Tools to Provide Robust Probabilistic Interaction: PINE
MANI-LACS
PISTACHIO
PECAN
PINE
General Overview of Probabilistic Network Defined by PINE
Amino Acid Typing Network
Spin System Generation Network
Table 1. PINE performance results and comparison with PISTACHIO for proteins whose assignments are available in the BMRB.
Protein       Number of  PINE      PINE         PINE secondary  PISTACHIO  PISTACHIO    Experiments represented in
designator    residues   CPU time  assignment   structure       CPU time   assignment   the input peak lists‡
                         (h)       accuracy*    accuracy        (h)        accuracy*    (columns 1-8)
At2g24940     109        0.2       98%          95%             1          95%          * *
At1g77540     103        0.2       96%          94%             0.2        95%          *
At2g23090     86         0.2       100%         92%             0.1        98%
AAH26994      101        0.2       95%          97%             0.2        90%          * * *
At5g22580     111        1         95%          90%             5          88%          * * *
At3g17210     112        1         94%          90%             6          90%          * * * * * *
At3g51030     124        1         94%          88%             5          87%          * * * * * *
At5g01610     170        1         80%          83%             6          70%          * * *
At3g16450†    299        1.5       82%          NA              7          73%          * * * * * * *
BMRB 5106     70         0.2       95%          90%             1          95%          * *
* The correct assignments are those of the final structure and assignments deposited in the PDB and BMRB.
† Stereo-array isotope labeled (SAIL) protein; isotope shifts due to labeling were not accounted for.
‡ Each data set included an HSQC or HNCO experiment; other experiments are indicated by numbers:
1 CBCA(CO)NH or HN(CO)CACB
2 HNCACB
3 HNCA
4 HN(CO)CA or CA(CO)NH
5 HN(CA)CO
6 H(CCO)NH or N15 TOCSY
7 C(CO)NH
8 HBHA(CO)NH
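As a quick summary of Table 1, the short Python sketch below averages the assignment accuracies reported for the ten proteins. The numbers are copied from the table; the unweighted mean over proteins is a choice made here for illustration and is not a statistic given in the slides.

```python
# Assignment accuracies (%) from Table 1, in row order.
pine = [98, 96, 100, 95, 95, 94, 94, 80, 82, 95]
pistachio = [95, 95, 98, 90, 88, 90, 87, 70, 73, 95]

mean_pine = sum(pine) / len(pine)
mean_pistachio = sum(pistachio) / len(pistachio)

# Number of proteins for which PINE is strictly more accurate.
pine_better = sum(1 for a, b in zip(pine, pistachio) if a > b)

print(mean_pine, mean_pistachio, pine_better)
```

On these ten data sets the unweighted means are 92.9% for PINE and 88.1% for PISTACHIO. PINE is more accurate for nine proteins and equal for one.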
PINE Web Server
Figure: percentage of jobs by user location (UW-Madison; US, outside UW; outside US). Bar labels as extracted: 15%, 65%, 20%.
PINE Server Statistics
Total Number of jobs submitted since July 2006: 1175 jobs
Figure: number of jobs submitted per month, Aug-06 through Mar-08.
Iterative HCCH-TOCSY assignment
• HBHA(CO)NH
• C(CO)NH
• H(CCO)NH
• HCCH-TOCSY
PINE, HIFI and Time Saving in NMR Proteomics
HIFI
• Time saving: 12 hours to 2 days of data collection vs. 1 to 2 weeks with traditional methods.
• Accuracy: 95%-100% of peaks recovered with high probability, depending on the size and complexity of the protein.
• Main cause of possible inaccuracy: some peaks may have very low intensities (at the noise level). They will have lower probabilities in the final peak list.
• What may need to be done manually: manual analysis may be needed to derive the remaining peaks from the lower-probability list.
PINE
• Time saving: full assignment in 5 minutes to 2 hours vs. 1 week to 1 month of manual assignment.
• Accuracy: 85%-100% correct assignments, depending on the size and complexity of the protein.
• Main cause of possible inaccuracy: some of the real peaks are missing from the peak lists.
• What may need to be done manually: the remaining peaks can be assigned manually by scanning the spectra.
Ongoing project: Integration of HIFI and PINE
• HIFI-NMR: fast data collection and peak identification
• PINE:
  • PISTACHIO: automated assignment
  • MANI-LACS: referencing and outlier check
  • PECAN: secondary structure determination
Probabilistic Analysis of Spectra in HIFI
(A) HNCA (HC plane) 512 zero filling; 0.15 delay in sine window function
(B) HNCA (HC plane) 1024 zero filling; 0.45 delay in sine window function
(C) Difference between spectra (A) and (B)
(D) Probabilistic peak lists are generated for every plane based on different parameter settings and peak volume.
X Y Probability
40 227 0.9846
56 231 0.9844
72 595 0.9846
89 245 0.7622
102 403 0.6541
110 380 0.9851
119 84 0.2486
128 359 0.9871
130 511 0.4452
… … …
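A probabilistic peak list such as the one in (D) can be split into high- and low-confidence peaks. The Python sketch below uses the nine rows listed in (D). The 0.9 cutoff and the function name `split_by_probability` are assumptions chosen for illustration, not values or routines used by HIFI.

```python
# (X, Y, probability) rows copied from the peak list in (D).
peaks = [
    (40, 227, 0.9846),
    (56, 231, 0.9844),
    (72, 595, 0.9846),
    (89, 245, 0.7622),
    (102, 403, 0.6541),
    (110, 380, 0.9851),
    (119, 84, 0.2486),
    (128, 359, 0.9871),
    (130, 511, 0.4452),
]


def split_by_probability(peak_list, cutoff=0.9):
    """Separate peaks at or above the cutoff from those below it."""
    confident = [p for p in peak_list if p[2] >= cutoff]
    uncertain = [p for p in peak_list if p[2] < cutoff]
    return confident, uncertain


confident, uncertain = split_by_probability(peaks)
print(len(confident), "confident;", len(uncertain), "for manual review")
```

With this cutoff, five of the nine listed peaks are kept as confident and four remain in the lower-probability list.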
On-the-Fly Spin System Generation in HIFI
• Collect N15-HSQC.
• Collect the most sensitive orthogonal plane (0°).
• Spectra analysis: generate the probabilistic peak list.
• Derive the initial probabilistic spin systems.
• Predicted chemical shift distribution.
• Find the optimum experiment and tilted angle. The optimum is the plane that maximizes the information regarding the ambiguous or missing positions in the spin systems, considering the latest state of chemical shift assignment.
• Collect the optimal tilted or orthogonal plane (X°).
• Spectra analysis: generate the probabilistic peak list.
• Update the probabilistic spin systems.
• Is the spin system network quality good enough for the assignment process? NO: find and collect another plane. YES: continue with PINE.
• PINE: derive the latest assignment and secondary structure.
• Are the assignment and secondary structure complete? NO: find and collect another plane. YES: report the final peak lists, chemical shift assignments, and secondary structure.
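The adaptive loop in this flowchart can be sketched as a toy Python simulation. Everything here is an assumption for illustration: spin-system positions are plain integers, each candidate plane is described by the set of positions it could resolve, and the "information" of a plane is simply the number of still-unresolved positions it covers. The plane names and the functions `choose_plane` and `adaptive_collection` are hypothetical and do not correspond to HIFI or PINE routines.

```python
def choose_plane(candidates, unresolved):
    """Pick the candidate plane covering the most unresolved positions
    (a toy stand-in for 'maximizes the information')."""
    return max(candidates, key=lambda name: len(candidates[name] & unresolved))


def adaptive_collection(candidates, unresolved, max_planes=10):
    """Collect planes one at a time until nothing is unresolved,
    no plane adds information, or the plane budget is used up."""
    unresolved = set(unresolved)
    remaining = dict(candidates)
    collected = []
    while unresolved and remaining and len(collected) < max_planes:
        best = choose_plane(remaining, unresolved)
        gain = remaining.pop(best) & unresolved
        if not gain:
            break  # the best remaining plane adds no new information
        collected.append(best)
        unresolved -= gain  # update the spin systems with the new plane
    return collected, unresolved


# Hypothetical experiments/tilt angles and the positions each could resolve.
candidates = {
    "HNCA@30": {1, 2, 3},
    "HNCO@0": {3, 4},
    "HNCACB@60": {5},
}
collected, left = adaptive_collection(candidates, {1, 2, 3, 4, 5})
print(collected, left)
```

The greedy choice stands in for the "find the optimum experiment and tilted angle" step, and the loop exit conditions play the role of the two decision points in the flowchart.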
• HIFI-NMR: fast data collection and peak identification
• PINE:
  • PISTACHIO: automated assignment
  • MANI-LACS: referencing and outlier check
  • PECAN: secondary structure determination
• NOESY assignment
Acknowledgements
• John Markley
• Hamid Eghbalnia
• Marco Tonelli
• Ziqi Dai
• Gabriel Cornilescu
• Klaus Hallenga
• Milo Westler
• Liya Wang
• Eldon Ulrich
All CESG members providing data:
• Claudia Cornilescu
• Shanteri Singh
• Jikui Song
• Brian Volkman
• Francis Peterson