Integration of Fast Data Collection and Automated Probabilistic Assignment for Protein NMR Spectroscopy Arash Bahrami


Integration of Fast Data Collection and Automated Probabilistic Assignment for Protein NMR Spectroscopy

Arash Bahrami

Automation in NMR

Protein structure determination by NMR takes 1-4 months on average and costs about $80k per structure. The steps are:

• Sample preparation
• Data collection
• Peak picking
• Backbone resonance assignment
• Sidechain resonance assignment
• Secondary structure determination
• NOE data collection and assignment
• Structure calculation and refinement

• Individual software packages have been developed for each step, but no integrated tool is available for the whole process.
• Integration requires interaction among the individual components.
• A probabilistic framework can provide robust interaction among the components.


Individual tools developed in CESG and NMRFAM

• PISTACHIO (Automated resonance assignment)

• PECAN (Secondary structure determination)

• MANI-LACS (Reference correction and outlier detection)

• HIFI-NMR (Fast and adaptive NMR data collection)

• HIFI-C (Adaptive determination of NMR couplings)

1. Hamid R. Eghbalnia, Arash Bahrami, Liya Wang, Amir Assadi, and John L. Markley (2005) J. Biomol. NMR, 32(3):219-233.

2. Hamid R. Eghbalnia, Liya Wang, Arash Bahrami, Amir Assadi, and John L. Markley (2005) J. Biomol. NMR, 32(1):71-81.

3. Liya Wang, Hamid R. Eghbalnia, Arash Bahrami, and John L. Markley (2005) J. Biomol. NMR, 32(1):13-22.

4. Hamid R. Eghbalnia, Arash Bahrami, Marco Tonelli, Klaus Hallenga, and John L. Markley (2005) J. Am. Chem. Soc., 127(36):12528-12536.

5. Gabriel Cornilescu, Arash Bahrami, Marco Tonelli, John L. Markley, and Hamid R. Eghbalnia (2007) J. Biomol. NMR, 38(4):341-351.


PISTACHIO

Native probabilistic PISTACHIO output

Residue_Name  P(H,N)  H      N       CO    CA     CB     P(H,N)  H      N      P(H,N)  H      N     P(H,N)  H      N     P(no_assignment)
1 MET         0.000   0.000  0.00    0.00  55.29  34.51  0.000   0.000  0.00   0.000   0.000  0.00  0.000   0.000  0.00  0.000
2 ASN         0.730   9.899  125.16  0.00  52.03  40.68  0.210   8.765  123.2  0.000   0.000  0.00  0.000   0.000  0.00  0.060
3 THR         1.000   9.121  116.72  0.00  59.37  63.99  0.000   0.000  0.00   0.000   0.000  0.00  0.000   0.000  0.00  0.000
4 VAL         1.000   7.977  127.97  0.00  61.66  36.07  0.000   0.000  0.00   0.000   0.000  0.00  0.000   0.000  0.00  0.000
5 CYS         1.000   8.310  126.57  0.00  59.14  31.70  0.000   0.000  0.00   0.000   0.000  0.00  0.000   0.000  0.00  0.000

NMR-star format

1 1 MET CA C 55.291 1.000 0

2 1 MET CB C 34.509 1.000 0

3 2 ASN N N 125.160 1.000 0

4 2 ASN H H 9.899 1.000 0

5 2 ASN CA C 52.031 1.000 0

6 2 ASN CB C 40.684 1.000 0

7 3 THR N N 116.723 1.000 0

Overall view of the assignment probabilities

PISTACHIO is a probabilistic method for backbone and sidechain assignment. The input to PISTACHIO can be any subset of the following NMR experiments:

• HSQC
• HNCO
• CBCA(CO)NH
• HN(CA)CB
• C(CO)NH
• HBHA(CO)NH
• HN(CO)CA
• HN(CA)CO
• HN(CO)(CA)CB
• H(CCO)NH
• HCCH-TOCSY
• HNCACB
• HN(CO)CACB
• HNCA
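The native probabilistic output shown on this slide can be read programmatically. The following is a minimal Python sketch, assuming the column order given in the header row: one primary candidate (P(H,N), H, N) together with CO, CA, and CB, then three alternate (P(H,N), H, N) candidates, then P(no_assignment). The class, function, and field names are illustrative and are not part of PISTACHIO.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Candidate = Tuple[float, float, float]  # (probability, H shift, N shift)


@dataclass
class ResidueAssignment:
    index: int
    name: str
    candidates: List[Candidate]  # primary candidate first, then alternates
    co: float
    ca: float
    cb: float
    p_none: float  # P(no_assignment)


def parse_row(line: str) -> ResidueAssignment:
    """Parse one whitespace-separated row (assumed layout, see lead-in)."""
    fields = line.split()
    if len(fields) != 18:
        raise ValueError(f"expected 18 fields, got {len(fields)}")
    index, name = int(fields[0]), fields[1]
    v = [float(x) for x in fields[2:]]
    candidates = [(v[0], v[1], v[2])]      # primary candidate
    co, ca, cb = v[3], v[4], v[5]
    for k in range(6, 15, 3):              # three alternate candidates
        candidates.append((v[k], v[k + 1], v[k + 2]))
    return ResidueAssignment(index, name, candidates, co, ca, cb, v[15])


def best_candidate(res: ResidueAssignment) -> Optional[Candidate]:
    """Most probable (P, H, N) candidate, or None if 'no assignment' wins."""
    best = max(res.candidates, key=lambda c: c[0])
    return best if best[0] > res.p_none else None


row = ("2 ASN 0.730 9.899 125.16 0.00 52.03 40.68 0.210 8.765 123.2 "
       "0.000 0.000 0.00 0.000 0.000 0.00 0.060")
asn = parse_row(row)
print(best_candidate(asn))
```

For the ASN row the candidate probabilities and P(no_assignment) sum to one, which is a convenient sanity check when reading such files.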


PECAN

Helix

Extended

PECAN optimizes a combination of information sources to yield energetic descriptions of secondary structure and constructs a probabilistic description wherein each residue is assigned a probability of belonging to a designated state (e.g. helix, sheet, etc.). PECAN is available at: http://www.bija.nmrfam.wisc.edu/PECAN


LACS

MANI-LACS [3] (Linear Analysis of Chemical Shifts for reference correction and outlier detection) detects potential outliers using linear analysis of chemical shifts. An outlier may result from misassignment of chemical shifts. MANI-LACS reports probabilities for the presence of outliers. MANI-LACS is available at: http://www.bija.nmrfam.wisc.edu/MANI-LACS/
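One way to picture outlier detection by linear analysis is a least-squares line fit followed by a check of the residuals. The Python sketch below is a generic illustration and not the MANI-LACS algorithm: the choice of x and y values, the robust scale estimate, and the cutoff `k` are all assumptions, and it returns simple flags rather than the probabilities that MANI-LACS reports.

```python
from statistics import median
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(x: Sequence[float], y: Sequence[float], k: float = 3.0) -> List[int]:
    """Indices whose residual from the fitted line exceeds k robust sigmas."""
    slope, intercept = fit_line(x, y)
    res = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    med = median(res)
    mad = median(abs(r - med) for r in res)
    sigma = 1.4826 * mad  # converts MAD to a sigma estimate for normal data
    if sigma == 0:
        return []
    return [i for i, r in enumerate(res) if abs(r - med) > k * sigma]


# Synthetic data (not measured shifts): a line with small noise and one bad point.
xs = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
ys = [0.5 * xi + 0.2 + (0.05 if i % 2 == 0 else -0.05) for i, xi in enumerate(xs)]
ys[4] += 3.0  # simulated misassignment
print(flag_outliers(xs, ys))
```

In a referencing context the fitted intercept would play the role of a systematic offset, while flagged points are candidates for misassignment to be inspected.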

HIFI-NMR: High-Resolution Iterative Frequency Identification for NMR

Tilted-plane, reduced-dimensionality data collection that employs on-the-fly peak identification, spectral modeling, and selection of the next data plane to be collected.

Figure: 2D planes of a 3D CBCA(CO)NNH experiment collected on an 800 MHz Varian Inova spectrometer.

Simplified Description of the HIFI NMR Approach

• Collect the orthogonal planes (0° and 90°).
• Using the predicted chemical shift distribution, assign a probability, p, of a peak being in a given voxel (shown as a probability color map).
• Find a tilt angle that maximizes a dispersion function, f(p). The dispersion function measures the dispersion of the putative peaks on the selected tilted plane.
• Collect the tilted plane.
• Has the last tilted plane added new information? If YES, find and collect the next tilted plane. If NO, output the peak list.
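The tilt-angle selection step on this slide can be sketched in Python. The projection relation used here (a peak at indirect-dimension offsets (a, b) appears at a·cos θ + b·sin θ on a plane tilted by θ) is the usual one for tilted-plane projections. The dispersion score, however, is an assumed stand-in for f(p), whose exact form is not given on the slide: it simply counts candidate peaks that are separated from every other peak by more than a tolerance. All names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Peak = Tuple[float, float, float]  # (direct H, indirect a, indirect b), Hz offsets


def project(peak: Peak, theta_deg: float) -> Tuple[float, float]:
    """Position of a 3D peak on the 2D plane tilted by theta_deg."""
    h, a, b = peak
    t = math.radians(theta_deg)
    return h, a * math.cos(t) + b * math.sin(t)


def dispersion(peaks: Sequence[Peak], theta_deg: float, tol: float = 10.0) -> int:
    """Assumed score: number of peaks farther than tol (Hz) from all others."""
    pts = [project(p, theta_deg) for p in peaks]
    resolved = 0
    for i, (h1, w1) in enumerate(pts):
        if all(math.hypot(h1 - h2, w1 - w2) > tol
               for j, (h2, w2) in enumerate(pts) if j != i):
            resolved += 1
    return resolved


def best_tilt(peaks: Sequence[Peak],
              angles: Iterable[int] = range(5, 90, 5),
              tol: float = 10.0) -> int:
    """Tilt angle (degrees) on the grid that maximizes the dispersion score."""
    return max(angles, key=lambda th: dispersion(peaks, th, tol))


# Three putative peaks that overlap pairwise on the 0° and 90° planes.
candidates = [(0.0, 0.0, 0.0), (0.0, 0.0, 100.0), (0.0, 100.0, 0.0)]
theta = best_tilt(candidates)
print(theta, dispersion(candidates, theta))
```

In this toy case each orthogonal plane resolves only one of the three peaks, while a suitably tilted plane separates all of them, which is the motivation for choosing the angle adaptively.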

HIFI application to automated backbone assignments

Protein      Size      HIFI data collection time  PINE assignment time  Assignment accuracy
WT Brazzein  53 a.a.   12 h                       5 min                 98%
Ubiquitin    76 a.a.   14 h                       5 min                 98%
Flavodoxin   176 a.a.  48 h                       2 h                   85%


HIFI–C: A Fast and Robust Method for Determining NMR Couplings from Adaptive 3D to 2D Projections

Correlation and RMSD comparison of couplings collected by HIFI-C and 3D. Agreement between the two was within experimental error.

(A) GB3 protein (R = 99.8%, rmsd = 0.03 Hz). The total data collection times were 1.7 h for HIFI-C and 7.9 h for 3D.

(B) PRP24-12 protein (R = 94.0%, rmsd = 0.25 Hz). The total data collection times were 14.6 h for HIFI-C and 44.1 h for 3D.
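As a quick worked ratio from the collection times quoted above (simple arithmetic on the reported numbers, not a figure stated on the slide), the time saving of HIFI-C relative to the full 3D experiment is roughly:

```latex
\[
\text{GB3: } \frac{7.9\ \text{h}}{1.7\ \text{h}} \approx 4.6,
\qquad
\text{PRP24-12: } \frac{44.1\ \text{h}}{14.6\ \text{h}} \approx 3.0
\]
```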

Back to Automation Steps in NMR Proteomics

Diagram labels: HIFI-NMR, PISTACHIO, PECAN, MANI-LACS, HIFI-C


Redesign the Individual Tools to Provide Robust Probabilistic Interaction: PINE

MANI-LACS

PISTACHIO

PECAN

PINE


General Overview of Probabilistic Network Defined by PINE


Amino Acid Typing Network

Spin System Generation Network

Table 1. PINE performance results and comparison with PISTACHIO for proteins for which BMRB assignments are available.

                        PINE                                        PISTACHIO
Protein     Number of   CPU time  Assignment  Secondary structure  CPU time  Assignment  Experiments represented in
designator  residues    (h)       accuracy*   accuracy             (h)       accuracy*   the input peak lists‡ (1-8)
At2g24940   109         0.2       98%         95%                  1         95%         * *
At1g77540   103         0.2       96%         94%                  0.2       95%         *
At2g23090   86          0.2       100%        92%                  0.1       98%
AAH26994    101         0.2       95%         97%                  0.2       90%         * * *
At5g22580   111         1         95%         90%                  5         88%         * * *
At3g17210   112         1         94%         90%                  6         90%         * * * * * *
At3g51030   124         1         94%         88%                  5         87%         * * * * * *
At5g01610   170         1         80%         83%                  6         70%         * * *
At3g16450†  299         1.5       82%         NA                   7         73%         * * * * * * *
BMRB 5106   70          0.2       95%         90%                  1         95%         * *

* Correct assignments are taken to be those of the final structure and assignments deposited in the PDB and BMRB.

† Stereo-array isotope labeled (SAIL) protein; isotope shifts due to labeling were not accounted for.

‡ Each data set included an HSQC or HNCO experiment; other experiments are indicated by numbers:
1 CBCA(CO)NH or HN(CO)CACB
2 HNCACB
3 HNCA
4 HN(CO)CA or CA(CO)NH
5 HN(CA)CO
6 H(CCO)NH or N15 TOCSY
7 C(CO)NH
8 HBHA(CO)NH
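The assignment accuracies in Table 1 can be compared with a few lines of Python. This is a rough summary sketch: the unweighted mean over the ten proteins is an assumption about how to aggregate the table and is not a figure reported on the slide.

```python
from statistics import mean

# (protein, residues, PINE accuracy %, PISTACHIO accuracy %), copied from Table 1
TABLE_1 = [
    ("At2g24940", 109, 98, 95),
    ("At1g77540", 103, 96, 95),
    ("At2g23090", 86, 100, 98),
    ("AAH26994", 101, 95, 90),
    ("At5g22580", 111, 95, 88),
    ("At3g17210", 112, 94, 90),
    ("At3g51030", 124, 94, 87),
    ("At5g01610", 170, 80, 70),
    ("At3g16450", 299, 82, 73),
    ("BMRB 5106", 70, 95, 95),
]


def summarize(rows):
    """Return mean accuracies and the per-protein gain of PINE over PISTACHIO."""
    pine = mean(r[2] for r in rows)
    pistachio = mean(r[3] for r in rows)
    gains = {r[0]: r[2] - r[3] for r in rows}
    return pine, pistachio, gains


pine_mean, pistachio_mean, gains = summarize(TABLE_1)
print(f"PINE {pine_mean:.1f}%  PISTACHIO {pistachio_mean:.1f}%")
print("largest gain:", max(gains, key=gains.get))
```

The per-protein gains make it easy to see that the improvement is largest for the bigger proteins in the table.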


PINE Web Server

PINE Server Statistics

Total number of jobs submitted since July 2006: 1175 jobs

Figure: share of jobs by origin (y-axis: Jobs, 0% to 70%). Categories: UW-Madison, US (outside UW), Outside US. Data labels: 15%, 65%, 20%.

Figure: jobs per month from Aug-06 through Mar-08 (y-axis: Jobs, 0 to 90).

Iterative HCCH-TOCSY assignment

Experiments shown: HBHA(CO)NH, C(CO)NH, H(CCO)NH, HCCH-TOCSY

PINE, HIFI and Time Saving in NMR Proteomics

HIFI
• Time saving: 12 hours to 2 days of data collection, versus 1 to 2 weeks with traditional methods.
• Accuracy: 95%-100% of peaks recovered with high probability, depending on the size and complexity of the protein.
• Main cause of possible inaccuracy: Some peaks may have very low intensities (at the noise level). They will have lower probabilities in the final peak list.
• What may need to be done manually: Manual analysis may be needed to derive the remaining peaks from the lower-probability list.

PINE
• Time saving: Full assignment in anywhere between 5 minutes and 2 hours, versus 1 week to 1 month of manual assignment.
• Accuracy: 85%-100% correct assignments, depending on the size and complexity of the protein.
• Main cause of possible inaccuracy: Some of the real peaks are missing from the peak lists.
• What may need to be done manually: Manual assignment of the remaining peaks can easily be done by scanning the spectra.

Ongoing project: Integration of HIFI and PINE

• HIFI-NMR: fast data collection and peak identification
• PINE, combining:
  • PISTACHIO: automated assignment
  • MANI-LACS: referencing and outlier check
  • PECAN: secondary structure determination

Probabilistic Analysis of Spectra in HIFI

(A) HNCA (HC plane), 512 zero filling; 0.15 delay in sine window function

(B) HNCA (HC plane), 1024 zero filling; 0.45 delay in sine window function

(C) Difference between spectra (A) and (B)

(D) Probabilistic peak lists are generated for every plane based on different parameter settings and peak volumes.

X    Y    Probability
40   227  0.9846
56   231  0.9844
72   595  0.9846
89   245  0.7622
102  403  0.6541
110  380  0.9851
119  84   0.2486
128  359  0.9871
130  511  0.4452
…    …    …
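A probabilistic peak list such as the one on this slide lends itself to a simple triage: peaks above some probability are accepted, and the rest are kept for later review. The Python sketch below uses the nine listed peaks; the 0.9 cutoff and the function name are assumptions made for illustration, not values used by HIFI.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[int, int, float]  # (X, Y, probability)

# Rows copied from the peak list on this slide.
PEAKS: List[Peak] = [
    (40, 227, 0.9846), (56, 231, 0.9844), (72, 595, 0.9846),
    (89, 245, 0.7622), (102, 403, 0.6541), (110, 380, 0.9851),
    (119, 84, 0.2486), (128, 359, 0.9871), (130, 511, 0.4452),
]


def split_by_probability(peaks: Sequence[Peak], threshold: float = 0.9):
    """Split peaks into (confident, uncertain) using an assumed cutoff."""
    confident = [p for p in peaks if p[2] >= threshold]
    uncertain = [p for p in peaks if p[2] < threshold]
    return confident, uncertain


confident, uncertain = split_by_probability(PEAKS)
print(len(confident), "confident;", len(uncertain), "kept for review")
```

Keeping the low-probability peaks instead of discarding them preserves the information that later planes, or manual inspection, can confirm or reject.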

On-the-Fly Spin System Generation in HIFI


Flowchart of the integrated HIFI and PINE process:

• Collect N15-HSQC.
• Spectra analysis: generate a probabilistic peak list.
• Derive the initial probabilistic spin systems, using the predicted chemical shift distribution.
• Collect the most sensitive orthogonal plane.
• Spectra analysis: generate a probabilistic peak list.
• Update the probabilistic spin systems.
• Decision: Is the spin system network quality good enough for the assignment process? If YES, continue to PINE. If NO, collect more data first.
• PINE: derive the latest assignment and secondary structure.
• Decision: Are the assignment and secondary structure complete? If YES, report the final peak lists, chemical shift assignments, and secondary structure. If NO, find the optimum experiment and tilt angle, collect the optimal tilted or orthogonal plane, and repeat the analysis.

The optimum is the plane that maximizes the information regarding the ambiguous or missing positions in the spin systems, considering the latest state of the chemical shift assignment.
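The flowchart on this slide amounts to an adaptive control loop. The Python sketch below shows one possible reading of it; the `lab` object and every method name on it are hypothetical placeholders for the spectrometer and analysis steps, and the routing of the NO branches (collect another sensitive orthogonal plane until the spin systems are good enough, then let the assignment state choose the next plane) is an assumption about the diagram.

```python
from typing import Any, Optional, Tuple


def run_adaptive_assignment(lab: Any, max_planes: int = 50) -> Tuple[Any, Optional[Any]]:
    """Collect planes until the assignment is complete or the budget is spent.

    `lab` is a hypothetical object providing the steps named in the flowchart.
    """
    peaks = lab.analyze(lab.collect_hsqc())           # probabilistic peak list
    spin_systems = lab.init_spin_systems(peaks)       # initial spin systems
    result = None
    for _ in range(max_planes):
        if not lab.good_enough(spin_systems):
            # Assumed NO branch: gather more data before attempting assignment.
            plane = lab.collect_sensitive_orthogonal_plane()
        else:
            result = lab.run_pine(spin_systems)       # assignment + secondary structure
            if lab.complete(result):
                break
            # Assumed NO branch: pick the most informative experiment and tilt angle.
            plane = lab.collect_optimal_plane(spin_systems, result)
        spin_systems = lab.update_spin_systems(spin_systems, lab.analyze(plane))
    return spin_systems, result
```

The `max_planes` budget is an added safeguard so that the loop terminates even if the completeness test never passes.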

• HIFI-NMR: fast data collection and peak identification
• PINE, combining:
  • PISTACHIO: automated assignment
  • MANI-LACS: referencing and outlier check
  • PECAN: secondary structure determination
• NOESY assignment


Acknowledgements

• John Markley

• Hamid Eghbalnia

• Marco Tonelli

All CESG members who provided data:

• Claudia Cornilescu
• Shanteri Singh
• Jikui Song
• Brian Volkman
• Francis Peterson
• Ziqi Dai
• Gabriel Cornilescu

• Klaus Hallenga

• Milo Westler

• Liya Wang

• Eldon Ulrich