PSI Data Management and Reporting: Expectations, Standards and Utility J. Michael Sauder Director, Bioinformatics NYSGXRC Project Leader


PSI Data Management and Reporting: Expectations, Standards and Utility

J. Michael Sauder

Director, Bioinformatics

NYSGXRC Project Leader

NIGMS Expectations

• http://grants.nih.gov/grants/guide/rfa-files/RFA-GM-05-001.html

• “… a database for deposition of information on experimental outcome data (both successful and unsuccessful).”

• “These data include … cDNA cloning, expression vector construction, protein production and purification, protein biochemical characterizations, crystallization screening, synchrotron and NMR data collection, etc.”

• “The PSI Research Network centers will be required to provide plans for the collection, maintenance, and transfer of experimental results into this central data repository.”

• PepcDB… will contain information on these important results and provide a platform for cross-center data mining to capitalize on the PSI investment

Protocols vs Results

• General protocols are reported by each PSI Center in PepcDB

• General protocols have been published in the literature by several Centers

• However, one of the real values of PepcDB lies in the detailed experimental trial results for each target:
  – Which clones were made? (PSI-MR)
  – Which constructs yield soluble protein? (Which don’t?)
  – What are the fermentation conditions? Purification?
  – What was the protein yield? The final concentration? The experimental molecular weight?
  – What conditions gave crystals? How many crystal forms? What was the cryoprotectant? Which conditions led to diffraction data? To the structure?

TargetDB/PepcDB Data Mining

• TargetDB status is informative, but far more useful would be data about:
  – Small scale expression/solubility testing
  – Large scale purification yield, concentration, oligomeric state
  – Conditions that yielded diffracting crystals

• Publications
  – Overton et al (2008) Bioinformatics 24:901-907 “ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction” (PDB, TargetDB, PepcDB)
  – Martin-Galiano et al (2008) Proteins 70:1243-1256 “Predicting experimental properties of integral membrane proteins by a naive Bayes approach” (TargetDB)
  – Bannen et al (2007) J Struct Funct Genomics 8:217-226 “Effect of low-complexity regions on protein structure determination” (TargetDB/PepcDB)
  – Smialowski et al (2007) Bioinformatics 23:2536-2542 “Protein solubility: sequence based prediction and experimental verification” (TargetDB)
  – Slabinski et al (2007) Bioinformatics 23:3403-3405 “XtalPred: a web server for prediction of protein crystallizability” (TargetDB)
  – Nair & Rost (2004) Nucl Acids Res 32:W517-W521 “LOCnet and LOCtarget: sub-cellular localization for structural genomics targets” (TargetDB)

Process vs Reporting

[Figure: flowchart relating fine-grained numeric process status codes (0 to 950) to the coarser reported statuses: Selected, Cloned, Expressed, Soluble, Purified, Crystallized, Diffr data, In PDB. Process statuses legible with their codes include: 210 Failed transform; 270 Clone completed to ferm; 310 Fermentation voided; 320 Fermentation waiting; 370 Purification waiting; 390 Purification in progress; 430 Purification technical error; 440 Purification failed; 460 Purification research marginal; 470 Purification research successful; 645 Cryst in optimization. Other labeled states include Mol biol in progress, Fail PCR, Cloning failed, Failed expresn, Failed solubility, Fermentation on hold, Purification on hold, Cryst in screening, Crystal abandoned, Crystal examined, Crystal waiting collection, Dataset collected, and Structure deposited.]

Need to Consider the Future… Now

• How much data are we capturing in our databases compared to how much we are reporting?

• What will happen to Center data after PSI-2?

• We should ensure that as much as possible of our Center data is publicly accessible in PepcDB

Trial Data Reporting by Center

Center | Experimental trial details reported to PepcDB

JCSG | Protein sequence, cloning vector, fermentation media, purification method, crystallization conditions

MCSG | Protein sequence, cloning vector, expression host, temperature, media

NESGC | Protein sequence

NYSGXRC | DNA and protein sequence, construct boundaries, cloning vector, small scale expression/solubility scores, media, MW, large scale media, volume, induction time/temp, pellet weight, harvest date, SeMet Y/N, purification yield, concentration, purity, MW, oligomeric state, start/end dates, mass spec pass/fail, analysis comments, MW, crystallization conditions, protein concentration, temperature, cryo, harvest/collection dates, anomalous scatterer, diffraction resolution

PepcDB Trial Schema

NYSGXRC <protocolDetails>

<protocolId>SGX_MOLBIO_PCR
### Molecular Biology - PCR ####
PCR start date: 03/20/2007
PCR last updated: 04/16/2007
Notebook #: 1358 Page: 13

<protocolId>SGX_MOLBIO_TOPO_TRANSFORM
### Molecular Biology - cloning ####
SGX clonename: 10001b2BSt5p1
Vector: pSGX4 (BS)

<protocolId>SGX_MOLBIO_EXPR_SOL
### Small scale expression/solubility ###
Expression score: HIGH
Solubility rating: HIGH
Predicted molecular weight (kDa): 44.95
Growth Media (small scale): ZYP-5052
Observed molecular Weight (kDa): 46
Sonication buffer: PLB1
</protocolDetails>

<protocolId>SGX_FERM_ECOLI_ZYP
### Fermentation ###
SGX PID: 11732
Growth Media (large scale): ZYP-5052
Total volume (L): 1
Induction time (hr): 21
Induction temp. (C): 22
Pellet weight (g): 19
Harvest date: 05/17/2006
Selenomet: N

<protocolId>SGX_PURIF_ECOLI_BACT
### Purification ###
SGX PID: 11732
SGX pool: 1
Selenomet: N
Start date: 06/21/2006
Yield (mg): 52.3
Final concentration (mg/ml): 52.3
Observed molecular weight (kDa): 33
Notebook #: 1136 Page: 115
End date: 06/23/2006
Purity (%): 98
Oligomeric state: monomer (1 subunit)

DNA source?

Primers?

Host cells?

Antibiotic resistance?

Purification steps?

Buffers?
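The free-text layout used in these <protocolDetails> entries (a "### section ###" banner followed by "Key: value" fields) can be turned back into structured data for mining. The following is a minimal Python sketch, assuming one field per line; the function name parse_protocol_details and the regular expression are illustrative and are not part of PepcDB or the SGX LIMS.

```python
import re

# Hypothetical helper: not part of PepcDB or the SGX LIMS.
# A banner line is three or more '#', a title, then three or more '#'.
SECTION_RE = re.compile(r"^#{3,}\s*(.*?)\s*#{3,}$")

def parse_protocol_details(text):
    """Split '### Section ###' / 'Key: value' text into {section: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        banner = SECTION_RE.match(line)
        if banner:
            current = sections.setdefault(banner.group(1), {})
            continue
        if current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return sections

sample = """\
### Fermentation ###
SGX PID: 11732
Growth Media (large scale): ZYP-5052
Total volume (L): 1
Induction time (hr): 21
Induction temp. (C): 22
Pellet weight (g): 19
Harvest date: 05/17/2006
Selenomet: N
"""

fermentation = parse_protocol_details(sample)["Fermentation"]
print(fermentation["Growth Media (large scale)"])
```

Fields that pack two values into one line, such as "Notebook #: 1358 Page: 13", would need extra handling beyond this sketch.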

NYSGXRC <protocolDetails>

<protocolId>SGX_MALDI</protocolId>
### Mass Spec - MALDI ###
Mass Spec Status: Passed

<protocolId>SGX_ESI-MS
### Mass Spec - ESI-MS ###
Mass Spec Status: Passed
Observed MW: 32528

<protocolId>SGX_XTAL
### Crystallization ###
SGX XID: 27611
Tray barcode: N0081969
Temperature: 21
Protein concentration (mg/ml): 26
Well location: G 12
Well conditions: [100mM] 1M Hepes pH 7.5 + [25%] 50% PEG 3350 + [200mM] 1M Magnesium Chloride hexahydrate
Cryoprotectant comment: [20%] 80% Glycerol
Harvest date: 09/05/2006
Collection date: 09/05/2006
APS resolution: 2.3
Crystal status: D-DATASET COLLECTED

Crystal morphology?

Space group?
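Well-condition strings like the one in the crystallization entry are easier to mine once split into components. The sketch below is a hedged Python example: it assumes each component reads "[final concentration] stock concentration reagent" and that components are joined by "+". That reading is an interpretation of the example, not a documented PepcDB format, and parse_well_conditions is a hypothetical name.

```python
import re

# Assumed reading of each component:
#   "[final concentration] stock concentration reagent name"
COMPONENT_RE = re.compile(r"\[\s*([^\]]+?)\s*\]\s*(\S+)\s+(.+)")

def parse_well_conditions(text):
    """Split a '+'-joined well-condition string into component dictionaries."""
    components = []
    for part in text.split("+"):
        match = COMPONENT_RE.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unrecognized component: {part!r}")
        final, stock, reagent = match.groups()
        components.append(
            {"final": final, "stock": stock, "reagent": reagent.strip()}
        )
    return components

wells = parse_well_conditions(
    "[100mM] 1M Hepes pH 7.5 + [25%] 50% PEG 3350 +"
    "[200mM] 1M Magnesium Chloride hexahydrate"
)
for component in wells:
    print(component["final"], component["reagent"])
```

A reagent name that itself contains "+" would break the simple split, so a production parser would need a stricter delimiter rule.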

Proposed Data Reporting

• Molecular biology
  – DNA source, primers, vector, PSI-MR clone ID, host, antibiotic resistance
  – Expression and solubility rating (small scale), media, predicted and observed molecular weight

• Fermentation
  – Media, volume, induction time, temp, selenoMet?

• Purification
  – Purification steps, final buffer, yield, concentration, molecular weight, purity, oligomeric state
  – Accurate MW if mass spec done

• Crystallization
  – Temperature, protein concentration, well conditions, cryoprotectant and resolution, if applicable
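A proposed field list like this one can double as a completeness check before a Center submits an update. The Python sketch below is illustrative only: the field names follow the list above, but the dictionary layout, the function missing_fields, and the sample record are assumptions rather than a PepcDB schema.

```python
# Field names follow the proposed reporting list; the layout and function
# name are illustrative assumptions, not a PepcDB schema.
PROPOSED_FIELDS = {
    "molecular biology": [
        "DNA source", "primers", "vector", "PSI-MR clone ID", "host",
        "antibiotic resistance", "expression rating", "solubility rating",
        "media", "predicted molecular weight", "observed molecular weight",
    ],
    "fermentation": [
        "media", "volume", "induction time", "induction temperature",
        "selenoMet",
    ],
    "purification": [
        "purification steps", "final buffer", "yield", "concentration",
        "molecular weight", "purity", "oligomeric state",
    ],
    # Cryoprotectant and resolution are "if applicable", so not required here.
    "crystallization": [
        "temperature", "protein concentration", "well conditions",
    ],
}

def missing_fields(stage, record):
    """Return the proposed fields for `stage` that `record` leaves empty."""
    return [
        field for field in PROPOSED_FIELDS[stage]
        if record.get(field) in (None, "")
    ]

# Sample record built from the purification values shown earlier.
purification = {
    "yield": "52.3 mg",
    "concentration": "52.3 mg/ml",
    "molecular weight": "33 kDa",
    "purity": "98%",
    "oligomeric state": "monomer (1 subunit)",
}
print(missing_fields("purification", purification))
# prints ['purification steps', 'final buffer']
```

The two fields flagged for the sample record match the open questions raised about the current purification report (purification steps and buffers).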

<MeasurementName> <…Value>

• Alternative mechanism to report experimental data
  – <MeasurementName>molecular weight</MeasurementName>
  – <MeasurementValue>32475</MeasurementValue>
  – <MeasurementUnit>Da</MeasurementUnit>

• Examples
  – Molecular weight
  – Isoelectric point
  – Phosphorylation
  – Methylation
  – Element analysis / stoichiometry
  – etc.
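A name/value/unit triple of this kind is simple to generate from LIMS data. The following Python sketch uses the standard library's xml.etree.ElementTree; the three child element names come from the slide, while the wrapper element name trialMeasurement, the nesting, and the function measurement_element are assumptions that should be checked against the actual PepcDB schema.

```python
import xml.etree.ElementTree as ET

def measurement_element(name, value, unit=None):
    """Build one measurement as a name/value/unit element group."""
    # Wrapper element name and nesting are assumptions, not the confirmed schema.
    parent = ET.Element("trialMeasurement")
    ET.SubElement(parent, "MeasurementName").text = name
    ET.SubElement(parent, "MeasurementValue").text = str(value)
    if unit is not None:
        ET.SubElement(parent, "MeasurementUnit").text = unit
    return parent

elem = measurement_element("molecular weight", 32475, "Da")
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

Building the elements with a library rather than string concatenation guarantees matching open and close tags and correct escaping of special characters in values.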

Optional tags

• http://mmcif.pdb.org/sg-data/protprod.html

• PDB-proposed mmCIF-like tags to describe cloning, expression, purification, crystallization, etc.

• Examples
  – _entity_src_gen_pure.protein_concentration
  – _entity_src_gen_pure.protein_yield
  – _entity_src_gen_pure.protein_oligomeric_state
  – _pdbx_buffer_components.name
  – _pdbx_buffer_components.conc
  – _exptl_crystal_grow.temp
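As a rough illustration of how such tags could carry trial data, the Python sketch below writes tag/value pairs in an mmCIF-like layout. The tag names come from the examples above; the helper names, the sample values (taken from the purification example earlier), and the units are assumptions, since the expected units and allowed values for each tag are defined by the proposed dictionary and are not stated here.

```python
def cif_value(value):
    """Format a value for a simple mmCIF-style tag/value line."""
    if value is None:
        return "?"  # mmCIF marker for an unknown value
    text = str(value)
    # Quote values containing whitespace (simplified: no embedded quotes).
    if any(ch.isspace() for ch in text):
        return f"'{text}'"
    return text

def to_tag_lines(items):
    """Render {tag: value} as aligned 'tag  value' lines."""
    width = max(len(tag) for tag in items)
    return [f"{tag.ljust(width)}  {cif_value(value)}" for tag, value in items.items()]

# Illustrative values only; units expected by each tag are not confirmed.
items = {
    "_entity_src_gen_pure.protein_concentration": 52.3,
    "_entity_src_gen_pure.protein_yield": 52.3,
    "_entity_src_gen_pure.protein_oligomeric_state": "monomer",
}
lines = to_tag_lines(items)
print("\n".join(lines))
```

This handles only single-row categories; repeated rows, such as several buffer components, would need the mmCIF loop_ construct, which is outside this sketch.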

Recommendation

• NYSGXRC plans to further improve our reporting of trial results in 2008

• We encourage all PSI Centers to use the PepcDB <protocolDetails> or <trialMeasurement> tags to report as many experimental trial results as possible in their PepcDB XML updates

• See associated poster

Acknowledgements

• SGX LIMS development team
  – Ryan Allis
  – Chris Hansen
  – Peter Hillier
  – Ken Schwinn

• AECOM - Veena Venkatagiriyappa (Fiser lab)

• Andrei Kouranov (PDB)

• LIMS improvements suggested by SGX protein production, crystallization, and beamline staff

• This work was supported by SGX Pharmaceuticals, Inc., and NIH Grant U54 GM074945