27
Protein Data Bank: An open access resource enabling basic and applied research and education in biology and medicine John Westbrook, Ph.D. RCSB PDB Data & Software Architect Lead

Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Protein Data Bank: An open access resource enabling basic and applied research and education in biology and medicine

John Westbrook, Ph.D.

RCSB PDB Data & Software Architect Lead

Page 2: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Overview

A bit of background about the PDB

PDB data content

PDB data representation and data quality standards

The PDB biocuration platform

Some key features delivered by the RCSB PDB

1

Page 3: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Protein Data Bank

First open access digital resource in biology (est. 1971 with 7 entries)

Single global archive of 3-D macromolecular structures(contains >120,000 entries)

Freely available to all at pdb.org

US PDB headquartered at Rutgers/UCSD (NSF, NIH, DOE)

US PDB part of Worldwide PDB with partners in EU and Japan

Page 4: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Worldwide Protein Data Bank

Established in 2003, wwPDBensures data are freely & globally available in a common repository

Collaborate on data quality and representation standards, and tools and procedures for biocuration

Each partner delivers different services and views of the common repository of data

3

wwPDB Advisory Committee Meeting, 2015

Page 5: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

4

1970s 1980s 1990s 2000s 2010s

Science

Small enzymes, RNA DNA, viruses

Protein-DNA complexes

Ribosomes Largemacromolecular machines

Community

PDB archive established

IUCr deposition guidelines

Standardization, RCSB

Experimental data required, wwPDB

Validation standards

Technology

X-ray diffraction, diffractometers,punched cards

Synchrotron radiation, computer graphics, NMR

Electron microscopy, fast computers, fast detectors

High throughput structural genomics, robots

Hybrid methods

Page 6: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Diverse Molecular Content of the PDB

5

Page 7: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

0

10000

20000

30000

40000

50000

60000

70000

80000

90000

19

80

19

85

19

90

19

95

20

00

20

05

20

10

20

15

En

trie

s in

Arc

hiv

e

Year

X-rayCrystallography

Supporting Rapidly Evolving Experimental Technologies

6

0

2000

4000

6000

8000

10000

12000

197

5

198

0

198

5

199

0

199

5

200

0

200

5

201

0

201

5

Entr

ies in

Arc

hiv

e

Year

NuclearMagne cResonanceSpectroscopy

0

200

400

600

800

1000

197

5

198

0

198

5

199

0

199

5

200

0

200

5

201

0

201

5

En

trie

s in A

rch

ive

Year

3DElectronMicroscopy

0

2

4

6

8

10

12

14

197

5

198

0

198

5

199

0

199

5

200

0

200

5

201

0

201

5

Integrative & Hybrid Methods First 500

2012

First 5001991

First 5001995

Year

Page 8: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

PDB Depositors >800 new entries/month

RCSB PDB347 million

PDBe100 million

PDBj58 million

PDB UsersFTP and RSYNC Download Traffic in 2015:526 million downloads

As of 2015, ~55% increase in the number of global depositions since 2008

Year

# o

f E

ntr

ies

Growth in PDB Depositions

7

0

2000

4000

6000

8000

10000

12000

14000

20

00

20

01

20

02

20

03

20

04

20

05

20

06

20

07

20

08

20

09

20

10

20

11

20

12

20

13

20

14

20

15

20

16

20

17

20

18

Total Number of Annual Depositions

Projected Annual Depositions

PDB Growth and Data Usage

Page 9: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

PDB Data Content Atomic coordinates and primary

experimental data

Sample composition and preparation details

Protein and nucleic acid polymer sequences and taxonomy

Small molecules (ligands)

Experimental data collection, structure solution, and structure refinement

Structure classification by sequence, function and other criteria

Citation and references to related data resources

PDB entries contain 250 – 1200+ unique items of data

The 3.8 angstrom resolution cryo-EM structure of Zika virus.

Sirohi, D., Chen, Z., Sun, L., Klose, T., Pierson, T.C., Rossmann,

M.G., Kuhn, R.J. (2016) Science 352: 467-470

8

Page 10: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

PDB manages data using the macromolecular extension to the Crystallographic Information Framework (mmCIF) originally developed as an IUCr data standard

PDB coordinates the extension of the standard (PDBx/mmCIF) to support the broader needs of both contributors and users of the archive (> 4400 data terms)

A PDBx/mmCIF Working Group of community experts and methods developers oversees the evolution of the standard and ensures that the standard is well supported by key community software tools.

PDB hosts community workshops to support the data standard and maintains a web site serving PDBx/mmCIF data dictionaries, schema and software tools (mmcif.wwpdb.org)

Community Data Standards

Workshop

Participants,

September 2011

Workshop

Participants,

October

2014

wwPDB

Deposition

PDBx Format

in the Lab

Structure

Determination

Round Trip

wwPDB Processing

and Annotation

PDBx Format

In PDB Archive

9

Page 11: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

• 1991 • 1994 • 1997 • 2000 • 2003 • 2006

DDL 1 DDL 2

IUCr mmCIF Working Party IUCr mmCIF Maintenance Group

mmCIF/Core

sync’d

mmCIF V1 Core CIF V1 mmCIF V2

Honolulu

Workshops

Tarrytown

York

Brussels

RutgersCARB

Orlando

Seattle

St. Louis Glasgow

Rutgers EBI

mmCIFData

• 2009

mmCIF +Extensions PDB Exchange Dictionary

wwPDB

One Archive – One Dictionary

PDBx/mmCIF Development Timeline• 2012

wwPDB Common Deposition & Annotation

System

10

Page 12: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Community Standards for Data Quality

Task ForceMeeting/

WorkshopChair(s)/Membership Outcomes

X-ray

Validation

Task Force

2008

2015

Randy Read (Univ of Cambridge)

17 members

(2011) Structure

19: 1395-1412

NMR

Validation

Task Force

2009, 2011, 2013 (x2),20152016

Gaetano Montelione (Rutgers)

Michael Nilges (Institut Pasteur)

10 members

(2013) Structure

21: 1563-1570

3DEM Validation Task Force

2010 Richard Henderson (MRC-LMB)Andrej Sali (UCSF)21 members

(2012) Structure

20: 205-214

Small-Angle Scattering Task Force

20112014

Jill Trewhella (Univ Sydney)6 members

(2013) Structure

21: 875-881

Hybrid Methods Task Force

2014 Andrej Sali (UCSF), TorstenSchwede (Univ Basel), Jill Trewhella(Univ Sydney) 27 members

(2015) Structure

23: 1156-1167

Method-specific Community Validation Task Forces have been convened to

collect recommendations and develop consensus on data quality standards,

identify software tools to perform required validation tasks, and to define related

content requirements for archiving.

Page 13: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Presenting Data Quality to Diverse Audiences Provide relative and absolute

quality metrics in graphical format

Provide tabulations of key data, refinement statistics, and quality diagnostics

Assess all macromolecular and ligand structural components

PDF format reports can be uploaded with manuscript submission to a journal

Diagnostics also delivered as an XML format data files

12

Residue Plots

Grey – not modeled

Green, yellow, orange, red – 0,1,2, 3 or more issues

Red dot – poor fit to electron density

Overall Quality

Page 14: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

ill Pre-deposition

Validationvalidate.wwpdb.org

Depositiondeposit.wwpdb.org

Data Harvesting

wwpdb.org/deposition

• Generate atomic coordinate and experimental data files

• Assemble mandatory data items for deposition

• Access data via web and ftp download and web services

• Enable other research for user community

OK 1 issue 2 issues 3+ issues

OneDep: a unified global deposition, biocuration, and validation system

• Polymer check • Ligand check• Electron density fit

1.

2.

3.

Data UsersData providers

Biocuration

PDB

4.

PDB

Public Releaseftp://ftp.wwpdb.org

rcsb.org

5.

OneDep – The PDB Biocuration Platform

13

Page 15: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

RCSB PDB Data Delivery Pipeline

14

Deposition &Biocuration

Page 16: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Launchpad for a wide range of functionalities

o Deposit

o Search

o Analyze

o Visualize

o Tabulate

o Download

RCSB PDB Web Portal

15

http://rcsb.org

15

Page 17: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

RCSB PDB Mobile App

Provides convenient access to PDBdata on the go with a minimal feature set

Provides a browser, simple search, and an interactive 3D viewer

Supports iPhone, iPad, and Android

16

http://www.rcsb.org/pdb/static.do?p=mobile/RCSBapp.html

Page 18: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Web Services (RESTful APIs)

Programmatic access to data: application-to-application communication

Provides external workflows and analysis toolswith direct access to a wide range of PDB data and services

Enables integration of PDB data and services with programs and scripts in a variety of computer languages and computing environments

17

http://www.rcsb.org/pdb/software/rest.do

Page 19: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Enabling Data Access Through Integration and Visualization

Protein Feature View: mapping protein sequence to 3D structure

Gene View: mapping genome location to 3D structure

Visualization Browser native visualization tools using an efficient data

compression protocol Small molecule electron density and binding interactions

Data integration resource files provided for download: Correspondences with CCDC ligand structures Sequence cluster data files Phased release of data to support blinded molecular docking

tests

18

Page 20: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Protein Sequence Integrated View

19

Page 21: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Genome Sequence Integrated View

20

Page 22: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Molecular Visualization

21

Zn

http://mmtf.rcsb.org/https://github.com/arose/ngl

Page 23: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Visualizing Small Molecule Interactions

22

Page 24: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Reaching Diverse User Communities

Who are our users? What are they using?

Biologists: structural biology, biophysics, biochemistry, genetics, Immunology, pharmacology, cell and molecular biology …

RCSB PDB website, deposition tools, data

Other scientists: bioinformatics, software developers, …

Web Services, search engines, data

Students & teachers PDB-101

Media: Writers, textbook authors, patient advocacy groups, …

Images, data, information, outreach material, e.g., posters

General public: Curious/interested individuals, artists, sculptors, …

Images, Molecule of the Month, information from external media

23

23

Page 25: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Posters

Online Educational Resources

24

Animations Paper Models

Resources to help understand biology at the molecular level

24

http://pdb101.rcsb.org/

Page 26: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

Molecule of the Month

25

Page 27: Protein Data Bank...Protein Data Bank First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >120,000

The Protein Data Bank

Archive is managed by:

Funding:

NSF, NIH, DOE

Funding:

EMBL-EBI, Wellcome Trust,

BBSRC, NIGMS, EU

Funding:

NBDC-JST

Funding:

NLM

Worldwide Protein Data Bankwwpdb.org

Members

rcsb.org pdbe.org pdbj.org bmrb.wisc.edu

RCSB Protein Data Bank proteindatabank ja-jp.facebook.com/PDBjapan

@buildmodels @PDBEurope @PDB_ja

PDB members past and present at the PDB40 Anniversary Symposium, 2011

PDB Management

26

26