Structure Representation and Coordinates Format
Lecture 3: Structural Bioinformatics
Dr. Avraham Samson, 81-871
The PDB Format
• A full description is here
• It was designed around an 80-column punched card!
• It was designed to be human-readable
• It is used by almost every piece of software that deals with structural data
The PDB Format - Records
• Every PDB file may be broken into a number of lines terminated by an end-of-line indicator. Each line in the PDB entry file consists of 80 columns. The last character in each PDB entry should be an end-of-line indicator.
• Each line in the PDB file is self-identifying. The first six columns of every line contain a record name, left-justified and blank-filled. This must be an exact match to one of the stated record names.
• The PDB file may also be viewed as a collection of record types. Each record type consists of one or more lines.
• Each record type is further divided into fields.
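As a minimal sketch of the record structure just described, the Python below reads the record name from columns 1-6 of each line, enforces the 80-column limit, and tallies record types. `KNOWN_RECORDS` is a hypothetical, deliberately incomplete subset of the official record names, and `record_name`/`summarize_records` are illustrative names, not part of any library.

```python
from collections import Counter

# Hypothetical, incomplete subset of official PDB record names (illustration only).
KNOWN_RECORDS = {
    "HEADER", "TITLE", "COMPND", "SOURCE", "REMARK", "SEQRES",
    "HELIX", "SHEET", "CRYST1", "ATOM", "HETATM", "TER", "CONECT",
    "MASTER", "END",
}

def record_name(line: str) -> str:
    """Columns 1-6 hold the left-justified, blank-filled record name."""
    return line[:6].rstrip()

def summarize_records(text: str) -> tuple[Counter, set[str]]:
    """Count record types and collect names missing from KNOWN_RECORDS."""
    counts: Counter = Counter()
    unknown: set[str] = set()
    for line in text.splitlines():
        if len(line) > 80:
            raise ValueError(f"line exceeds 80 columns: {line[:20]!r}...")
        name = record_name(line)
        if not name:  # skip blank lines
            continue
        counts[name] += 1
        if name not in KNOWN_RECORDS:
            unknown.add(name)
    return counts, unknown
```

Because the record name always sits in the same six columns, a reader can dispatch on it before looking at anything else on the line; that is what "self-identifying" buys.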
The PDB Format – An Example – The Header
The PDB Format – An Example – The Atomic Coordinates
The Description – Atom Records
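A minimal sketch of reading one ATOM or HETATM record by column, assuming the column layout of version 3.3 of the wwPDB PDB format specification (for example, x, y, and z in columns 31-38, 39-46, and 47-54). The 1-based, inclusive columns are converted to 0-based Python slices; `AtomRecord` and `parse_atom_line` are hypothetical names, and fields such as the formal charge are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class AtomRecord:
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain_id: str
    res_seq: int
    i_code: str
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str

def parse_atom_line(line: str) -> AtomRecord:
    """Parse an ATOM/HETATM line by fixed columns (1-based cols in comments)."""
    line = line.rstrip("\n").ljust(80)  # pad short lines so every slice exists
    record = line[0:6].strip()                     # cols 1-6
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return AtomRecord(
        serial=int(line[6:11]),                    # cols 7-11
        name=line[12:16].strip(),                  # cols 13-16
        alt_loc=line[16].strip(),                  # col 17
        res_name=line[17:20].strip(),              # cols 18-20
        chain_id=line[21].strip(),                 # col 22
        res_seq=int(line[22:26]),                  # cols 23-26
        i_code=line[26].strip(),                   # col 27
        x=float(line[30:38]),                      # cols 31-38
        y=float(line[38:46]),                      # cols 39-46
        z=float(line[46:54]),                      # cols 47-54
        occupancy=float(line[54:60]),              # cols 55-60
        temp_factor=float(line[60:66]),            # cols 61-66
        element=line[76:78].strip(),               # cols 77-78
    )
```

Slicing by column rather than splitting on whitespace matters because adjacent fields can touch: a coordinate such as -100.000 fills all eight of its columns, leaving no blank before the next field.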
What is Wrong with this Approach?
• The description and the data are separate
• Parsing is a nightmare: the most complex piece of code we have in our research laboratory probably remains the PDB parser
• There are no relationships between items of data
• Some data just cannot be parsed
• The fixed-column format cannot represent some of today’s structures …
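To make the fixed-column limit concrete: in the ATOM record the atom serial number has five columns (7-11) and the residue number four (23-26), and the chain identifier is a single character, so plain decimal output cannot go past atom 99,999 or residue 9,999. The sketch below uses a hypothetical `format_fixed` helper for a writer that refuses to overflow instead of silently shifting every later column.

```python
def format_fixed(value: int, width: int, field: str) -> str:
    """Right-justify an integer in a fixed-width PDB field, refusing to overflow."""
    text = f"{value:>{width}}"
    if len(text) > width:
        raise OverflowError(
            f"{field} {value} needs {len(text)} columns; the field has {width}"
        )
    return text

# Field widths from the ATOM record layout.
SERIAL_WIDTH = 5   # atom serial number, columns 7-11
RES_SEQ_WIDTH = 4  # residue sequence number, columns 23-26

max_serial = 10**SERIAL_WIDTH - 1    # largest serial that fits
max_res_seq = 10**RES_SEQ_WIDTH - 1  # largest residue number that fits
```

A large complex such as a ribosome or a virus capsid exceeds these limits, which is one reason such entries cannot be written as a single conventional PDB file.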
Structures are Spread Over Multiple Files – Most Users are Not Aware of this
PDB Format - Important Components of the Data Are Lost to All But Humans
REMARK   3 REFINEMENT. BY THE RESTRAINED LEAST-SQUARES PROCEDURE OF
REMARK   3 J. KONNERT AND W. HENDRICKSON (PROGRAM *PROLSQ*). THE R
REMARK   3 VALUE IS 0.168 FOR 2680 REFLECTIONS WITH I GREATER THAN
REMARK   3 2.0*SIGMA(I) REPRESENTING 74 PER CENT OF THE TOTAL
REMARK   3 AVAILABLE DATA IN THE RESOLUTION RANGE 10.0 TO 2.0
REMARK   3 ANGSTROMS.
REMARK   4 THE ERABUTOXIN A (EA) CRYSTAL STRUCTURE IS ISOMORPHOUS WITH
REMARK   4 THE KNOWN STRUCTURE OF ERABUTOXIN B (PROTEIN DATA BANK
REMARK   4 ENTRIES *2EBX*, *3EBX*). EA DIFFERS FROM EB BY A SINGLE
REMARK   4 SUBSTITUTION - EA ASN 26 FOR EB HIS 26. THE EA STARTING
REMARK   4 MODEL WAS OBTAINED FROM A MOLECULAR REPLACEMENT STUDY IN
REMARK   4 WHICH COORDINATES FOR 309 OF THE 475 ATOMS IN THE EB
REMARK   4 STRUCTURE (*2EBX*) WERE USED.
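In the REMARK 3 text the R value is split across two lines ("THE R" / "VALUE IS 0.168"), so a program can recover it only by joining lines and pattern-matching English prose. The sketch below is a hypothetical, deliberately fragile heuristic (`remark_text` and `guess_r_value` are illustrative names); it assumes the remark number is right-justified in columns 8-10 and the free text starts in column 12.

```python
import re
from typing import Optional

def remark_text(lines: list[str], number: int) -> str:
    """Join the free text (column 12 onward) of all REMARK <number> lines."""
    parts = []
    for line in lines:
        if line[:6] == "REMARK" and line[6:10].strip() == str(number):
            parts.append(line[11:].strip())
    return " ".join(parts)

def guess_r_value(lines: list[str]) -> Optional[float]:
    """Heuristic: find 'R VALUE IS <number>' in the joined REMARK 3 prose."""
    text = remark_text(lines, 3)
    match = re.search(r"\bR\s+VALUE\s+IS\s+(\d+\.\d+)", text)
    return float(match.group(1)) if match else None
```

Any rewording of the sentence ("R-FACTOR", "CRYSTALLOGRAPHIC R OF") defeats the pattern, which is exactly the sense in which the data is readable only by humans.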
mmCIF Was Developed to Address These Problems
Methods in Enzymology (1997) 277, 571-590
mmCIF – Scope of the Initial Effort
• Capture all PDB data
• Describe a paper’s materials and methods section
• Describe the biologically active molecule
• Fully describe secondary structure, but not tertiary or quaternary structure
• Describe details of the chemistry (including 2D)
• Provide meaningful 3D views
mmCIF - Extract from a Data File
loop_
_atom_site.group_PDB
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.label_alt_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.footnote_id
_atom_site.entity_id
_atom_site.entity_seq_num
_atom_site.id
ATOM N N  VAL A 11 . 25.360 30.691 11.795 1.00 17.93 . 1 11 1
ATOM C CA VAL A 11 . 25.970 31.965 12.332 1.00 17.75 . 1 11 2
ATOM C C  VAL A 11 . 25.569 32.010 13.881 1.00 17.83 . 1 11 3
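A minimal sketch of reading a `loop_` like the `_atom_site` extract: the data names (tokens starting with `_`) come first, then the values fill rows in the same order. `parse_loop` is a hypothetical helper that handles only this simple case, assuming no quoted strings, embedded spaces, or semicolon-delimited text fields.

```python
def parse_loop(text: str) -> list[dict[str, str]]:
    """Parse one simple mmCIF loop_ into a list of {item_name: value} rows."""
    tokens = text.split()  # assumption: no quoted or multi-line values
    if not tokens or tokens[0] != "loop_":
        raise ValueError("text must start with loop_")
    tags = []
    i = 1
    while i < len(tokens) and tokens[i].startswith("_"):
        # "_atom_site.Cartn_x" -> "Cartn_x" (drop the category prefix)
        tags.append(tokens[i].split(".", 1)[1])
        i += 1
    if not tags:
        raise ValueError("loop_ has no data names")
    values = tokens[i:]
    width = len(tags)
    if len(values) % width != 0:
        raise ValueError("value count is not a multiple of the data-name count")
    # Consecutive groups of `width` values form one row each.
    return [dict(zip(tags, values[k:k + width]))
            for k in range(0, len(values), width)]
```

Because values are matched to data names by position rather than by column, the same loop parses identically whether it is laid out one row per line or run together on a single line. In mmCIF a `.` marks an inapplicable value and `?` an unknown one. Full CIF syntax needs a complete parser, such as those in Biopython or gemmi.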
Summary
• mmCIF has provided the PDB with a robust data representation, which serves as the conceptual and physical schema upon which the current RCSB, PDBe and PDBj resources are built
• This work predated XML and XML Schema but embodies the important concepts inherent in these descriptions
• mmCIF was later converted exactly into XML (PDBML), which is now used more than mmCIF itself but much less than the old PDB format
• Today the old PDB format is still the most widely supported by software, but it cannot represent some large structures that mmCIF can
Other representations
• SMILES (Simplified Molecular-Input Line-Entry System): http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
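As a small illustration of what a SMILES string encodes, the sketch below counts heavy atoms in a string written only with unbracketed "organic subset" symbols (B, C, N, O, P, S, F, Cl, Br, I, plus aromatic lowercase b, c, n, o, p, s). Branch parentheses, bond symbols, and ring-closure digits are skipped, implicit hydrogens are not counted, and bracket atoms such as `[NH4+]` are rejected. `count_heavy_atoms` is a hypothetical name; real work should use a cheminformatics toolkit.

```python
import re
from collections import Counter

# Unbracketed organic-subset symbols. Two-letter symbols come first in the
# alternation so "Cl" is matched whole instead of as C plus a stray l.
ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def count_heavy_atoms(smiles: str) -> Counter:
    """Count non-hydrogen atoms in a SMILES string without bracket atoms."""
    if "[" in smiles:
        raise ValueError("bracket atoms such as [NH4+] are not handled here")
    counts: Counter = Counter()
    for symbol in ORGANIC_ATOM.findall(smiles):
        counts[symbol.capitalize()] += 1  # aromatic 'c' counts as carbon
    return counts
```

For example, ethanol is `CCO` (two carbons, one oxygen), acetic acid is `CC(=O)O` with a parenthesized branch, and benzene is `c1ccccc1`, where the two `1` digits close the ring.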
Representing Positions
• Cartesian coordinates (x,y,z) are an easy and natural means of representing a position in 3D space
• There are many other alternatives, such as spherical (polar) notation (r, θ, φ), and you can invent others if you want to
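For the (r, θ, φ) alternative, a minimal sketch of the round trip between Cartesian and spherical coordinates follows. It assumes one common (physics) convention: θ is the polar angle measured from the +z axis and φ is the azimuth in the xy-plane measured from +x; some texts swap the two symbols. The function names are illustrative.

```python
import math

def cartesian_to_spherical(x: float, y: float, z: float) -> tuple[float, float, float]:
    """Return (r, theta, phi): theta from +z in [0, pi], phi from +x in [-pi, pi]."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # angles are undefined at the origin; report zeros
    cos_theta = max(-1.0, min(1.0, z / r))  # clamp against rounding before acos
    return r, math.acos(cos_theta), math.atan2(y, x)

def spherical_to_cartesian(r: float, theta: float, phi: float) -> tuple[float, float, float]:
    """Inverse map: x = r sin(theta) cos(phi), y = r sin(theta) sin(phi), z = r cos(theta)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))
```

Converting an atom position such as (25.360, 30.691, 11.795) to spherical form and back recovers the original Cartesian values up to floating-point rounding, so either notation carries the same information.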
Other representations
Cartesian coordinates vs. polar coordinates
The center of the graph is called the pole.
Angles are measured counterclockwise from the positive x-axis.
Points are represented by a radius and an angle, $(r, \theta)$, where $r$ is the radius and $\theta$ is the angle.
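To plot the point $\left(5, \frac{\pi}{4}\right)$, first find the angle $\frac{\pi}{4}$, measured from the positive x-axis.
Then move out 5 units along the terminal side of that angle.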
Let's generalize this to find formulas for converting from rectangular to polar coordinates.
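[Figure: the point (x, y) joined to the pole by a segment of length r, forming a right triangle with horizontal leg x and vertical leg y.]
$$x^2 + y^2 = r^2 \quad\Rightarrow\quad r = \sqrt{x^2 + y^2}$$
$$\tan\theta = \frac{y}{x} \quad\Rightarrow\quad \theta = \tan^{-1}\!\left(\frac{y}{x}\right)$$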
Let's generalize the conversion from polar to rectangular coordinates.
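[Figure: the point $(r, \theta)$ with its horizontal leg x and vertical leg y.]
$$\cos\theta = \frac{x}{r} \quad\Rightarrow\quad x = r\cos\theta$$
$$\sin\theta = \frac{y}{r} \quad\Rightarrow\quad y = r\sin\theta$$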
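The two conversions can be checked numerically with the sketch below (function names are illustrative). One detail that the formula θ = tan⁻¹(y/x) hides: tan⁻¹ alone returns angles only between -π/2 and π/2, so for points with x < 0 it lands in the wrong quadrant, and it is undefined when x = 0. The sketch therefore uses `math.atan2(y, x)`, which uses the signs of both coordinates.

```python
import math

def polar_to_rect(r: float, theta: float) -> tuple[float, float]:
    """x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

def rect_to_polar(x: float, y: float) -> tuple[float, float]:
    """r = sqrt(x^2 + y^2); theta from atan2 so the quadrant is correct."""
    return math.hypot(x, y), math.atan2(y, x)
```

For instance, the point (-1, 0) lies at angle π, whereas tan⁻¹(0 / -1) evaluates to 0, which points the opposite way.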
• How would you calculate distance?
• How would you calculate centroid?
• How would you calculate dihedral angle?
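One possible set of answers, sketched in plain Python with tuples rather than a linear-algebra library: the distance is the Euclidean norm of the difference vector; the centroid is the unweighted mean of the coordinates; and the dihedral angle about the p1-p2 bond uses a widely cited formula, atan2(|b2| b1·(b2×b3), (b1×b2)·(b2×b3)), where b1, b2, b3 are the three consecutive bond vectors. All names are illustrative, and the sign convention assumed is the IUPAC one (positive when, looking along p1 to p2, the front bond must turn clockwise to eclipse the back bond).

```python
import math

Vec = tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def distance(p: Vec, q: Vec) -> float:
    """Euclidean distance sqrt((x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2)."""
    d = sub(q, p)
    return math.sqrt(dot(d, d))

def centroid(points: list[Vec]) -> Vec:
    """Unweighted mean position (a center of mass would weight by atomic mass)."""
    n = len(points)
    if n == 0:
        raise ValueError("centroid of an empty set is undefined")
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

Applied to the N and CA atoms in the mmCIF extract, `distance` gives about 1.51 Å, a typical N-CA bond length. Using atan2 rather than acos of a normalized dot product keeps the sign of the dihedral, which distinguishes, for example, +60° from -60°.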