Proteins Structural Bioinformatics. 2 3 Specific databases of protein sequences and structures ...

Proteins

Structural Bioinformatics

Specific databases of protein sequences and structures

• Swiss-Prot
• PIR
• TrEMBL (translated from DNA)
• PDB (three-dimensional structures)

“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”

Solved in 1958 by John Kendrew and colleagues at Cambridge University. Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry.

Myoglobin – the first high resolution protein structure

Why Protein Structure?

• Proteins are fundamental components of all living cells, performing a variety of biological tasks.

• Each protein has a particular 3D structure that determines its function.

• Protein structure is more conserved than protein sequence, and more closely related to function.

There Are Four Levels of Protein Structure

Primary: amino acid linear sequence.

Secondary: α-helices, β-sheets and loops.

Tertiary: the 3D shape of the fully folded polypeptide chain.

Quaternary: arrangement of several polypeptide chains.

Symbols for the 20 amino acids

A  ala  alanine          M  met  methionine
C  cys  cysteine         N  asn  asparagine
D  asp  aspartic acid    P  pro  proline
E  glu  glutamic acid    Q  gln  glutamine
F  phe  phenylalanine    R  arg  arginine
G  gly  glycine          S  ser  serine
H  his  histidine        T  thr  threonine
I  ile  isoleucine       V  val  valine
K  lys  lysine           W  trp  tryptophan
L  leu  leucine          Y  tyr  tyrosine
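The one-letter and three-letter symbols map naturally onto a lookup table. The following is a minimal sketch in Python; the names `ONE_TO_THREE` and `to_three_letter`, and the use of "Xaa" for unrecognised letters, are illustrative assumptions and not part of the slides.

```python
# One-letter -> three-letter amino acid symbols for the 20 standard residues.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Convert a one-letter sequence to a dash-separated three-letter string.

    Letters outside the 20 standard symbols are rendered as "Xaa"
    (an assumption made here for illustration).
    """
    return "-".join(ONE_TO_THREE.get(res, "Xaa") for res in sequence.upper())


print(to_three_letter("TGQ"))  # Thr-Gly-Gln
```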

Secondary Structure

Secondary structure is usually divided into three categories:

• Alpha helix
• Beta strand (sheet)
• Anything else: turn/loop

[Figure: alpha helix, with labels "3.6 residues" and "5.6 Å"]

Alpha Helix: Pauling (1951)

• A consecutive stretch of 5-40 amino acids (average 10).

• A right-handed spiral conformation.

• 3.6 amino acids per turn.

• Stabilized by H-bonds in the backbone between C=O of residue n, and NH of residue n+4.

• Side-chains point out.
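The n to n+4 hydrogen-bond rule and the 3.6 residues per turn can be made concrete with a small sketch. The function names and the residue numbering in the example are hypothetical; only the offset of 4 and the 3.6 figure come from the slide.

```python
RESIDUES_PER_TURN = 3.6  # value quoted on the slide


def helix_hbonds(first, last, offset=4):
    """List (C=O residue, N-H residue) pairs for a helix spanning first..last.

    The C=O of residue n bonds to the N-H of residue n + offset, so only
    pairs that fit entirely inside the helix are returned.
    """
    return [(n, n + offset) for n in range(first, last - offset + 1)]


def turns(n_residues):
    """Approximate number of helical turns in a helix of n_residues."""
    return n_residues / RESIDUES_PER_TURN


print(helix_hbonds(1, 10))  # [(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)]
print(round(turns(10), 2))  # 2.78
```

A helix of average length (10 residues) therefore has six backbone hydrogen bonds of this kind and a little under three turns.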

Beta Strand: Pauling and Corey (1951)

• Different polypeptide segments run alongside each other and are linked together by hydrogen bonds.

• Each section is called a β-strand and consists of 5-10 amino acids.

• The strands lie adjacent to each other, forming a beta sheet.

Beta Sheet

[Figure: (a) antiparallel and (b) parallel beta sheets, with labels "3.47 Å" and "3.25 Å"]

Loops

• Connect the secondary structure elements.

• Have various lengths and shapes.

• Are located at the surface of the folded protein and may therefore play an important role in biological recognition processes.

• Proteins that are evolutionarily related have the same helices and sheets but may vary in loop structures.

How is the 3D Structure Determined?

1. Experimental methods (best approach):

• X-ray crystallography.

• NMR.

• Others.

2. In-silico methods (partial solutions, based on similarity):

• Threading - needs a 3D structure, combinatorial complexity.

• Ab-initio structure prediction - not always successful.

X-ray crystallography

1. Obtain an ordered protein crystal.

2. Check X-ray diffraction.

The crystal is bombarded with X-ray beams.

The collision of the beams with the electrons creates a diffraction pattern.

X-ray crystallography

3. Analyze the diffraction pattern and produce an electron density map.

4. Thread the known protein sequence into the density map.

X-ray crystallography

• The molecules must be very pure in order to produce perfect and stable crystals.

• The method is time-consuming and difficult.

NMR - Nuclear Magnetic Resonance (since 1945)

• A sample is immersed in a magnetic field and bombarded with radio waves.

• The molecule’s nuclei resonate (spin). This motion is measured and is specific to each molecule type.

Principles of NMR

NMR - Nuclear Magnetic Resonance

• The NMR technique is very time consuming and expensive. The sample has to be in a concentrated solution, and the method is limited to small, soluble molecules.

PDB: Protein Data Bank

• Holds 3D models of biological macromolecules (protein, RNA, DNA).

• All data are available to the public.

• Obtained by X-Ray crystallography (84%) or NMR spectroscopy (16%).

• Submitted by biologists and biochemists from around the world.

PDB – Protein Data Bank

http://www.rcsb.org/pdb/

How Many Structures?

PDB Content Growth

http://www.rcsb.org/pdb/holdings.html

Structure Prediction: Motivation

• Hundreds of thousands of gene sequences translated to proteins (GenBank, Swiss-Prot, PIR)

• Only about 28000 solved structures (PDB)

• Experimental methods are time consuming and not always possible

• Goal: Predict protein structure based on sequence information

Structure Prediction: Motivation

• Understand protein function
  – Locate binding sites

• Broaden homology
  – Detect similar function where sequence differs

• Explain disease
  – See effect of amino acid changes
  – Design suitable compensatory drugs

Prediction Approaches

• Primary (sequence) to secondary structure
  – Sequence characteristics

• Secondary to tertiary structure
  – Fold recognition
  – Threading against known structures

• Primary to tertiary structure
  – Ab initio modelling

Secondary structures have an amphiphilic nature: one face polar and the other non-polar

[Figure: α-helix and β-sheet, each shown with a polar face and a non-polar face]

Can we predict the secondary structure from sequence ?

Secondary Structure Prediction Methods

• Chou-Fasman / GOR Method
  – Based on amino acid frequencies

• Artificial Neural Network (ANN) methods
  – PHDsec and PSIpred

• HMM (Hidden Markov Model)

• Best accuracy now ~80%
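Accuracy figures such as the ~80% quoted above are usually per-residue scores: the fraction of residues whose predicted state matches the experimentally observed one (often called Q3 when three states are used). A minimal sketch follows; the function name and the predicted string are invented for illustration.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# H = helix, L = loop, B = beta strand. The prediction is a made-up example.
observed = "HHHHHHHLLLLBBBBB"
predicted = "HHHHHLLLLLLBBBBH"
print(per_residue_accuracy(predicted, observed))  # 0.8125
```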

Chou and Fasman (1974)

Name            P(a)   P(b)   P(turn)
Alanine         142     83     66
Arginine         98     93     95
Aspartic Acid   101     54    146
Asparagine       67     89    156
Cysteine         70    119    119
Glutamic Acid   151     37     74
Glutamine       111    110     98
Glycine          57     75    156
Histidine       100     87     95
Isoleucine      108    160     47
Leucine         121    130     59
Lysine          114     74    101
Methionine      145    105     60
Phenylalanine   113    138     60
Proline          57     55    152
Serine           77     75    143
Threonine        83    119     96
Tryptophan      108    137     96
Tyrosine         69    147    114
Valine          106    170     50

The table gives the propensity of each amino acid to be part of a certain secondary structure (e.g. proline has a low propensity for both alpha helices and beta sheets, so it acts as a breaker).

Success rate of 50%
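To show how the propensity table can be used, the sketch below averages the three propensities over a short segment and reports the highest one. This is a simplification chosen for illustration, not the full Chou-Fasman procedure (which has additional nucleation and extension rules); the function names and test segments are hypothetical.

```python
# (P(a), P(b), P(turn)) per residue, keyed by one-letter code, from the table.
PROPENSITY = {
    "A": (142, 83, 66),  "R": (98, 93, 95),   "D": (101, 54, 146),
    "N": (67, 89, 156),  "C": (70, 119, 119), "E": (151, 37, 74),
    "Q": (111, 110, 98), "G": (57, 75, 156),  "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143),  "T": (83, 119, 96),  "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}
STATES = ("helix", "strand", "turn")


def mean_propensities(segment):
    """Average helix, strand and turn propensity over a segment.

    Raises KeyError for letters that are not in the table.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    totals = [0, 0, 0]
    for residue in segment:
        for i, value in enumerate(PROPENSITY[residue]):
            totals[i] += value
    return {state: total / len(segment) for state, total in zip(STATES, totals)}


def favoured_state(segment):
    """Return the state with the highest mean propensity."""
    means = mean_propensities(segment)
    return max(means, key=means.get)


print(favoured_state("AEELLKK"))  # helix
print(favoured_state("VIYVTW"))   # strand
print(favoured_state("NGPS"))     # turn
```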

Secondary Structure Method Improvements

‘Sliding window’ approach

• Most alpha helices are ~12 residues long.

• Most beta strands are ~6 residues long.

• Look at all windows of size 6/12.

• Calculate a score for each window. If the score exceeds a threshold, predict that the window is an alpha helix/beta sheet.

TGTAGPOLKCHIQWMLPLKK
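The sliding-window idea can be sketched as follows, using the Chou-Fasman helix and strand propensities as the per-residue score. Several choices here are assumptions made for illustration: the window score is the mean propensity, the threshold is 100, and letters missing from the table (such as the "O" in the example sequence) are scored as a neutral 100.

```python
# Chou-Fasman propensities keyed by one-letter code.
P_HELIX = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}
P_STRAND = {
    "A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
    "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
    "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170,
}


def scan_windows(sequence, propensity, window, threshold=100.0, default=100.0):
    """Return (start, segment, score) for every window scoring above threshold.

    The score is the mean propensity of the window; residues not in the
    table contribute `default` (an assumption, not part of the method).
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        score = sum(propensity.get(res, default) for res in segment) / window
        if score > threshold:
            hits.append((start, segment, round(score, 1)))
    return hits


sequence = "TGTAGPOLKCHIQWMLPLKK"
print("helix candidates (window 12):", scan_windows(sequence, P_HELIX, 12))
print("strand candidates (window 6):", scan_windows(sequence, P_STRAND, 6))
```

Overlapping windows that pass the threshold would normally be merged into one predicted element; that step is left out to keep the sketch short.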

Improvements in the 1980’s

• Adding information from conservation in MSA

• Smarter algorithms (e.g. HMM, neural networks).

Success -> ~80%

PHDsec and PSIpred

• PHDsec
  – Rost & Sander, 1993
  – Based on sequence family alignments

• PSIpred
  – Jones, 1999
  – Based on Position Specific Scoring Matrix generated by PSI-BLAST

• Both consider long-range interactions

• HMM enables us to calculate the probability of assigning a sequence of hidden states to the observation

Observation:  TGTAGPOLKCHIQWML
Hidden state: HHHHHHHLLLLBBBBB

[Figure: table of probabilities, built according to a large database of known secondary structures. Callouts:]

• The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15

• The probability of observing alanine as part of a β-sheet

• α-helix followed by α-helix

• Beginning with an α-helix

• The above table enables us to calculate the probability of assigning secondary structure to a protein

• Example: sequence TGQ, hidden states HHH

p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 = 0.000020995
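The product in the example can be reproduced with a short sketch that multiplies the start, emission and transition probabilities along a given state path. The assignment of the six factors is inferred from their order in the product (start in helix 0.45, emit T 0.041, helix to helix 0.8, emit G 0.028, helix to helix 0.8, emit Q 0.0635) and should be read as an assumption; the dictionary and function names are hypothetical.

```python
# Only the entries needed for the TGQ / HHH example are filled in.
START = {"H": 0.45}                      # probability of beginning in a helix
TRANSITION = {("H", "H"): 0.8}           # helix followed by helix
EMISSION = {                             # residue observed within a helix
    ("H", "T"): 0.041,
    ("H", "G"): 0.028,
    ("H", "Q"): 0.0635,
}


def path_probability(sequence, states):
    """Joint probability of a residue sequence and one hidden-state path."""
    if len(sequence) != len(states) or not sequence:
        raise ValueError("sequence and states must be non-empty and equal length")
    p = START[states[0]] * EMISSION[(states[0], sequence[0])]
    for i in range(1, len(sequence)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], sequence[i])]
    return p


p = path_probability("TGQ", "HHH")
print(f"{p:.4e}")  # 2.0995e-05
```

Multiplying the six factors gives about 2.1 x 10^-5. A full predictor would compare this value across all possible state paths and pick the most probable one.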

SS prediction using ANN

[Figure: inputs for one position: one input unit per amino acid letter (ACDEFGHI ... MNPQRSTVWY) plus ".", set according to the amino acid at that position]

PHDsec Neural Net

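[Figure: inputs for one position (one unit per amino acid letter plus ".", set according to the amino acid at that position) feed a hidden layer, which feeds the outputs]

Outputs: H = helix, E = strand, C = coil

Confidence: 0 = low, 9 = high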
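The input layer in the figure can be illustrated with a one-hot encoding: each window position contributes one input unit per symbol, and exactly one unit is switched on. The sketch below assumes that "." stands for a spacer used beyond the ends of the sequence, uses a window of 13 positions as an arbitrary example, and maps unrecognised letters to the spacer; none of these choices is confirmed by the slides, and the function names are hypothetical.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus "." as spacer


def encode_position(residue):
    """One-hot vector (length 21) for a single window position."""
    vector = [0] * len(ALPHABET)
    index = ALPHABET.find(residue)
    if index == -1:  # unknown letter: treat as spacer (assumption)
        index = ALPHABET.index(".")
    vector[index] = 1
    return vector


def encode_window(sequence, center, width=13):
    """Concatenate one-hot vectors for the window centred on `center`.

    Positions that fall outside the sequence are encoded as the spacer.
    """
    half = width // 2
    inputs = []
    for pos in range(center - half, center + half + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else "."
        inputs.extend(encode_position(residue))
    return inputs


x = encode_window("TGTAGPOLKCHIQWML", center=0)
print(len(x), sum(x))  # 273 13
```

With 13 positions and 21 symbols the network receives 273 inputs, of which exactly 13 are set to 1.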

Secondary structure prediction

• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• GOR - Garnier et al, 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• Jpred - A consensus method for protein secondary structure prediction at University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• nnPredict - University of California at San Francisco (UCSF)
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Brunel University
• SOPMA - Geourjon and Deléage, 1995
• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
• DLP - Domain linker prediction at RIKEN
