Computational Analysis of Proteins


Dr. K. Sivakumar
Department of Chemistry
SCSVMV University
chemshiva@gmail.com

National Workshop on Modern Techniques in Analytical Chemistry
Chemistry – Our Life, Our Future

www.kanchiuniv.ac.in/DrKSivakumar_chemistry.html

AMINO ACIDS: THE BUILDING BLOCKS OF PROTEINS

Triple & single letter codes of amino acids

[Figure: general structure of an amino acid]

Amino acid      Triple-letter code   Single-letter code
Alanine         Ala                  A
Cysteine        Cys                  C
Aspartic acid   Asp                  D
Glutamic acid   Glu                  E
Phenylalanine   Phe                  F
Glycine         Gly                  G
Histidine       His                  H
Isoleucine      Ile                  I
Lysine          Lys                  K
Leucine         Leu                  L
Methionine      Met                  M
Asparagine      Asn                  N
Proline         Pro                  P
Glutamine       Gln                  Q
Arginine        Arg                  R
Serine          Ser                  S
Threonine       Thr                  T
Valine          Val                  V
Tryptophan      Trp                  W
Tyrosine        Tyr                  Y
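The codes listed above can be turned into a small lookup table. The following is a minimal Python sketch; the names THREE_TO_ONE, to_one_letter and to_three_letter are illustrative choices and do not come from the slides.

```python
# Triple-letter -> single-letter codes, copied from the list above.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}
# Reverse lookup: single-letter -> triple-letter.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter_codes):
    """Convert a list such as ["Met", "Ala"] into the string "MA"."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in three_letter_codes)


def to_three_letter(sequence):
    """Convert a string such as "MA" into the list ["Met", "Ala"]."""
    return [ONE_TO_THREE[residue] for residue in sequence.upper()]


print(to_one_letter(["Met", "Ala", "Leu", "Ser", "Phe"]))  # MALSF
print(to_three_letter("MALSF"))
```

A code that is not one of the 20 listed raises KeyError, which is usually the desired behaviour when checking hand-typed sequences.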

PROTEIN SEQUENCING (order of amino acids in proteins)

Protein sequence: MALSFTVGQLIFLFWTMRITEASPD

The first five residues read: M = Methionine, A = Alanine, L = Leucine, S = Serine, F = Phenylalanine

[Figure: protein sequencer]

• Protein sequencing determines the order of amino acids in a protein.

• Methods: mass spectrometry, Edman degradation, ...

• The amino acids in a protein, and their order, determine the properties of the protein.

• Proteins are sequenced by microbiologists and biotechnologists for various purposes.
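Since the amino acids present in a protein influence its properties, a first computational look at a sequence is often a simple residue count. The sketch below counts the residues of the example sequence from this slide; the function name composition is an illustrative choice.

```python
from collections import Counter


def composition(sequence):
    """Return {residue: (count, percent)} for a one-letter protein sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence)
    total = len(sequence)
    # most_common() orders residues from most to least frequent.
    return {aa: (n, 100.0 * n / total) for aa, n in counts.most_common()}


SEQ = "MALSFTVGQLIFLFWTMRITEASPD"  # example sequence from the slide
for aa, (n, pct) in composition(SEQ).items():
    print(f"{aa}: {n} ({pct:.1f}%)")
```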

For a simple explanation of the sequencing process, refer to “GENOME” by Sujatha (www.writersujatha.com).

Various levels of protein structure

Methane (CH4): primary structure, secondary structure, tertiary structure

Protein: primary structure, secondary structure, tertiary structure

In CH4, C stands for carbon: C denotes a single atom.

In a protein sequence, M stands for methionine: M denotes a group of atoms.

• Protein sequences are continuously submitted by sequencing centers and updated in protein databases.

• To date, more than 10 lakh (1 million) protein sequences have been determined and made publicly available through protein databases. For example,

No. of sequences in three protein sequence databases (database names appear only as images on the slide):
524,420
1,365,912
13,593,921

Sequence growth in protein sequence databases

Ref: SwissProt, Feb 2011; GenomeNet, Feb 2011

PDB: 70,947 structures as of 1 Feb 2011

No. of sequences in protein sequence databases:
524,420 (~5 lakh)
1,365,912 (>10 lakh)
13,593,921 (~1 crore)

No. of structures in the ONLY protein structure database (PDB):
70,947

Ref: K. Sivakumar, Advanced BioTech, V (9), 20-27 (2007)

PDB contains 70,947 structures determined by X-ray, NMR & electron microscopy

X-ray: ~60,500
NMR: ~8,700
EM: ~350

Most of the sequenced proteins lack a descriptive, documented physico-chemical and STRUCTURAL characterization, because experimental methods (X-ray, NMR, EM) are:

• Trial-and-error based

• Time consuming

• Expensive

Computational methods:

• Minimize the number of experimental trials.

• Reduce the cost of experimental investigation.

• Help experimental analysis to be more focused.

Ref: K. Sivakumar, S. Balaji, Ganga Radhakrishnan, Journal of Theoretical and Computational Chemistry, 6 (1), 127-140 (2007).

Need for computational analysis

• More than 10 lakh sequences are available in public databases.

• Sequences are highly valuable resources, because a huge amount of structural, functional & evolutionary information is locked up in them.

• By contrast, the number of unique protein structures is very small.

• This represents a huge information deficit.

• So, we need to construct 3D models by COMPUTATIONAL METHODS.
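The size of this information deficit can be estimated from the counts quoted earlier (Feb 2011). The sketch below is a rough illustration only: the dictionary keys are placeholder labels, since the database names are not given in the text, and the calculation ignores overlap between databases and redundancy within PDB.

```python
# Sequence counts quoted in the slides (Feb 2011). The keys are placeholder
# labels (assumption): the database names are not given in the text.
SEQUENCE_COUNTS = {
    "sequence database 1": 524_420,
    "sequence database 2": 1_365_912,
    "sequence database 3": 13_593_921,
}
PDB_STRUCTURES = 70_947  # PDB structures as of 1 Feb 2011


def sequences_per_structure(n_sequences, n_structures=PDB_STRUCTURES):
    """How many known sequences exist for every known 3D structure."""
    return n_sequences / n_structures


for label, n in SEQUENCE_COUNTS.items():
    ratio = sequences_per_structure(n)
    print(f"{label}: {n:,} sequences, about {ratio:.1f} per PDB structure")
```

Even for the smallest of the three databases there are several sequences per experimentally determined structure, and for the largest the ratio is close to 200.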

3D structure can be modelled by:

• Homology Modeling

• Threading

• Ab initio

Ref: K. Sivakumar, Advanced BioTech, IV (11), 18-23 (2006)

Repeated with other suitable templates

Homology Modeling – Principle

Predicting Protein Structure: Comparative Modeling (formerly, homology modeling)

Target sequence (1alc), structure unknown (??):

KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE

Template sequence (8lyz), with a known template structure:

KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

Target and template share a similar sequence, so they are treated as homologous: use 8lyz as the template and model 1alc.

What is Homology Modeling?

• Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES)

• If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.

• In general, at least 30% sequence identity between TARGET and TEMPLATE is required for generating useful models.
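The 30% rule of thumb can be checked once TARGET and TEMPLATE have been aligned. The sketch below assumes the two sequences are already aligned (equal length, "-" for gaps) and uses one common definition of identity: identical residues divided by the number of alignment columns. Other definitions exist, and the function names are illustrative. The ten-residue strings are simply the first ten residues of the 1alc and 8lyz sequences placed side by side, not a real alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def is_usable_template(aligned_target, aligned_template, threshold=30.0):
    """Apply the rule of thumb: identity at or above the threshold."""
    return percent_identity(aligned_target, aligned_template) >= threshold


target = "KQFTKCELSQ"    # first ten residues of the 1alc sequence
template = "KVFGRCELAA"  # first ten residues of the 8lyz sequence
print(percent_identity(target, template))    # 50.0
print(is_usable_template(target, template))  # True
```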

Homology Modeling

Get protein sequence from sequence database

http://expasy.org/sprot/

Click to get protein details

Click to get protein sequence

Protein sequence in FASTA format: save it in Notepad for further use
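A saved FASTA file is plain text: a header line starting with ">" followed by one or more lines of sequence. The sketch below reads such text into a dictionary. The function name parse_fasta and the example header are hypothetical; the sequence is the short example used earlier in these slides.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical header; the sequence is wrapped over two lines on purpose.
example = """>example_target hypothetical header
MALSFTVGQLIF
LFWTMRITEASPD
"""
records = parse_fasta(example)
print(records)
```

Servers that ask for the "sequence only" expect the dictionary value (the residues) without the header line.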

Using Protein BLAST server to find similar STRUCTURE

http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins

• Paste sequence in FASTA format

• Choose PDB as the database

• Click to search for similar structures in PDB

BLAST search of O70456 vs PDB: graphical summary (blastp suite)

List of similar structures (blastp suite)

Detailed summary (blastp suite)

Method 1: EsyPred3D server - submit the sequence and PDB ID

• Paste the sequence only

• Type the PDB ID

• Click to submit

• Receive the built structure by email

• Download the attached *.pdb file and save it

• Open and visualize the *.pdb file in RasMol

Method 2: SWISS-MODEL server

• Click for modeling

• Submit the sequence only, in FASTA format (without PDB ID); the similarity search (BlastP) will be done by the SWISS-MODEL server

• Paste the sequence and click to submit

• Receive the built structure by email

• Follow the links in the email and click to download the 3D structure

• Open and visualize the *.pdb file in RasMol

Structure retrieval from Protein 3D Structure Database – PDB

• 491 sequences in SwissProt for « Keratin »

• PDB ID: click for protein details

• Click to download the structure

• Save the file and note its location

• Open and visualize the *.pdb file in RasMol

Structure of 3EUU

Sequence data

MNRVDLSLFIPDSLTAETGDLKIKTYKVVLIARAASIFGVKRIVIYHDDADGEARFIRDILTYMDTPQYLRRKVFPIMRELKHVGILPPLRTPHHPTG

Structural data (in Notepad)

Atom No.  Atom  Amino acid (AA)  AA No.  x         y        z
1         N     PRO              98      8.824     17.273   88.787
2         CA    PRO              98      8.452     18.679   89.088
3         CD    PRO              98      9.692     16.763   89.899
4         CB    PRO              98      8.73      18.889   90.578
5         CG    PRO              98      9.172     17.521   91.124
6         C     PRO              98      9.482     19.367   88.271
7         O     PRO              98      10.515    19.739   88.825
8         N     ARG              99      9.263     19.522   86.956
9         CA    ARG              99      10.346    20.102   86.231
10        CB    ARG              99      11.564    19.174   86.276
11        CG    ARG              99      12.054    19.012   87.718
12        CD    ARG              99      10.944    18.606   88.698
...
6213      N     GLY              1078    -299.78   40.023   17.009
6214      CA    GLY              1078    -285.59   39.377   19.813
6215      C     GLY              1078    -267.82   38.403   22.744
6216      O     GLY              1078    -267.78   38.205   24.03
6217      N     ILE              1079    -255.59   37.727   24.695
6218      CA    ILE              1079    -241.59   37.013   27.144
6219      CB    ILE              1079    -241.06   35.864   27.728

The last three columns are the x, y, z coordinates.

Structural data (in RasMol)
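Rows like those in the atom listing above can be read into a program and used for simple geometry. The sketch below parses four of the rows and computes the distance between the alpha carbons (CA) of PRO 98 and ARG 99. It assumes the simplified whitespace-separated layout shown in these slides; a real *.pdb file uses fixed-width ATOM records with more fields, so this parser is not a general PDB reader. The names Atom, parse_atom_row and distance are illustrative.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int          # atom number
    name: str            # atom name, e.g. "CA"
    residue: str         # amino acid, e.g. "PRO"
    residue_number: int  # amino acid number
    x: float
    y: float
    z: float


def parse_atom_row(row):
    """Parse one whitespace-separated row: No., atom, AA, AA No., x, y, z."""
    serial, name, residue, resnum, x, y, z = row.split()
    return Atom(int(serial), name, residue, int(resnum), float(x), float(y), float(z))


def distance(a, b):
    """Euclidean distance between two atoms (same units as the coordinates)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Four rows copied from the listing above.
ROWS = """\
1 N PRO 98 8.824 17.273 88.787
2 CA PRO 98 8.452 18.679 89.088
8 N ARG 99 9.263 19.522 86.956
9 CA ARG 99 10.346 20.102 86.231
"""
atoms = [parse_atom_row(r) for r in ROWS.splitlines() if r.strip()]
alpha_carbons = {a.residue_number: a for a in atoms if a.name == "CA"}
d = distance(alpha_carbons[98], alpha_carbons[99])
print(f"CA(98)-CA(99) distance: {d:.2f}")
```

PDB coordinates are given in ångströms, so the result is a distance in Å between two neighbouring residues.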

Built model validation by ProQ server

• Click to go to the structure upload page

• Click & upload the structure

• Submit after uploading

• View the ProQ server result

Built model validation by Ramachandran Plot

• Click & upload the structure

• Submit after uploading

• View the RESULTS

[Photo: G. N. Ramachandran]

Ref: K. Sivakumar, S. Balaji, Ganga Radhakrishnan, Journal of Chemical Sciences, 119 (5), 571-579 (2007)

3D structure modeling and validation
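A Ramachandran plot shows, for each residue, the two backbone dihedral angles phi and psi. Phi is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The sketch below shows how one dihedral angle can be computed from four x, y, z points. It is a generic geometry routine, not the method used by any particular validation server. The function name dihedral is illustrative, the test points are made-up coordinates, and the sign of the result depends on the convention chosen.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees around the p1-p2 axis, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the axis
    # Project b0 and b2 onto the plane perpendicular to the axis.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: first and last point on the same side of the axis (cis)
# and on opposite sides (trans).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(abs(cis), abs(trans))
```

Applied to the backbone atoms of each residue in a *.pdb file, this yields the (phi, psi) pairs that a Ramachandran plot displays.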

Disulphide bridges in 3D structure of Q01758

• Backbone of Q01758 (rainbow smelt fish)
• 10 cysteines: ball and stick
• 10 sulphur atoms in cysteines and 5 SS bonds (dotted lines)

Disulphide bridges in 3D structure of P05140

• Ribbon model of P05140 (sea raven)
• 10 cysteines: ball and stick
• 10 sulphur atoms in cysteines and 5 SS bonds (dotted lines)
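SS bonds like those shown above can be located in a model by looking for pairs of cysteine sulphur atoms that lie close together. The sketch below is an illustration under stated assumptions: the coordinates are invented toy values (not taken from Q01758 or P05140), the 2.5 Å cutoff is an assumed threshold chosen to sit a little above the typical S-S bond length of about 2.05 Å, and the function name find_disulfide_bonds is hypothetical.

```python
import math
from itertools import combinations


def find_disulfide_bonds(sg_atoms, cutoff=2.5):
    """Return residue-number pairs whose sulphur atoms lie within the cutoff (in Å).

    sg_atoms maps a cysteine residue number to the (x, y, z) of its sulphur atom.
    """
    bonds = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_atoms.items()), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            bonds.append((res_a, res_b))
    return bonds


# Invented toy coordinates for four cysteine sulphur atoms (assumption).
toy_sg = {
    10: (0.0, 0.0, 0.0),
    45: (2.0, 0.0, 0.0),
    60: (10.0, 0.0, 0.0),
    88: (10.0, 2.1, 0.0),
}
print(find_disulfide_bonds(toy_sg))  # [(10, 45), (60, 88)]
```

With 10 cysteines, as in the two proteins above, at most 5 such pairs can form.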

Secondary structure prediction from modeled 3D structure

Q01758

P05140

Legend: Beta strand, α-helices, Coil

Finding cavities in the built model using CASTp server

• Click for calculation

• Click, upload & submit the structure

• View the RESULTS

For literature

EXERCISE

• Download the sequence file for any one of the following proteins from Swissprot / Protein Information Resource / Protein Research Foundation: Antifreeze protein, Vascular Endothelial growth factor, Keratin

• Generate at least 3 homology models using the EsyPred3D server or SWISS-MODEL server (i.e., using different PDB structures)

• Visualize the structure using the RasMol tool

• Compare and evaluate the modelled 3D structures using the RamPage, ProQ and Combinatorial Extension servers.

Results table columns: Target sequence code | Template (PDB) codes | RamPage: percentage of residues in favoured region | ProQ: LG Score | ProQ: MaxSub

• Generate the report as an MS Word file and submit it to chemscsvmv@gmail.com

• Repeat the exercise for other protein sequences of your choice

Thank you all!
