Different protein folds require different amino acid composition of their cores
Bioinformatiha 2, Florence, 18 October

Slide 1

Davide Alocci, Andrea Bernini, Pasquale Lista, Andrea Santarelli, Ottavia Spiga, Edoardo Morandi and Neri Niccolai
Department of Biotechnology, Chemistry and Pharmacy, University of Siena

Slides 2-3
Coring objects to discover their origin
Earth (radius = 6,371 km) and proteins

Slide 4
Coring objects to discover their origin: atom depth analysis of proteins
Atom depth* = distance of an atom from the molecular surface.
* A rather new descriptor for complex molecular structures [Chakravarty, S. and Varadarajan, R. (1999) Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold. Des., 7, 723-732].

Slides 5-6
Coring objects to discover their origin: 3D atom depth analysis of proteins
SADIC: Simple Atom Depth Index Calculator

Slide 7
3D atom depth analysis
The depth index of atom i is computed on a sphere of radius r centred on that atom:
$$D_i = \frac{V_{\mathrm{exposed}}}{V_{\mathrm{sphere}}}$$
where V_exposed is the exposed volume of the sphere (the part not occupied by protein atoms) and V_sphere is the volume of the whole sphere.

Slide 8
3D atom depth analysis: Di values mapped onto PDB ID 1UBQ (http://www.sbl.unisi.it/prococoa/)

Slide 9
3D atom depth analysis of PDB ID 1UBQ (http://www.sbl.unisi.it/prococoa/): atom Di values of three residues
K63: N 0.19, CA 0.30, C 0.25, O 0.23, CB 0.50, CG 0.68, CD 0.91, CE 1.11, NZ 1.29
E24: N 0.38, CA 0.52, C 0.50, O 0.52, CB 0.76, CG 0.95, CD 1.17, OE1 1.24, OE2 1.24
L43: N 0.10, CA 0.05, C 0.11, O 0.18, CB 0.02, CG 0.02, CD1 0.02, CD2 0.00
Dimax of a residue: the largest Di among its atoms (0.18 for L43).

Slide 10
Dimax analysis of protein singles
Structural layers are defined in protein 3D structures so that each layer includes atoms with similar Di values, which allows a fast and accurate analysis of the amino acid (aa) content of each layer.

Slides 11-12
Dimax analysis of protein singles
Quite a few proteins like to stay single (at least in the crystalline state).

Slide 13
DOOPS: a database of protein singles
Successive selection filters (entries remaining after each step):
Experimental method: X-ray (79,770)
Chain type: protein (74,456)
Only 1 chain in the asymmetric unit (28,803)
Oligomeric state: 1 (21,193)
Number of entities: 1 (3,517)
Homologue removal at 95% identity (2,410)
DOOPS: 2,410 proteins, 4,657,574 atoms, 589,383 residues

Slide 14
DOOPS: 2,410 proteins, 4,657,574 atoms, 589,383 residues
Swiss-Prot, for comparison: 540,958 proteins (192 million aa)

Slide 15
Di analysis of protein singles: PDB ID 3VTR (chitinolytic enzyme, 572 aa)

Slides 16-17
Di analysis of protein cores: calculation of the % amino acid content of the core layer L0
The first quantitative analysis of a large array of protein cores!
DOOPS: 2,410 proteins; 4,657,574 atoms; 589,383 residues
L0: ~20% of the total molecular volume
aa(L0) in DOOPS = 106,088
A residue is counted as a core aa if its Dimax < 0.2.

Slide 18
Di analysis of protein cores: folding clues from aa core composition?
CATH classification:
Class | Architectures | Topologies | Homologous superfamilies | Domains
1 (mainly α) | 5 | 386 | 875 | 37,038
2 (mainly β) | 20 | 229 | 520 | 43,881
3 (α & β) | 14 | 594 | 1,113 | 90,029
4 (few sec. str.) | 1 | 104 | 118 | 2,588
Total | 40 | 1,313 | 2,626 | 173,536

Slide 19
Di analysis of protein cores: folding clues from aa core composition?
DOOPS + CATH: selected architectures with at least 10 PDB files
CATH architecture | Proteins (mono-domain in parentheses)
1.10 | 213 (84)
1.20 | 84 (40)
1.25 | 19 (17)
1.50 | 10 (3)
2.10 | 17 (13)
2.30 | 57 (37)
2.40 | 94 (73)
2.60 | 134 (110)
2.80 | 12 (12)
3.10 | 84 (73)
3.20 | 52 (44)
3.30 | 139 (106)
3.40 | 218 (203)
3.60 | 10 (8)
3.90 | 49 (49)
Total | 1,190 (872)

Slide 20
Towards protein folding barcodes
Legend: aa % compared with the average value (av): av + σ, av + 2σ, av - σ, av - 2σ
Highlighted core residues and example domains:
Cys in ribbon (PDB ID 1UZK, domain A01)
Leu and Phe in trefoil (PDB ID 1RG8, domain A00)
Val in four-layer sandwich (PDB ID 2IMH, domain A01)
Ala in alpha horseshoe (PDB ID 3CKC, domain A02)

% of each amino acid in L0, per selected CATH architecture:
aa | 1.10 | 1.20 | 1.25 | 1.50 | 2.10 | 2.30 | 2.40 | 2.60 | 2.80 | 3.10 | 3.20 | 3.30 | 3.40 | 3.60 | 3.90 | overall
ALA | 13.28 | 10.32 | 21.46 | 12.74 | 9.26 | 10.05 | 8.43 | 9.32 | 5.5 | 10.69 | 10.08 | 12.58 | 11.88 | 14.95 | 12.01 | 11.51
ARG | 0.6 | 1.28 | 0.24 | 1.39 | 0 | 0.64 | 1.72 | 0.75 | 0 | 0.55 | 1.11 | 1.75 | 0.3 | 0.47 | 0.95 | 0.83
ASN | 0.67 | 2.62 | 0.73 | 2.77 | 1.85 | 2.04 | 1.77 | 1.36 | 0 | 2.1 | 2.9 | 0.96 | 1.52 | 2.8 | 2.1 | 1.70
ASP | 1.61 | 2.62 | 0.24 | 2.91 | 1.23 | 1.27 | 2.03 | 1.79 | 0 | 2.1 | 2.9 | 3.02 | 1.77 | 2.34 | 0.95 | 1.77
CYS | 3.35 | 2.99 | 5.37 | 0.83 | 22.84 | 2.04 | 1.46 | 4.42 | 0.92 | 2.83 | 2.1 | 1.49 | 1.86 | 1.4 | 3.05 | 2.63
GLN | 0.6 | 1.5 | 0.24 | 1.11 | 1.23 | 1.15 | 1.81 | 1.69 | 0 | 0.46 | 1.56 | 2.15 | 0.99 | 1.4 | 1.33 | 1.21
GLU | 1.48 | 1.44 | 0.73 | 1.52 | 0 | 1.15 | 1.19 | 1.04 | 0 | 0.91 | 2.59 | 2.41 | 1.08 | 0.93 | 0.67 | 1.20
GLY | 8.05 | 8.72 | 9.76 | 13.85 | 16.05 | 9.92 | 16.2 | 10.82 | 9.17 | 8.78 | 11.81 | 11.35 | 12.64 | 13.08 | 9.91 | 10.81
HIS | 1.01 | 1.6 | 2.44 | 1.11 | 0.62 | 0.76 | 0.79 | 0.56 | 0 | 2.65 | 1.96 | 3.02 | 1.91 | 0.47 | 2.48 | 1.32
ILE | 12.68 | 9.95 | 10.73 | 8.59 | 6.79 | 13.61 | 10.68 | 10.78 | 13.76 | 12.8 | 11.77 | 12.53 | 11.53 | 7.01 | 11.34 | 11.74
LEU | 23.88 | 18.34 | 22.44 | 11.77 | 8.02 | 17.18 | 12.97 | 13.98 | 33.94 | 16.54 | 11.9 | 14.33 | 14.22 | 15.42 | 13.63 | 16.27
LYS | 0.67 | 0.91 | 0 | 1.11 | 0 | 0.38 | 0.49 | 0.56 | 0 | 0.09 | 0.62 | 1.36 | 0.55 | 0 | 0.67 | 0.58
MET | 2.62 | 4.17 | 1.71 | 4.99 | 0 | 2.8 | 2.65 | 3.15 | 1.83 | 2.93 | 2.76 | 2.41 | 2.39 | 3.27 | 1.91 | 2.49
PHE | 6.44 | 6.79 | 2.93 | 4.57 | 4.32 | 7.12 | 7.06 | 6.73 | 15.6 | 7.22 | 4.95 | 6.18 | 6.07 | 4.21 | 6.01 | 6.36
PRO | 1.34 | 2.46 | 3.41 | 2.63 | 3.09 | 3.31 | 3 | 2.78 | 0 | 3.29 | 2.9 | 1.84 | 2.25 | 1.4 | 1.81 | 2.45
SER | 3.49 | 4.55 | 3.66 | 5.96 | 3.09 | 5.34 | 5.56 | 5.13 | 2.75 | 2.83 | 5.35 | 4.43 | 4.23 | 6.07 | 5.34 | 4.85
THR | 2.28 | 4.81 | 4.15 | 7.2 | 5.56 | 3.31 | 5.12 | 4.47 | 0.92 | 3.2 | 5.22 | 4.25 | 4.94 | 5.14 | 5.91 | 4.65
TRP | 1.01 | 1.55 | 0 | 2.77 | 3.7 | 0.38 | 1.63 | 2.78 | 2.75 | 2.19 | 1.52 | 0.66 | 1.26 | 0.47 | 2.1 | 1.43
TYR | 2.62 | 3.69 | 0.24 | 4.57 | 2.47 | 1.27 | 2.69 | 4.38 | 0.92 | 3.29 | 3.12 | 1.58 | 2.32 | 0 | 2.29 | 2.50
VAL | 12.34 | 9.68 | 9.51 | 7.62 | 9.88 | 16.28 | 12.75 | 13.51 | 11.93 | 14.53 | 12.88 | 11.7 | 16.29 | 19.16 | 15.54 | 13.7
# PDB | 213 (84) | 84 (40) | 19 (17) | 10 (3) | 17 (13) | 57 (37) | 94 (73) | 134 (110) | 12 (12) | 84 (73) | 52 (44) | 139 (106) | 218 (203) | 10 (8) | 49 (49) | 2,410

CATH-ADAPT (CATH - Atom Depth Assisted Protein Tomography): Di of 173,536 CATH domains, 28 h, 5 (average computation time 1.72 s/domain). Calculations performed on a computer based on a 6-core 990X CPU.

Slide 21
Dual cores for faster folding?
PDB ID 2ZU4: CATH domains 1.10.1840.10 (residues 198-306) and 2.40.10.10 (residues 1-197)

Slide 22
CONCLUSIONS
Databanks + new tools = new insights?
SADIC: Simple Atom Depth Index Calculator
Protein fold barcoding
CATH-ADAPT
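The depth index of slide 7 (exposed volume of a sphere of radius r centred on an atom, divided by the sphere volume) can be estimated numerically by Monte Carlo sampling. The sketch below is illustrative only and is not SADIC's implementation: the probe radius r = 10 Å, the van der Waals radii passed in, and the sampling scheme are all assumptions. It follows the slide's ratio literally, so its values lie between 0 and 1; the Di values above 1 listed on slide 9 suggest that the published index carries an extra normalisation, which is not reproduced here.

```python
import math
import random


def depth_index(center, atoms, r=10.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of D_i = exposed volume / sphere volume.

    center: (x, y, z) of the atom whose depth is wanted.
    atoms:  iterable of (x, y, z, vdw_radius) for every atom of the molecule,
            including the centre atom itself.
    r:      probe sphere radius in angstrom (assumed default, not SADIC's value).
    """
    rng = random.Random(seed)
    cx, cy, cz = center
    # Keep only atoms whose van der Waals sphere can overlap the probe sphere.
    near = [(x, y, z, rad * rad) for x, y, z, rad in atoms
            if math.dist((x, y, z), center) < r + rad]
    exposed = 0
    accepted = 0
    while accepted < n_samples:
        # Rejection sampling: draw in the bounding cube, keep points inside the sphere.
        dx, dy, dz = (rng.uniform(-r, r) for _ in range(3))
        if dx * dx + dy * dy + dz * dz > r * r:
            continue
        accepted += 1
        px, py, pz = cx + dx, cy + dy, cz + dz
        # A sampled point counts as exposed if it lies outside every atom's vdW sphere.
        if all((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 >= r2
               for x, y, z, r2 in near):
            exposed += 1
    return exposed / accepted


# Usage: an isolated atom is almost fully exposed, while an atom buried in a
# dense lattice of atoms has no exposed volume at all.
lone = depth_index((0.0, 0.0, 0.0), [(0.0, 0.0, 0.0, 1.7)])
lattice = [(1.5 * i, 1.5 * j, 1.5 * k, 1.5)
           for i in range(-4, 5) for j in range(-4, 5) for k in range(-4, 5)]
buried = depth_index((0.0, 0.0, 0.0), lattice, r=4.0, n_samples=2000)
print(round(lone, 3), buried)
```

Sampling keeps the sketch short and dependency-free; a grid-based volume count would give deterministic results at the cost of more code.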
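The core assignment of slides 9 and 16 reduces to a per-residue maximum followed by a threshold. The sketch below assumes that Dimax is the largest Di among a residue's atoms and uses the 0.2 cut-off stated on slide 16; the dictionary layout of the input is hypothetical. It reuses the slide 9 values for ubiquitin, where only L43 falls below the threshold.

```python
from collections import Counter

# Slide 16: a residue belongs to the core layer L0 if its Dimax is below 0.2.
CORE_THRESHOLD = 0.2


def di_max(atom_depths):
    """Dimax of a residue: the largest depth index among its atoms."""
    return max(atom_depths.values())


def core_residues(residues, threshold=CORE_THRESHOLD):
    """Labels of residues whose Dimax is below the threshold (layer L0)."""
    return [label for label, depths in residues.items()
            if di_max(depths) < threshold]


def percent_composition(residue_names):
    """Percentage of each amino acid type in a list of residue names."""
    counts = Counter(residue_names)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


# Atom Di values of three ubiquitin residues, as listed on slide 9 (PDB ID 1UBQ).
# The {residue label: {atom name: Di}} layout is a hypothetical input format.
SLIDE9 = {
    "K63": {"N": 0.19, "CA": 0.30, "C": 0.25, "O": 0.23, "CB": 0.50,
            "CG": 0.68, "CD": 0.91, "CE": 1.11, "NZ": 1.29},
    "E24": {"N": 0.38, "CA": 0.52, "C": 0.50, "O": 0.52, "CB": 0.76,
            "CG": 0.95, "CD": 1.17, "OE1": 1.24, "OE2": 1.24},
    "L43": {"N": 0.10, "CA": 0.05, "C": 0.11, "O": 0.18, "CB": 0.02,
            "CG": 0.02, "CD1": 0.02, "CD2": 0.00},
}

print({label: di_max(d) for label, d in SLIDE9.items()})  # {'K63': 1.29, 'E24': 1.24, 'L43': 0.18}
print(core_residues(SLIDE9))                               # ['L43']
print(percent_composition(["LEU", "LEU", "VAL", "ALA"]))   # {'LEU': 50.0, 'VAL': 25.0, 'ALA': 25.0}
```

Running `percent_composition` over the residue names returned for every DOOPS protein would give one column of a table like the one on slide 20, under the same assumptions.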
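Slide 20 codes each core percentage against the legend av ± σ and av ± 2σ but does not say how av and σ are obtained. The sketch below assumes they are the mean and population standard deviation of each amino acid's percentage across the 15 architectures, and uses integer codes -2..+2 as hypothetical stand-ins for the legend categories; taking the "overall" column as av would be an equally plausible reading. With the Cys and Leu rows of the slide 20 table, it flags Cys in the ribbon (2.10) and Leu in the trefoil (2.80), matching the examples highlighted on that slide.

```python
import statistics

# Architecture columns of the slide 20 table, in slide order.
ARCHITECTURES = ["1.10", "1.20", "1.25", "1.50", "2.10", "2.30", "2.40", "2.60",
                 "2.80", "3.10", "3.20", "3.30", "3.40", "3.60", "3.90"]


def band(value, av, sd):
    """Code a percentage by its distance from av in units of sd:
    +2 / +1 at or above av + 2sd / av + sd, -2 / -1 at or below
    av - 2sd / av - sd, 0 otherwise."""
    if sd == 0:
        return 0
    z = (value - av) / sd
    if z >= 2:
        return 2
    if z >= 1:
        return 1
    if z <= -2:
        return -2
    if z <= -1:
        return -1
    return 0


def fold_barcodes(table):
    """table: {aa: [% in L0 per architecture]} -> {architecture: {aa: code}}.

    Assumption: av and sd are the mean and population standard deviation of
    each amino acid's percentage across the architectures in the table.
    """
    codes = {arch: {} for arch in ARCHITECTURES}
    for aa, values in table.items():
        av = statistics.mean(values)
        sd = statistics.pstdev(values)
        for arch, value in zip(ARCHITECTURES, values):
            codes[arch][aa] = band(value, av, sd)
    return codes


# Two rows of the slide 20 table (% of L0 residues per architecture).
L0_PERCENT = {
    "CYS": [3.35, 2.99, 5.37, 0.83, 22.84, 2.04, 1.46, 4.42,
            0.92, 2.83, 2.1, 1.49, 1.86, 1.4, 3.05],
    "LEU": [23.88, 18.34, 22.44, 11.77, 8.02, 17.18, 12.97, 13.98,
            33.94, 16.54, 11.9, 14.33, 14.22, 15.42, 13.63],
}

codes = fold_barcodes(L0_PERCENT)
print(codes["2.10"])  # ribbon:  {'CYS': 2, 'LEU': -1}
print(codes["2.80"])  # trefoil: {'CYS': 0, 'LEU': 2}
```

Reading the codes of all 20 amino acids for one architecture, in a fixed order, gives that architecture's "barcode" under these assumptions.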
Acknowledgements
Davide Alocci, Andrea Bernini, Edoardo Morandi, Andrea Santarelli, Ottavia Spiga

Slide 24
Bioinformatics teaching at UNISI
Bachelor's level (Laurea Triennale)
Biological Sciences: Bioinformatics, lecturer Claudia Landi
Biotechnology: introductory bioinformatics within the Biochemistry, Molecular Biology and Genetics courses
Master's level (Laurea Magistrale)
Chemical Sciences: Protein Chemistry; Structural Genomics, lecturer Neri Niccolai
Molecular and Cellular Biology: 3D Modelling of Cellular Components
Computer and Automation Engineering (Ingegneria Informatica e dell'Automazione): Bioinformatics, lecturer Monica Bianchini; Models and languages for Bioinformatics, lecturer Moreno Falaschi
In preparation: a Bioinformatics curriculum in the Master's degree in Computer and Automation Engineering, internationalised with the Universities of Leiden and Delft (Netherlands)
Doctoral level
Doctoral School in Information Engineering and Science (Ingegneria e Scienza dell'Informazione), Curriculum in Bioinformatics