
pg

5- Look for signatures of protein-protein interactions and use these to guide docking together of the different subunits or domains of the complex (Quaternary structure).

Coming soon! The next lecture will review step 4 and cover this as well as the actual docking (step 5).

23

pg

5- Look for signatures of protein-protein interactions and use these to guide docking together of the different subunits or domains of the complex (Quaternary structure). In our case we are just reconstructing a single protein, but have only found structural templates for two separate portions of the whole. However, there are other very important cases of multi-protein complexes where separate protein molecules must be correctly assembled into a complex in order to support function.

5a- Residues that are highly conserved yet outside the active site are strong candidates for binding interfaces.
5b- Exposed hydrophobes; burial of these within a binding interface can provide driving force for complex formation.
5c- Patches of conserved charge may attract and bind to conserved opposite charge on the partner protein.

Special in our case: we know that the two structural units must be close enough in 3D space to be connected by a stretch of 19 amino acids. The domain we have done together in class covers residues 1-550. The other domain with good homology covers residues 569-643. The 19 residues between the models need to be able to stretch from the C-terminus of the large domain to the N-terminus of the small C-terminal domain.

(There is a feeble structural model for residues 524-643. This could be used with extreme caution to bridge the two decent models. We will not attempt that in this class.)

(Numbers are drawn from page 16, SwissModel results.)
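The linker constraint can be turned into a rough number. The sketch below assumes a spacing of about 3.8 Å between consecutive Cα atoms in a fully extended chain; that value and the function name are assumptions for illustration and are not taken from the slides.

```python
# Assumed upper bound on the distance covered per residue-to-residue step
# (consecutive C-alpha atoms, fully extended chain). Not from the slides.
CA_CA_SPACING_ANGSTROM = 3.8


def max_linker_span(last_modelled_res, first_modelled_res,
                    spacing=CA_CA_SPACING_ANGSTROM):
    """Upper bound (in angstroms) on the C-alpha to C-alpha distance
    between the end of one model and the start of the next."""
    # Going from residue 550 to residue 569 takes 19 residue-to-residue steps.
    steps = first_modelled_res - last_modelled_res
    if steps <= 0:
        raise ValueError("the second model must start after the first one ends")
    return steps * spacing
```

With the residue numbers above, `max_linker_span(550, 569)` gives 19 × 3.8 = 72.2 Å. A real linker will rarely be fully extended, so a docked arrangement whose ends sit close to this limit deserves suspicion.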

24

pg

5a- Residues that are highly conserved yet outside the active site are strong candidates for binding interfaces.

We will perform a Multiple Sequence Alignment (MSA). There is a variety of on-line tools for this. We will use the well-known ‘Clustal-Omega’: http://www.ebi.ac.uk/Tools/msa/clustalo/

Our sequences go here.

The CRUCIAL decision is which sequences to use: we want to identify amino acids conserved among ‘bifurcating’ hydrogenases. These may not be conserved among hydrogenases in general. Therefore we need to select ONLY sequences of hydrogenases that bifurcate.

25

pg

5a- Residues that are highly conserved yet outside the active site are strong candidates for binding interfaces.

Sequences for our Multiple Sequence Alignment should be those of bifurcating hydrogenases:
Tm1426_NP_229226.1 = GI:15644177 (this is our Thermotoga maritima protein, our unknown)
Clo1313_1881 = YP_005688384.1 (this is the homologous protein from Clostridium pasteurianum)
Cthe_0342 = YP_001036773_HydA (this is the homologous protein from Clostridium thermocellum)
Clo1313_1791_YP_005688298_HydA (this is another homologous protein from Clostridium pasteurianum)
YP_001956466 (this is a NAD-dependent Fe-hydrogenase from a termite gut bacterium)

I have posted these sequences on our web site as a plain text file, HydAsequencesCulled.txt. You would generate an equivalent file by copying the amino acid sequences in FASTA format from each of the proteins into a text file, with the heading line for each sequence beginning with ‘>’. Do not include the symbol ‘:’ anywhere in the file.

To run Clustal, select the entire text content, then copy and paste it into the Clustal box. Alternatively, upload the file (this did not work for me). This is ‘step 1’.

In ‘step 2’ choose ‘Clustal w numbers’ (the numbers are very helpful).

Choose whether or not to be notified by email (useful, because the message will include your job number and allow you to access the job later, but not forever!).

Click ‘Submit’.
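Before pasting the text into the Clustal box, the file can be checked for the ‘:’ problem mentioned above. This is only a minimal sketch: the function name is hypothetical, and replacing ‘:’ with ‘_’ is one arbitrary way of satisfying the rule.

```python
def clean_fasta_headers(text):
    """Return FASTA text with ':' replaced by '_' in every header line.

    Blank lines are dropped. A ':' inside a sequence line is treated as an
    error, because it cannot be a residue.
    """
    cleaned = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: make it safe for the alignment step.
            line = line.replace(":", "_")
        elif ":" in line:
            raise ValueError("unexpected ':' in sequence line: " + line)
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

For example, a header such as `>Tm1426_NP_229226.1_GI:15644177_Hyd-alpha` becomes `>Tm1426_NP_229226.1_GI_15644177_Hyd-alpha`, while the sequence lines are left unchanged.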

26

pg

>Tm1426_NP_229226.1_GI:15644177_Hyd-alpha 1 mkiyvdgrev iindnernll ealknvgiei pnlcylseas iygacrmclv eingqittsc 61 tlkpyegmkv ktntpeiyem rrnilelila thnrdcttcd rngscklqky aedfgirkir 121 fealkkehvr desapvvrdt skcilcgdcv rvceeiqgvg viefakrgfe svvttafdtp 181 lietecvlcg qcvaycptga lsirndidkl iealesdkiv igmiapavra aiqeefgide 241 dvamaeklvs flktigfdkv fdvsfgadlv ayeeahefye rlkkgerlpq ftsccpawvk 301 haehtypqyl qnlssvkspq qalgtvikki yarklgvpee kiflvsfmpc takkfeaere 361 ehegivdivl ttrelaqlik msridinrve pqpfdrpygv ssqaglgfgk aggvfscvls 421 vlneeigiek vdvkspedgi rvaevtlkdg tsfkgaviyg lgkvkkflee rkdveiievm 481 acnygcvggg gqpypndsri rehrakvlrd tmgikslltp venlflmkly eedlkdehtr 541 heilhttyrp rrrypekdve ilpvpngekr tvkvclgtsc ytkgsyeilk klvdyvkend 601 megkievlgt fcvencgasp nvivddkiig gatfekvlee lskng >Clo1313_1881_YP_005688384.1_NC_017304.1_HydA 1 mqmvnvtidn ckiqvpanyt vleaakqani diptlcflkd inevgacrmc vvevkgarsl 61 qaacvypvse glevytqtpa vrearkvtle lilsnhekkc ltcvrsence lqrlakdlnv 121 kdirfegems nlpiddlsps vvrdpnkcvl crrcvsmckn vqtvgaidvt ergfrttvst 181 afnkplsevp cvncgqcinv cpvgalrekd didkvweala npelhvvvqt apavrvalge 241 efgmpigsrv tgkmvaalsr lgfkkvfdtd taadltimee gtelinrikn ggklplitsc 301 spgwikfceh nypefldnls scksphemfg avlksyyaqk ngidpskvfv vsimpctakk 361 feaqrpelss tgypdvdvvl ttrelarmik eagidfnslp dkqfddpmge asgagvifga 421 tggvmeaair tvgellsgkp adkieytevr gldgikeasi eldgftlkaa vahglgnark 481 lldkikagea dyhfieimac pggcingggq piqpssvrnw kdirceraka iyeedeslpi 541 rkshenpkik mlyeeffgep gshkahellh thyekrenyp vk >Cthe_0342_YP_001036773_HydA 1 mqmvnvtidn ckiqvpanyt vleaakqani diptlcflkd inevgacrmc vvevkgarsl 61 qaacvypvse glevytqtpa vrearkvtle lilsnhekkc ltcvrsence lqrlakdlnv 121 kdirfegems nlpiddlsps vvrdpnkcvl crrcvsmckn vqtvgaidvt ergfrttvst 181 afnkplsevp cvncgqcinv cpvgalrekd didkvweala npelhvvvqt apavrvalge 241 efgmpigsrv tgkmvaalsr lgfkkvfdtd taadltimee gtelinrikn ggklplitsc 301 spgwikfceh nypefldnls scksphemfg avlksyyaqk ngidpskvfv vsimpctakk 361 feaqrpelss tgypdvdvvl ttrelarmik 
etgidfnslp dkqfddpmge asgagvifga 421 tggvmeaair tvgellsgkp adkieytevr gldgikeasi eldgftlkaa vahglgnark 481 lldkikagea dyhfieimac pggcingggq piqpssvrnw kdirceraka iyeedeslpi 541 rkshenpkik mlyeeffgep gshkahellh thyekrenyp vk >Clo1313_1791_YP_005688298_HydA 1 mdnreymlid gipveingek nllelirkag iklptfcyhs elsvygacrm cmvenewggl 61 daacstppra gmsiktnter lqkyrkmile lllanhcrdc ttcnnngkck lqdlamryni 121 shirfpntas npdvddsslc itrdrskcil cgdcvrvcne vqnvgaidfa yrgskmtist 181 vfdkpifesn cvgcgqcala cptgaivvkd dtqkvwkeiy dkntrvsvqi apavrvalgk 241 elglndgena igkivaalrr mgfddifdts tgadltvlee saellrrire gkndmplfts 301 ccpawvnyce kfypellphv stcrspmqmf asiikeeyst sskrlvhvav mpctakkfea 361 arkefkvngv pnvdyvlttq elvrmikesg ivfselepea idmpfgtytg agvifgvsgg 421 vteavlrrvv sdksptsfrs laytgvrgmn gvkeasvmyg drklkvavvs glknagdlie 481 rikagehydl vevmacpggc ingggqpfvq seerekrgkg lysadklcni ksseenplmm 541 tlykgilkgr vhellhvdya skkeak >Cthe_0430_YP_001036861.1_HydA-Identical_to_Clo1313-1791 1 mdnreymlid gipveingek nllelirkag iklptfcyhs elsvygacrm cmvenewggl 61 daacstppra gmsiktnter lqkyrkmile lllanhcrdc ttcnnngkck lqdlamryni 121 shirfpntas npdvddsslc itrdrskcil cgdcvrvcne vqnvgaidfa yrgskmtist 181 vfdkpifesn cvgcgqcala cptgaivvkd dtqkvwkeiy dkntrvsvqi apavrvalgk 241 elglndgena igkivaalrr mgfddifdts tgadltvlee saellrrire gkndmplfts 301 ccpawvnyce kfypellphv stcrspmqmf asiikeeyst sskrlvhvav mpctakkfea 361 arkefkvngv pnvdyvlttq elvrmikesg ivfselepea idmpfgtytg agvifgvsgg 421 vteavlrrvv sdksptsfrs laytgvrgmn gvkeasvmyg drklkvavvs glknagdlie 481 rikagehydl vevmacpggc ingggqpfvq seerekrgkg lysadklcni ksseenplmm 541 tlykgilkgr vhellhvdya skkeak

Use the accession name of the protein in the /protein/ database of NCBI to get the amino acid sequence directly. It is way down at the bottom of the page.

NP_229226.1

Begin a txt file to house all the aa sequences. (No need for step 1.)
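The sequence at the bottom of the NCBI page comes with position numbers and is broken into blocks of ten lowercase letters, as in the listing above. A small sketch of turning such a block into a FASTA record follows; the function name and the 60-residue line width are choices made for this illustration.

```python
import re


def numbered_block_to_fasta(name, block, width=60):
    """Convert a numbered, space-separated sequence block to a FASTA record."""
    # Keep letters only: this drops the position numbers, spaces and newlines.
    residues = re.sub(r"[^A-Za-z]", "", block).upper()
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

The name passed in becomes the heading line, so it should already be free of ‘:’.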

Then use BLAST to find homologues. Do this directly from the Entrez/protein page.

This sequence turned out to be identical to another I already had. Therefore it was dropped. You need a minimum of 5 sequences for later steps. Therefore I did a BLAST search and identified the termite gut bacterial sequence as a bifurcating hydrogenase.
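Spotting identical sequences by eye is error-prone, so the culling step can be sketched in code. The function names below are hypothetical, and the minimum of 5 is the figure quoted above.

```python
def drop_identical(records):
    """Keep the first record for each distinct sequence.

    `records` maps a sequence name to its sequence. The comparison ignores
    whitespace and letter case.
    """
    seen = set()
    kept = {}
    for name, seq in records.items():
        key = "".join(seq.split()).upper()
        if key in seen:
            continue  # identical to a sequence we already have
        seen.add(key)
        kept[name] = seq
    return kept


def enough_for_msa(records, minimum=5):
    """True if at least `minimum` distinct sequences remain."""
    return len(drop_identical(records)) >= minimum
```

If `enough_for_msa` returns False after culling, another homologue has to be found, for instance by a BLAST search as described above.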

Aside: accumulation of a txt file containing protein amino acid sequences.

27

pg

RESULTS: MSA of 5 bifurcating HydAs

28

pg

MSA of 5 bifurcating HydAs

The hydrogenase from a bacterium inhabiting termite gut is described as being NAD-dependent, so I think it is a bifurcating hydrogenase (Bif H2ase). This sequence has a C-terminal NuoE-like domain, similar to our target Tm1426, so it could be a good model.

4 Cys ligands of a FeS cluster

The alignments are fun to look at, but we need to see the degree of conservation mapped onto our model structure to know which conserved amino acids are on the surface and not near active sites. To do this we will use the Consurf tool. However, we want Consurf to use the MSA that Clustal made for us. Download your MSA from the Clustal page that presents your results. Choose an informative file name but retain the .clustal_num file type.

29
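As a quick look at the downloaded alignment before it goes to Consurf, the fully identical columns can be listed with a few lines of code. This is a simplified sketch: it assumes the usual Clustal text layout (a header line starting with CLUSTAL, then blocks of "name sequence number" lines followed by a consensus line that starts with whitespace), and plain identity is not the score that Consurf computes. The function names are hypothetical.

```python
def parse_clustal(text):
    """Read a Clustal-format alignment into {name: aligned_sequence}."""
    seqs = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and consensus lines (leading space).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        # parts[0] is the name, parts[1] the aligned chunk; a trailing
        # residue count (the 'numbers' option) is ignored.
        seqs[parts[0]] = seqs.get(parts[0], "") + parts[1]
    return seqs


def identical_positions(seqs, query):
    """1-based residue numbers of `query` where every sequence is identical."""
    rows = list(seqs.values())
    positions = []
    residue_number = 0
    for i, aa in enumerate(seqs[query]):
        if aa == "-":
            continue  # gap in the query: no residue number to report
        residue_number += 1
        if all(row[i] == aa for row in rows):
            positions.append(residue_number)
    return positions
```

The positions are numbered along the query sequence, so they can be compared directly with residue numbers in the model, provided the model uses the same numbering.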

pg

5a- Residues that are highly conserved yet outside the active site are strong candidates for binding interfaces.

Displaying the conservation patterns of the MSA as colour codes on the structural model. Go to http://consurf.tau.ac.il/

Choose ‘Amino acids’.
‘Yes’, there is a structure (= our model, saved from SwissModel, page 17).
Upload the pdb file you saved from SwissModel (page 17): modelhydrogenase_QMEANlocal_in_Bfactor_cofactors.pdb
Click ‘Next’.
The chain identifier can be selected among options that Consurf has already found within your pdb file. In our case there is only one option: ‘A’.
Do we have an MSA? YES (that is what we used Clustal-Omega for).
Upload the MSA you just saved: modelhydrogenase_QMEANlocal_in_Bfactor_cofactors.clustal_num
I called my analysis “HydA-LargeDomain”.
NO, we do not have a phylogenetic tree (5 sequences is nowhere near a sufficient number to build one).
I tend to give the job a title (“HydA-LargeDomain”) and give my email address, because this will provide a record of, and access to, the analysis for a week.
Click ‘Submit’.

30

pg

5a- Residues that are highly conserved yet outside the active site are strong candidates for binding interfaces.

Displaying the conservation patterns of the MSA as colour codes on the structural model. Go to http://consurf.tau.ac.il/

Choose ‘Amino acids’.
‘Yes’, there is a structure (= our model, saved from SwissModel, page 17).
Upload the pdb file you saved from SwissModel (page 17): modelhydrogenase_QMEANlocal_in_Bfactor_cofactors.pdb
Click ‘Next’.
The chain identifier can be selected among options that Consurf has already found within your pdb file. In our case there is only one option: ‘A’.
Do we have an MSA? YES (that is what we used Clustal-Omega for).
Upload the MSA you just saved: clustalo-HydAwTermite_21Jan2014-oy.clustal_num.clustal_num
Provide the query sequence name. This has to exactly match a sequence name in the MSA, and that sequence has to have the same amino acid sequence as the pdb. I used “Tm1426-NP_229226-GI15644177_Hyd-alpha”.
NO, we do not have a phylogenetic tree (5 sequences is nowhere near a sufficient number to build one).
I tend to give the job a title (“HydA-LargeDomain”) and give my email address, because this will provide a record of, and access to, the analysis for a week.
Click ‘Submit’.
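The requirement that the query sequence in the MSA agrees with the sequence in the pdb file can be checked before uploading. The sketch below reads the Cα atoms of ATOM records using the fixed column positions of the PDB format. It treats the model as matching if its sequence occurs, uninterrupted, within the ungapped query; that reading of "the same amino acid sequence" is an assumption, made because the model covers only part of the protein. The function names are hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(pdb_text, chain="A"):
    """One-letter sequence of a chain, taken from its C-alpha ATOM records."""
    seen = set()
    letters = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        key = line[22:27]  # residue number plus insertion code
        if key in seen:
            continue  # second copy of the same residue (alternate location)
        seen.add(key)
        letters.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
    return "".join(letters)


def query_matches_model(aligned_query, pdb_text, chain="A"):
    """True if the model's sequence occurs within the ungapped query."""
    ungapped = aligned_query.replace("-", "").upper()
    model = chain_sequence(pdb_text, chain)
    return model != "" and model in ungapped
```

Here `aligned_query` is the row of the MSA whose name was given as the query sequence name. A False result suggests a wrong name, a wrong chain, or a model built from a different sequence.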

31

pg

5a- Displaying the conservation patterns of the MSA as colour codes on the structural model.

32

Another visual representation of the MSA in 1D

Also useful. I download this file and retain it.

Click on this link to gain access to a pdb file in which each residue is accompanied by a color code proportional to the conservation score.

pg 33

Download these!

And this too.

Give each an informative name.

pg 34

Figure labels: Phe121, Ile119; ‘Flip over horiz axis’; colour scale running from ‘Very variable’ to ‘Highly conserved’.

5a- Displaying the conservation patterns of the MSA as colour codes on the structural model.

Open the desired† pdb file in Chimera, then open the colouring script “chimera_consurf.cmd”.

There is a patch of exposed residues.

Open your saved pdb file (the one with the consurf

† Warning: because we used only 5 sequences, Consurf flags every residue as having been evaluated on insufficient data. Open the pdb file that does not mark insufficient data.

pg 35

Figure labels: N-term Lys2, C-term Pro550, Phe121, Arg120, Ile119.

5b- Exposed hydrophobes; burial of these within a binding interface can provide driving force for complex formation.
5c- Patches of conserved charge may attract and bind to conserved opposite charge on the partner protein.

To see which conserved residues are hydrophobic, we can leave Consurf’s colours on the surface and colour atoms according to hydrophobicity. I made Ile, Leu, Val, Phe and Trp green. Alternatively, we can make Asp and Glu red and Arg and Lys blue.
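The colouring rule can be written down as a small lookup, which is handy for tabulating labelled residues before colouring them in the viewer. The function names are hypothetical, and the sketch merges the two alternative schemes above into one table purely for convenience.

```python
HYDROPHOBIC = {"ILE", "LEU", "VAL", "PHE", "TRP"}  # green
ACIDIC = {"ASP", "GLU"}                            # red
BASIC = {"ARG", "LYS"}                             # blue


def colour_for(resname):
    """Colour for a three-letter residue name, or None if left uncoloured."""
    name = resname.strip().upper()
    if name in HYDROPHOBIC:
        return "green"
    if name in ACIDIC:
        return "red"
    if name in BASIC:
        return "blue"
    return None


def colour_labels(labels):
    """Map labels such as 'Phe121' to the colour of their residue type."""
    return {label: colour_for(label[:3]) for label in labels}
```

Applied to the labels Phe121, Arg120 and Ile119, this gives green, blue and green respectively; Pro550 stays uncoloured.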

Figure labels: Arg46, Met1, Pro550.

pg 36

Figure labels: Met1, Pro550, Phe121, Arg120, Ile119, Arg46.

Why did you mark the locations of the termini of the protein?

Because in this case they can provide a reality check on the quality of our docking model. Guess why.

pg 37

0- Get the genes.
1- Get the amino acid sequence of the protein. (Primary structure)
2- Search for other proteins with high homology and known structure.
3- Model the amino acid sequence of the query protein onto the fold of the homologous template protein; use simulated molecular dynamics to allow the new amino acid side chains to adjust to their folded environment and to allow loops and secondary structures to adjust to their new lengths. (Secondary and tertiary structures)
4- Test to see if function is likely to be supported by the model structure, on the basis of fit with cofactors. (Tertiary structure and function)
5- Look for signatures of protein-protein interactions and use these to guide docking together of the different subunits or domains of the complex (Quaternary structure).

Now, do it all again for the other domain of Tm1426: the small C-terminal domain, which has homology to the NuoE ferredoxin.

Multiple sequence alignment of C-term of HydA (Tm1426) with other NuoE-like domains that might also bind to HydA.
Tm1426 = NP_229226
also Tm1424 = NP_229224
YP_001956466 (from a termite-gut bacterium’s HydA)
Clo1313-1885 = YP_005688388 and Cthe_0338 = YP_001036769 (identical, keep just Clo1313-1885)
Clo1313_1793 = YP_005688300 and Cthe_0428 = YP_001036859 (identical, keep Clo1313_1793)
Component of a hydrogenase: 2AUV, page 14
Also: complex I of Thermus thermophilus

part b: the C-terminal domain

pg

N → C

red = poor reliability

38

C-terminal domain of thioredoxin-like 2Fe2S ferredoxin of Desulfovibrio fructosovorans NADP-reducing hydrogenase: HndA

3b

pg

3b

39

NuoE: thioredoxin-like 2Fe2S ferredoxin of Complex I from T. thermophilus

pg 40

NuoE of 3iam-B
2auv-A (only includes C-terminal domain)

3b

pg 41

4b Active site integrity: does the Fe2S2 cluster fit?

Panels: modeled on 2auv; modeled on 3iam.

N- and C-termini of the models are marked with blue and red arrows respectively (guess why this could be useful).

pg

Multiple sequence alignment of C-term of HydA (Tm1426) with other NuoE-like domains that might also bind to HydA.
Tm1426 = NP_229226
Tm1424 = NP_229224
YP_001956466 (from the termite-gut bacterium HydA)
Clo1313-1885 = YP_005688388
Clo1313_1793 = YP_005688300

clustalo-I20140123-023154-0071-61010473-oy

start of 2AUV model

5b

42

pg

N-term

43

Flip over horiz axis

Examples of models coloured to show conserved residues, hydrophobes, -ves and +ves.

I have to admit: none of this is obvious or compelling. Your job is to identify something intelligent to go by and just try.

pg 44

5b manual docking exploiting your chemical intelligence.
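A manually docked arrangement can be given a simple numerical sanity check from the marked termini. The sketch below measures the Cα to Cα distance between the last residue of the large-domain model and the first residue of the small-domain model, reading coordinates from the fixed columns of PDB ATOM records, and compares it with the longest distance a 19-step linker could span. The 3.8 Å per step figure, the default chain ‘A’ and the function names are assumptions for illustration.

```python
import math


def ca_coords(pdb_text, resnum, chain="A"):
    """(x, y, z) of the C-alpha atom of residue `resnum` in `chain`."""
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM") and len(line) >= 54
                and line[12:16].strip() == "CA"
                and line[21] == chain
                and int(line[22:26]) == resnum):
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise ValueError("no C-alpha found for residue %d" % resnum)


def termini_gap(large_pdb, small_pdb, c_term_res=550, n_term_res=569):
    """Distance in angstroms between the two ends the linker must join."""
    return math.dist(ca_coords(large_pdb, c_term_res),
                     ca_coords(small_pdb, n_term_res))


def linker_can_bridge(gap, steps=19, spacing=3.8):
    """True if a fully extended linker of `steps` residue steps is long enough."""
    return gap <= steps * spacing
```

Both pdb texts must be in the same coordinate frame, that is, saved after docking. Passing this check does not show that an arrangement is right; failing it shows that the arrangement cannot be right.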

pg 45

Meanwhile: let's see if a computer can do any better. (Not part of the assignment, but an interesting external control.)

pg

Review of goals and steps: Develop a model for the structure of an enzyme complex when we know the nucleotide sequence of the gene.

Goals
1- Obtain the amino acid sequence of the protein. (Primary structure)
2- Search for other proteins with high homology and known structure.
3- Model the amino acid sequence of the query protein onto the fold of the homologous template protein (simulated molecular dynamics to adjust the sequence of one protein to the structure of the other). (Secondary and tertiary structures)
4- Test to see if function is likely to be supported by the model structure, on the basis of fit with cofactors. (Tertiary structure and function)
5- Look for signatures of protein-protein interactions and use these to guide docking together of the different subunits or domains of the complex (Quaternary structure).
5a- exposed conserved residues; 5b- exposed hydrophobes; 5c- exposed complementary charges.

Tools (just a small sampling of the many)
1- http://web.expasy.org/translate/ (the parent site provides access to many more tools: http://www.expasy.org/)
2- BLAST of various sorts: http://blast.ncbi.nlm.nih.gov/Blast.cgi (the parent site provides access to many more tools: http://www.ncbi.nlm.nih.gov/)
3- SwissModel: http://swissmodel.expasy.org/?pid=smd03 (again, this is just one of a whole family of tools)
4- Download the template protein from the PDB and transfer the cofactor coordinates into your model. http://www.rcsb.org/pdb/home/home.do
5a- Homology must be obtained within a curated set of proteins that retain the same activity as the model, in order to identify features of interest and interaction patterns, since modules are reused in many different contexts. I have done this for you.

46