
Functional Annotation Wiki


TvLDH.profile

 

 

5/16/16

I used this website to generate the Ramachandran plots: http://mordred.bioc.cam.ac.uk/~rapper/rampage.php

Ideally, it would be nice to use Fugue, but it was not cooperating.

The plan is to write Java or R code, but this will do for now.

I thought it would be interesting to compare the Ramachandran plot of the PDB structure with that of the model with the lowest Z score.

 

Ramachandran plots (images not preserved), labelled:

riley general, 72 pdb

nemo general, 2lvh pdb

channel fever general, 4hrz pdb

doof general, 1hw4 pdb

 

Use these files for Java or R:

393: Doofinshmirtz_Draft_60.B99990004.pdb, 1hw4_1.fasta, 1hw4_2.fasta

401: 4hrz_extract_A.pdb, ChannelFever_103.B99990003.pdb, ChannelFever.fasta, 4hrz_extract_A.fasta

3201: Nemo_Draft_66.B99990046.pdb

72: riley_draft.B99990001.pdb
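
Whatever language the plotting code ends up in, a Ramachandran plot needs the phi and psi backbone dihedral angles of each residue. Below is a minimal sketch in Python (chosen here only because Modeller scripts are Python; it is not the Java or R code planned above). The function names and the residue dictionary layout are my own assumptions, and it assumes the residues passed in are consecutive with no chain breaks.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points (IUPAC sign convention)."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b1_len = math.sqrt(dot(b1, b1))
    y = b1_len * dot(b0, cross(b1, b2))
    x = dot(cross(b0, b1), cross(b1, b2))
    return math.degrees(math.atan2(y, x))

def backbone_angles(residues):
    """Return [(phi, psi), ...] for a list of residues.

    Each residue is a dict with "N", "CA" and "C" coordinates (hypothetical layout).
    phi is None for the first residue and psi is None for the last one.
    """
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            # phi: C(i-1) - N(i) - CA(i) - C(i)
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            # psi: N(i) - CA(i) - C(i) - N(i+1)
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

The (phi, psi) pairs can then be scattered with phi on the x axis and psi on the y axis, both from -180 to 180 degrees.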

 

 

4/28/16

DOPE, or Discrete Optimized Protein Energy, is a statistical potential used to assess homology models in protein structure prediction. DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures.

https://en.wikipedia.org/wiki/Discrete_optimized_protein_energy

Normalization:

Standardization

Standardization, or z-scores, is the most commonly used method. It converts all indicators to a common scale with an average of zero and standard deviation of one.

The average of zero means that it avoids introducing aggregation distortions stemming from differences in indicators’ means. The scaling factor is the standard deviation of the indicator across, for instance, the countries, companies or blogs being ranked. Thus, an indicator with extreme values will have intrinsically a greater effect on the composite indicator.

http://howto.commetrics.com/methodology/statistics/normalization/
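
To make the standardization described above concrete, here is a minimal Python sketch of a plain z-score: subtract the mean, divide by the standard deviation. The function name is my own, and I am assuming the population standard deviation. This is the generic method from the quoted page only; Modeller's normalized DOPE is documented separately and may use a different reference, so this should not be read as how Modeller computes its z score.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert values to z-scores: mean 0, (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]

# Illustrative numbers only, not data from this notebook.
z = standardize([2.0, 4.0, 6.0])
print(z)
```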

The output is a Z-score

"This command assesses the quality of the model using the normalized DOPE method. This is a Z-score; positive scores are likely to be poor models, while scores lower than -1 or so are likely to be native-like." https://salilab.org/modeller/9v6/manual/node189.html

Only two of the models have a negative z/DOPE score. Not good; need to look for other models. Ahh! Meh. Will do more research.

It may be worth choosing proteins with the highest percent identity. Meh.

72
riley_draft.B99990001.pdb
<< end of ENERGY.
DOPE score : -12049.214844
>> Normalized DOPE z score: 1.142 <--YYYAAASSSSSSSSSS

3514
Nemo_Draft_76.B99990017.pdb
DOPE score : -50436.531250
>> Normalized DOPE z score: 1.161

664
SageFayge_89.B99990004.pdb
DOPE score : -28444.425781
>> Normalized DOPE z score: 0.027

287
Marcel_Draft_43.B99990081.pdb
DOPE score : -22397.707031
>> Normalized DOPE z score: 0.745

3201
Nemo_Draft_66.B99990046.pdb
DOPE score : -2657.818604
>> Normalized DOPE z score: 0.435

735
Nemo_Draft_204.B99990043.pdb
DOPE score : -10737.092773
>> Normalized DOPE z score: 1.183

269
Nemo.B99990033.pdb
DOPE score : -15394.646484
>> Normalized DOPE z score: 0.201

1152
DOPE score : -16702.130859
>> Normalized DOPE z score: 0.171

393
Doofinshmirtz_Draft_60.B99990004.pdb
DOPE score : -30562.261719
>> Normalized DOPE z score: -1.051 <- this model had a 96% matrix identity. Nice DOPE score.

491
Pegasus_Draft_gp110.B99990094.pdb
DOPE score : -44425.906250
>> Normalized DOPE z score: 0.520

401
ChannelFever_103.B99990003.pdb
DOPE score : -10373.522461
>> Normalized DOPE z score: -0.288 <- weakest of the good DOPE scores. Still nice though.
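
Rather than reading the scores off by eye, the Modeller output lines of the form `DOPE score : ...` and `>> Normalized DOPE z score: ...` can be parsed and ranked. A minimal Python sketch follows; the function names are hypothetical, and it assumes each model's .pdb file name appears on its own line before its scores, as in the excerpts pasted in this entry. The sample text uses three of the models recorded here.

```python
import re

DOPE_RE = re.compile(r"DOPE score\s*:\s*(-?\d+(?:\.\d+)?)")
Z_RE = re.compile(r"Normalized DOPE z score:\s*(-?\d+(?:\.\d+)?)")

def parse_dope_entries(text):
    """Collect {"model", "dope", "z"} dicts from pasted Modeller output."""
    entries = []
    model, dope = None, None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(".pdb"):
            model = line
        m = DOPE_RE.search(line)
        if m:
            dope = float(m.group(1))
        m = Z_RE.search(line)
        if m:
            # The z-score line closes one entry.
            entries.append({"model": model, "dope": dope, "z": float(m.group(1))})
            model, dope = None, None
    return entries

def best_model(entries):
    """Lowest normalized DOPE z score is treated as the best model."""
    return min(entries, key=lambda e: e["z"])

SAMPLE = """
SageFayge_89.B99990004.pdb
 DOPE score : -28444.425781
>> Normalized DOPE z score: 0.027
Doofinshmirtz_Draft_60.B99990004.pdb
DOPE score               : -30562.261719
>> Normalized DOPE z score: -1.051
ChannelFever_103.B99990003.pdb
DOPE score               : -10373.522461
>> Normalized DOPE z score: -0.288
"""

entries = parse_dope_entries(SAMPLE)
print(best_model(entries)["model"])
```

The same parser also copes with both scores on one line, since each pattern is searched independently.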

 

 

 

4/22/16

I have modelled enough proteins. Now I need to normalize the DOPE scores, then choose the best model and generate the Ramachandran plot.

 

72 lipoprotein


  

 

401 tail sheath protein


 

 

287 portal protein


 

 

4/15/16

1152 RNA polymerase sigma-E factor


   

735 unknown


 

3201 zinc finger

 


 

269  Single-stranded DNA binding protein 

 

4/8/16

735 unknown


 

 

491 

T4 UvsW:

The uvsWXY system is implicated in the replication and repair of the bacteriophage T4 genome. Whereas the roles of the recombinase (UvsX) and the recombination mediator protein (UvsY) are known, the precise role of UvsW is unclear.


 

393 Thymidylate synthase https://en.wikipedia.org/wiki/Thymidylate_synthase

"Most organisms, including humans, use the thyA- or TYMS-encoded classic thymidylate synthase whereas some bacteria use the similar flavin-dependent thymidylate synthase (FDTS) instead."[1]

https://en.wikipedia.org/wiki/Thymidylate_synthase_(FAD)

  

Model 393. Transferase. Made with Modeller.

4/4/16

Models for one branch are completed. Talked to Phil. He said that after the models are done, I can assess the DOPE score in order to find the "best" model out of the 100 that were made.

I have to look more into this, because according to Phil there are different ways to evaluate protein models and DOPE is one among many. Can also do a Ramachandran plot as well.

 

4/1/16

Models done for 664 and 2718. Next will be 401, then move on to the hypotheticals.

 


3/25/16

we be making models!

Met with Hardick, and Modeller is working. Next thing will be running more BLAST and HHpred searches to make sure I choose the most accurate PDB sequence.

 

3/23/16

Clustal highest percent identity (PDB sequences and phams from Phamerator were aligned together using ClustalX):

72 - Riley_Draft    30.58%

248 - B4_0230    21.58%

393 - Doofinshmirtz_Draft_60    96.33%

401 - ChannelFever_103    20%

491 - Pegasus_Draft_gp110    24.78%

664 - SageFage_89    21.70%

735 - MightyMouseDraft_202    20.73%

850 - JPB9_Draft_gp64    22.22%

1647 - Eyuki_Draft_62    22.04%

1732 - Salinjah_Draft_103    31.64%

1833 - Eyunki_Draft_98    28%

2718 - IceQueen_Draft_129    29.91%

3514 - MoonBeam_34    20%

735 - SageFayge_202    20.73%
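
For reference, percent identity between two aligned sequences can be recomputed directly from the alignment. The sketch below is a minimal Python version under one stated assumption: identity is counted over columns where neither sequence has a gap. ClustalX may use a different denominator, so values from this function need not match the numbers listed above. The function name and the example sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where both sequences are ungapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Illustrative aligned pair, not taken from the real alignments.
print(percent_identity("ACDE-G", "ACDFHG"))  # 4 of 5 compared columns match -> 80.0
```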

 

3/16/16

Small setback with the alignment files: half of the .ali file was done correctly, half was not.

Spoke with Dr. Moiser. First, every chain from each protein must be extracted by importing the .pdb file into SYBYL; then each extract is saved into a separate .pdb and .fasta file.

After that is done, the .pdb file from the database and your extracts are compared to each other and aligned in ClustalX.

Then breaks are noted within the .pdb database file, and those breaks are mimicked in the alignment file formed from the phams and the database PDB file.

So now I will start with proteins with only one chain to make my life easier; after I am more familiar with the process I will move to proteins with more than one chain.

Proteins with one chain: 1833, 401, 491, 393, 2718, 664, 1732 (I know these are pham numbers, but for my purposes I am using them to denote proteins because it's easier for me to read).

Proteins with one chain and no breaks: 401, 2718, 664. <----- will start with these.
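
The chain extraction and break check described above were done by hand in SYBYL and ClustalX. As a cross-check, the same information can be pulled from the fixed-column ATOM records of a .pdb file. This is a minimal Python sketch, not the SYBYL procedure: function names are hypothetical, it reads only CA atoms, ignores insertion codes and HETATM records, and treats any jump in residue numbering as a break (a stricter check would measure the C-N distance).

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def chain_residues(pdb_text, chain_id):
    """Return [(residue_number, residue_name), ...] for one chain, in file order."""
    residues = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        # PDB columns: atom name 13-16, residue name 18-20, chain 22, residue number 23-26
        if line[21] != chain_id or line[12:16].strip() != "CA":
            continue
        number = int(line[22:26])
        if number in seen:  # skip alternate locations of the same residue
            continue
        seen.add(number)
        residues.append((number, line[17:20].strip()))
    return residues

def chain_sequence(residues):
    """One-letter sequence; unknown residue names become X."""
    return "".join(THREE_TO_ONE.get(name, "X") for _, name in residues)

def find_breaks(residues):
    """Pairs of residue numbers (before, after) where the numbering jumps."""
    return [(a, b) for (a, _), (b, _) in zip(residues, residues[1:]) if b - a > 1]

def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

A chain for which `find_breaks` returns an empty list would fall into the "one chain and no breaks" group.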

Pham | PDB code | No. of phages with pham | Location/synteny | Conserved domain (blastp)? | blastp hit | HHpred hit | Alignment file (Clustal X 2.0 used) | PDB file (file.pdb has coordinates; SYBYL input, save chains into fasta) | FASTA file (from .pdb of chain(s))

401 | 4HRZ | 54 | nemo 113 | yes | baseplate protein | tail lysozyme, e value 2e-22 (HHpred hits, and image) | 401.aln | 4HRZ.PDB | 4HRZ.fasta

2718 | 3HR8 | 52 | nemo 181 | yes | recombinase A | recombinase 5.4-e54 | 2718.aln | 3HR8.PDB | 3HR8.fasta

664 | 2G8L | 53 | nemo 89 | no | major capsid protein | major capsid protein, 2.7e-05 | 664.aln | 2G8L.PDB | 2G8L.fasta


                 

 

 

3/13/16

Made alignment file with pham and PDB file using ClustalX.

Then copied and pasted the Nemo ClustalX alignment/PDB target sequence into the .ali file.

This is step two. So instead of the program making this file for me, I did it manually.

3/12/16

Made alignment file in NBRF/PIR format, with the .pir file and PDB file (.ali file).
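
Since the .ali file ended up being assembled by hand, a small helper for writing entries in Modeller's PIR layout may save some errors. A minimal Python sketch, with a hypothetical function name: each entry has a `>P1;code` line, a header line of ten colon-separated fields (type, code, first residue, first chain, last residue, last chain, then name, source, resolution and R-factor, which are left blank here), and the aligned sequence ending in `*`. The example sequence and residue range are made up; the real values come from the ClustalX alignment and the PDB file.

```python
def pir_entry(code, sequence, kind="sequence",
              first_res="", first_chain="", last_res="", last_chain="",
              width=75):
    """Format one PIR entry for a Modeller alignment (.ali) file.

    kind is "structureX" for a template with coordinates or "sequence" for the target.
    Gaps in `sequence` are written as "-"; the entry is terminated with "*".
    """
    fields = [kind, code, str(first_res), first_chain,
              str(last_res), last_chain, "", "", "", ""]
    header = ":".join(fields)
    body = sequence + "*"
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([">P1;" + code, header] + lines) + "\n"

# Illustrative values only.
print(pir_entry("4hrz", "ACD-EF", kind="structureX",
                first_res=1, first_chain="A", last_res=5, last_chain="A"))
print(pir_entry("ChannelFever_103", "ACDGEF"))
```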

3/7/16

Download PDB / FASTA files from: http://www.rcsb.org/pdb/home/home.do

The 4-character code is the PDB ID for the structure, found on the PDB website.

 

3/3/16

multiple sequence alignment done with clustalx. 

 

Worthy phams.

Pham | PDB code | # of phages with pham | Location/synteny | Conserved domain (blastp)? | Interesting blast matches | HHpred information | Alignment file (Clustal X 2.0 used) | PDB file (file.pdb has coordinates; SYBYL input, save chains into fasta) | FASTA file (from .pdb of chain(s))

1732 | 4Q2W | 51 | nemo 104 | yes | tail lysin, e value 0 | Aureus Autolysin E in complex, 1.1e-19, 28% identical to Nemo | 1732.aln | 4Q2W, protein record | 4Q2W

1833 | 1ML8 | 51 | nemo 97 | yes | tail sheath protein | tail sheath protein, 9.8e-63 | 1833.aln | 1ML8.PDB | 1ML8.fasta

1647 | 3VPB | 53 | nemo 61 | yes | adenylate kinase | adenylate kinase, 1.7e-18 | 1647.aln | 3VPB.PDB | 3VPB.fasta

850 | 4ZC0 | 51 | nemo 122 | yes | DNA helicase | replicative DNA helicase, e-value 1.2e53 | 850.aln | 4ZC0.PDB | 4ZC0.fasta

491 | 20CA | 52 | nemo 118 | yes | DNA helicase | transcription elongation/DNA repair protein, 6.3e-54 | 491.aln | 20CA.PDB | 20CA.fasta

393 | 1HW4 | 54 | nemo 64 | yes | thymidylate synthase | thymidylate synthase, 8e-101 | 393.aln | 1HW4.PDB | 1HW4.fasta

248 | 3QG5 | 52 | nemo 124 | yes | exonuclease | exonuclease, 2.4e-41 | 248.aln | 3QG5.PDB | 3QG5.fasta

72 | 2K1G | 51 | nemo 105 | yes | tail lysin | lipoprotein, 2.5e-30 | 72.aln | 2K1G.PDB | 2K1G.fasta

3514 | 2VX0 | 56 | nemo 76 | yes | terminase large subunit | terminase, 6.4e-19 | 3514.aln | 2VX0.PDB | 2VX0.fasta

Hypothetical phams

Pham | No. of phages | Location | Conserved domain (blastp)? | blastp | HHpred | PDB | FASTA

3118 | 43 | gp 174 of Nemo | no | all hypothetical | n/a | |

1224 | 51 | nemo 78 | no | hypothetical | CI repressor, high e value 2.0 | |

2846 | 43 | gp 93 of Nemo | no | all hypothetical | mitochondrial import protein/chaperone; Prob 78.0, E value 1.6, not impressed | |

1471 | 42 | end of tail proteins (gp 110 in Sage/111 in Nemo) | no | nothing, all hypothetical protein | | |

1957 | 45 | beginning of tail proteins (gp 98 in Nemo) | no | annotations include structural protein, tail protein, and tail tube subunit (Enterococcus phage EFDG1, TmpA for Staphylococcus phage A3R) | nothing interesting | |

2853 | 52 | nemo 95 | no | hypothetical | hypothetical | |

2879 | 27 | nemo 151 | no | hypothetical | DNA binding protein, high e value 2.3 | |

2846 | | nemo 93 | no | hypothetical | no | |

1205 | 51 | nemo 218 | no | hypothetical | no (stopped here) | |

1152 | 51 | nemo 184 | no | RNA polymerase sigma | no | 1OR7 |

1252 | 52 | nemo 87 | yes | prohead | no | |

269 | 51 | nemo 180 | no | hypothetical | no | 2A1K |

58 | 51 | nemo 82 | no | hypothetical | tail needle protein | |

3201 (check 4x9j) | 44 | gp 65 of Sage | no | no | 5 high quality zinc finger hits, probably DNA binding/transcription factor; Prob 99.4, E value 1.8E-14 | no results found; 2LVH |

735 | 51 | nemo 204 | no | hypothetical | protein of unknown function e-value 3D NX | |

287 (N/A) | 50 | nemo 86 | yes | hypothetical | portal protein, 7.5 e-42 | 3KDR | n/a

1164 | 48 | nemo 197 | no | hypothetical | no | |

 

 

 

 
