Functional Annotation WikiTvLDH.profile
5/16/16
I used this website to make the Ramachandran plots: http://mordred.bioc.cam.ac.uk/~rapper/rampage.php
Ideally, it would be nice to use Fugue, but it was not cooperating.
Plan is to do Java or R code, but this will do for now.
I thought it would be interesting to compare R-plot with PDB vs the model with the lowest Z score
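For the Java or R plan, the core of a Ramachandran plot is just two backbone torsion angles per residue: phi (C of residue i-1, N, CA, C of residue i) and psi (N, CA, C of residue i, N of residue i+1). A minimal sketch of the torsion calculation in Python, assuming the four atom coordinates have already been read out of the .pdb file; the function name `dihedral` is my own and this is not how RAMPAGE does it internally:

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)          # bond pointing back from the central bond
    b1 = sub(p2, p1)          # central bond (the rotation axis)
    b2 = sub(p3, p2)          # bond pointing forward from the central bond
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))
```

Calling it once per residue with the phi atoms and once with the psi atoms gives the (phi, psi) pairs to scatter-plot.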
riley general 72 pdb
nemo general 2lvh pdb
channel fever general 4hrz pdb
doof general 1h4w pdb
use these files for java or R
393
Doofinshmirtz_Draft_60.B99990004.pdb
1hw4_1.fasta
1hw4_2.fasta
401
4hrz_extract_A.pdb
ChannelFever_103.B99990003.pdb
ChannelFever.fasta
4hrz_extract_A.fasta
3201
Nemo_Draft_66.B99990046.pdb
72
riley_draft.B99990001.pdb
4/28/16
DOPE, or Discrete Optimized Protein Energy, is a statistical potential used to assess homology models in protein structure prediction. DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures.
https://en.wikipedia.org/wiki/Discrete_optimized_protein_energy
normalization:
Standardization
Standardization (or z-scores) is the most commonly used method. It converts all indicators to a common scale with an average of zero and standard deviation of one.
The average of zero means that it avoids introducing aggregation distortions stemming from differences in indicators’ means. The scaling factor is the standard deviation of the indicator across, for instance, the countries, companies or blogs being ranked. Thus, an indicator with extreme values will have intrinsically a greater effect on the composite indicator.
http://howto.commetrics.com/methodology/statistics/normalization/
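To make the quoted definition concrete, here is a small sketch of plain z-score standardization in Python. It assumes the population standard deviation and uses made-up numbers in the test; the function name `z_scores` is my own. This only illustrates the general idea: MODELLER computes its normalized DOPE internally, and I have not checked that it uses this exact recipe.

```python
import statistics

def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # population standard deviation
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mean) / sd for v in values]
```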
The output is a Z-score
"This command assesses the quality of the model using the normalized DOPE method. This is a Z-score; positive scores are likely to be poor models, while scores lower than -1 or so are likely to be native-like." https://salilab.org/modeller/9v6/manual/node189.html
Only two of the models have a negative normalized DOPE z-score. Not good; need to look for other models. Will do more research.
it may be worth it to consider choosing proteins with the highest percent identity. meh.
72
riley_draft.B99990001.pdb
<< end of ENERGY.
DOPE score : -12049.214844
>> Normalized DOPE z score: 1.142 <--YYYAAASSSSSSSSSS
3514
Nemo_Draft_76.B99990017.pdb
DOPE score : -50436.531250
>> Normalized DOPE z score: 1.161
664
SageFayge_89.B99990004.pdb
DOPE score : -28444.425781
>> Normalized DOPE z score: 0.027
287
Marcel_Draft_43.B99990081.pdb
DOPE score : -22397.707031
>> Normalized DOPE z score: 0.745
3201
Nemo_Draft_66.B99990046.pdb
DOPE score : -2657.818604
>> Normalized DOPE z score: 0.435
735
Nemo_Draft_204.B99990043.pdb
DOPE score : -10737.092773
>> Normalized DOPE z score: 1.183
269
Nemo.B99990033.pdb
DOPE score : -15394.646484
>> Normalized DOPE z score: 0.201
1152
DOPE score : -16702.130859
>> Normalized DOPE z score: 0.171
393
Doofinshmirtz_Draft_60.B99990004.pdb
DOPE score : -30562.261719
>> Normalized DOPE z score: -1.051 <-this model had a 96% matrix identity. nice dope score
491
Pegasus_Draft_gp110.B99990094.pdb
DOPE score : -44425.906250
>> Normalized DOPE z score: 0.520
401
ChannelFever_103.B99990003.pdb
DOPE score : -10373.522461
>> Normalized DOPE z score: -0.288 <-weakest of the good DOPE scores. still nice though
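Rather than reading these off by hand, the scores can be pulled out of pasted log text and ranked. A rough sketch in Python, assuming the text looks like the entries above (a line ending in .pdb, then a "DOPE score :" line and a "Normalized DOPE z score:" line, possibly fused together); `parse_scores` is a helper name I made up and it does not call MODELLER:

```python
import re

DOPE_RE = re.compile(r"DOPE score\s*:\s*(-?\d+(?:\.\d+)?)")
Z_RE = re.compile(r"Normalized DOPE z score:\s*(-?\d+(?:\.\d+)?)")

def parse_scores(text):
    """Return (model_file, dope, z_score) tuples in the order they appear."""
    results = []
    model = None
    dope = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(".pdb"):
            model, dope = line, None          # a new model entry starts here
        dope_match = DOPE_RE.search(line)
        if dope_match:
            dope = float(dope_match.group(1))
        z_match = Z_RE.search(line)
        if z_match and dope is not None:
            results.append((model, dope, float(z_match.group(1))))
            model, dope = None, None          # entries without a file name get None
    return results

sample = """393
Doofinshmirtz_Draft_60.B99990004.pdb
DOPE score : -30562.261719
>> Normalized DOPE z score: -1.051
401
ChannelFever_103.B99990003.pdb
DOPE score : -10373.522461
>> Normalized DOPE z score: -0.288
664
SageFayge_89.B99990004.pdb
DOPE score : -28444.425781>> Normalized DOPE z score: 0.027
"""

scores = parse_scores(sample)
best = min(scores, key=lambda row: row[2])    # lowest z-score = most native-like
print(best[0], best[2])
```

Sorting on the z-score instead of the raw DOPE score matters because raw DOPE depends on protein size, so it cannot be compared between different proteins.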
4/22/16
I have modelled enough proteins. Now I need to normalize the DOPE scores, then choose the best model and make the Ramachandran plot.
72 lipoprotein
401 tail sheath protein
287 portal protein
4/15/16
1152 RNA polymerase sigma-E factor
735 unknown
3201 zinc finger
269 Single-stranded DNA binding protein
4/8/16
735 unknown
491
T4 UvsW:
The uvsWXY system is implicated in the replication and repair of the bacteriophage T4 genome. Whereas the roles of the recombinase (UvsX) and the recombination mediator protein (UvsY) are known, the precise role of UvsW is unclear.
393 Thymidylate synthase https://en.wikipedia.org/wiki/Thymidylate_synthase
"Most organisms, including humans, use the thyA- or TYMS-encoded classic thymidylate synthase whereas some bacteria use the similar flavin-dependent thymidylate synthase (FDTS) instead."[1]
https://en.wikipedia.org/wiki/Thymidylate_synthase_(FAD)
model 393. transferase. made from modeler
4/4/16
Models for one branch are completed. Talked to Phil. He said that after the models are done, I can assess the DOPE score in order to find the "best" model out of the 100 that were made.
I have to look more into this, because according to Phil there are different ways to evaluate protein models and DOPE is one among many. Can also do a Ramachandran plot as well.
4/1/16
models done for 664 and 2718. Next will be 401, then move on to the hypotheticals.
3/25/16
we be making models!
Met with Hardick, and MODELLER is working. Next thing will be doing more BLAST hits and HHpred hits to make sure I choose the most accurate PDB sequence.
3/23/16
Clustal highest percent identity: ***PDB sequences and phams from Phamerator were aligned together using ClustalX***
72 - Riley_Draft 30.58%
248 - B4_0230 21.58%
393 - Doofinshmirtz_Draft_60 96.33%
401 - ChannelFever_103 20%
491 - Pegasus_Draft_gp110 24.78%
664 - SageFage_89 21.70%
735 - MightyMouseDraft_202 20.73%
850 - JPB9_Draft_gp64 22.22%
1647 - Eyuki_Draft_62 22.04%
1732 - Salinjah_Draft_103 31.64%
1833 - Eyunki_Draft_98 28%
2718 - IceQueen_Draft_129 29.91%
3514 - MoonBeam_34 20%
735 - SageFayge_202 20.73%
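For reference, percent identity between two already-aligned sequences can be checked by hand with a few lines of Python. This is a sketch under my own assumption that identity is counted over columns where neither sequence has a gap; ClustalX may use a different denominator, so small differences from the numbers above would not be surprising. The sequences in the test are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```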
3/16/16
Small setback with the alignment files: half of the .ali file was done correctly, half was not.
Spoke with Dr. Moiser. First, every chain from each protein must be extracted by importing the .pdb file into SYBYL; then each extracted chain is saved into a separate .pdb and .fasta file.
After that is done, the .pdb file from the database and your extracts are compared to each other and aligned in ClustalX.
Then breaks are noted within the database .pdb file, and those breaks are mimicked in the alignment file formed from the phams and the database .pdb file.
So now I will start with proteins with only one chain to make my life easier; after I am more familiar with the process I will move to proteins with more than one chain.
Proteins with one chain: 1833, 401, 491, 393, 2718, 664, 1732 (I know these are pham numbers, but for my purposes I am using them to denote proteins because it's easier for me to read)
Proteins with one chain and no breaks: 401, 2718, 664. <-----will start with these.
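The chain extraction and break check can also be roughed out without SYBYL. Below is a minimal sketch in Python that reads the fixed-column ATOM records of a .pdb file, collects one sequence per chain, and flags jumps in residue numbering. Assumptions: only standard amino acids are mapped (anything else becomes X), HETATM records are ignored, and a jump in residue numbers is treated as a break, which is only a heuristic. The function names are mine.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def chains_from_pdb(lines):
    """Map chain ID -> ordered list of (residue_number, one_letter_code)."""
    chains = {}
    seen = set()
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]                    # column 22: chain identifier
        resnum = int(line[22:26])           # columns 23-26: residue number
        icode = line[26:27]                 # column 27: insertion code
        key = (chain, resnum, icode)
        if key in seen:                     # one entry per residue, not per atom
            continue
        seen.add(key)
        aa = THREE_TO_ONE.get(line[17:20], "X")   # columns 18-20: residue name
        chains.setdefault(chain, []).append((resnum, aa))
    return chains

def find_breaks(residues):
    """Return (last_before, first_after) residue numbers around numbering gaps."""
    return [(a, b) for (a, _), (b, _) in zip(residues, residues[1:]) if b - a > 1]
```

The sequence for a chain is then `"".join(aa for _, aa in chains["A"])`, and each pair from `find_breaks` marks where gap characters would go in the alignment file.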
Phams and PDB codes | no. of phages with phams | location/synteny | blastp conserved domain? | blast p hit | HHpred hit | Alignment files (Clustal X 2.0 used) | PDB file (.pdb has coordinates; SYBYL input, save chains into fasta) | Fasta file from .pdb of chain(s)
401, 4HRZ | 54 | nemo 113 | yes | baseplate protein | tail lysozyme e value 2e-22 (HHPRed hits, and image) | 401.aln | 4HRZ.PDB | 4HRZ.fasta
2718, 3HR8 | 52 | nemo 181 | yes | recombinase A | recombinase 5.4-e54 | 2718.aln | 3HR8.PDB | 3HR8.fasta
664, 2G8L | 53 | nemo 89 | no | major capsid protein | major capsid protein, 2.7e-05 | 664.aln | 2G8L.PDB | 2G8L.fasta
3/13/16
Made alignment file with pham and pdb file using ClustalX.
Then copied and pasted the Nemo ClustalX alignment/pdb target sequence into the .ali file.
This is step two. So instead of the program making this file for me, I did it manually.
3/12/16
made alignment file in NBRF/PIR format, with .pir file and pdb file. (.ali file)
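Since the .ali file ended up being built by hand, here is a small sketch of a helper that writes one PIR-style entry in Python. It assumes my understanding of the layout MODELLER reads: a ">P1;code" line, then a header of ten colon-separated fields starting with "structureX" or "sequence", then the aligned sequence ending in "*". The function name `pir_entry` and the short sequence in the test are made up, and the field layout should be checked against the MODELLER manual before trusting it.

```python
def pir_entry(code, aligned_seq, kind="sequence",
              start="", start_chain="", end="", end_chain=""):
    """Build one PIR-style alignment entry as a string."""
    header = ">P1;" + code
    # Ten fields: type, code, start residue, start chain, end residue,
    # end chain, then four descriptive fields left empty here.
    fields = [kind, code, str(start), start_chain, str(end), end_chain,
              "", "", "", ""]
    body = aligned_seq + "*"                 # "*" terminates the sequence
    wrapped = [body[i:i + 75] for i in range(0, len(body), 75)]
    return "\n".join([header, ":".join(fields)] + wrapped)
```

A template entry would use kind="structureX" with the chain and residue range from the .pdb file, and the pham sequence would use kind="sequence" with the range left blank.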
3/7/16
download PDB / fasta files from : http://www.rcsb.org/pdb/home/home.do
4 letter code is the structural genomics code. found on PDB website
3/3/16
multiple sequence alignment done with clustalx.
Worthy phams.
Pham and PDB code | # of phages with pham | location/synteny | blastp conserved domain? | interesting blast matches | HHPred information | Alignment files (Clustal X 2.0 used) | PDB file (.pdb has coordinates; SYBYL input, save chains into fasta) | Fasta file from .pdb of chain(s)
1732, 4Q2W | 51 | nemo 104 | yes | tail lysin e value 0 | Aureus Autolysin E in complex 1.1e-19, 28% identical to Nemo | 1732.aln | 4Q2W, protein record | 4Q2W
1833, 1ML8 | 51 | nemo 97 | yes | tail sheath protein | tail sheath protein 9.8e-63 | 1833.aln | 1ML8.PDB | 1ML8.fasta
1647, 3VPB | 53 | nemo 61 | yes | adenylate kinase | adenylate kinase 1.7e-18 | 1647.aln | 3VPB.PDB | 3VPB.fasta
850, 4ZC0 | 51 | nemo 122 | yes | dna helicase | replicative DNA helicase e-value 1.2e53 | 850.aln | 4ZC0.PDB | 4ZC0.fasta
491, 20CA | 52 | nemo 118 | yes | dna helicase | transcription elongation/dna repair protein 6.3e-54 | 491.aln | 20CA.PDB | 20CA.fasta
393, 1HW4 | 54 | nemo 64 | yes | thymidylate synthase | thymidylate synthase 8e-101 | 393.aln | 1HW4.PDB | 1HW4.fasta
248, 3QG5 | 52 | nemo 124 | yes | exonuclease | exonuclease 2.4e-41 | 248.aln | 3QG5.PDB | 3QG5.fasta
72, 2K1G | 51 | nemo 105 | yes | tail lysin | lipoprotein 2.5e-30 | 72.aln | 2K1G.PDB | 2K1G.fasta
3514, 2VX0 | 56 | nemo 76 | yes | terminase large subunit | terminase 6.4e-19 | 3514.aln | 2VX0.PDB | 2VX0.fasta
hypothetical phams
phams | no of phages | location | blast p conserved domain | blast p | HHpred | PDB | FASTA
3118 | 43 | gp 174 of Nemo | no | all hypothetical | n/a | |
1224 | 51 | nemo 78 | no | hypothetical | CI repressor, high e value 2.0 | |
2846 | 43 | gp 93 of Nemo | no | all hypothetical | mitochondrial import protein/chaperone/ Prob 78.0 E value 1.6, not impressed | |
1471 | 42 | end of tail proteins (gp 110 in Sage/111 in Nemo) | no | nothing, all hypothetical protein | | |
1957 | 45 | beginning of tail proteins (gp 98 in Nemo) | no | annotations include structural protein, tail protein, and tail tube subunit (enterococcus phage EFDG1, TmpA for Staphylococcus phage A3R) | nothing interesting | |
2853 | 52 | nemo, 95 | no | Hypothetical | hypothetical | |
2879 | 27 | nemo 151 | no | hypothetical | DNA binding protein, high e value 2.3 | |
2846 | | nemo 93 | no | hypothetical | no | |
1205 | 51 | nemo 218 | no | hypothetical | no | |
(stopped here)
1152 | 51 | nemo 184 | no | RNA polymerase sigma | no | 1OR7 |
1252 | 52 | nemo 87 | yes | prohead | no | |
269 | 51 | nemo 180 | no | hypothetical | no | 2A1K |
58 | 51 | nemo 82 | no | hypothetical | tail needle protein | |
3201 (check 4x9j) | 44 | gp 65 of Sage | no | no | 5 high quality zinc finger hits, probably DNA binding/transcription factor, Prob 99.4 E value 1.8E-14 | no results found; 2LVH |
735 | 51 | nemo 204 | no | hypothetical | protein of unknown function e-value 3D NX | |
287 (N/A) | 50 | nemo 86 | yes | hypothetical | portal protein 7.5 e-42 | 3KDR | n/a
1164 | 48 | nemo 197 | no | hypothetical | no | |