Functional Annotation Wiki


5/16/16

I used this website to generate the Ramachandran plots: http://mordred.bioc.cam.ac.uk/~rapper/rampage.php

Ideally, it would be nice to use Fugue, but it was not cooperating.

The plan is to write Java or R code, but this will do for now.

I thought it would be interesting to compare the Ramachandran plot of the PDB template with that of the model with the lowest z-score.
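RAMPAGE builds the plot from each residue's backbone torsion angles: phi (C(i-1), N, CA, C) and psi (N, CA, C, N(i+1)). Since the plan is to produce these plots with my own code eventually, here is a minimal sketch of that calculation in Python rather than Java or R. It assumes the backbone N, CA and C coordinates have already been read from the .pdb file into a list of dicts; the names `dihedral` and `phi_psi` are hypothetical, and plotting is left out.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return tuple(x * s for x in a)

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_psi(backbone):
    """backbone: list of {'N': xyz, 'CA': xyz, 'C': xyz}, one dict per residue in chain order.

    Returns one (phi, psi) pair per residue. phi is None for the first residue and
    psi is None for the last, because those angles need a neighboring residue.
    Chain breaks are not checked.
    """
    angles = []
    last = len(backbone) - 1
    for i, res in enumerate(backbone):
        phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"]) if i > 0 else None
        psi = dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"]) if i < last else None
        angles.append((phi, psi))
    return angles
```

Each (phi, psi) pair is one point on the Ramachandran plot.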

Ramachandran plots (general case), each model compared with its PDB template:

Riley (pham 72): model and PDB template

Nemo (pham 3201): model and PDB 2LVH

ChannelFever (pham 401): model and PDB 4HRZ

Doofinshmirtz (pham 393): model and PDB 1HW4

https://wiki.vcu.edu/download/attachments/62460064/TvLDH.profile?version=1&modificationDate=1463269261196&api=v2

Use these files for the Java or R code:

393: Doofinshmirtz_Draft_60.B99990004.pdb, 1hw4_1.fasta, 1hw4_2.fasta

401: 4hrz_extract_A.pdb, ChannelFever_103.B99990003.pdb, ChannelFever.fasta, 4hrz_extract_A.fasta

3201: Nemo_Draft_66.B99990046.pdb

72: riley_draft.B99990001.pdb

4/28/16

DOPE, or Discrete Optimized Protein Energy, is a statistical potential used to assess homology models in protein structure prediction. DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures.

https://en.wikipedia.org/wiki/Discrete_optimized_protein_energy
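For context, knowledge-based statistical potentials are usually written in an inverse-Boltzmann form that compares how often an atom-type pair (a, b) is observed at distance r in native structures with how often it would occur in a reference state. The block below is that generic form, not DOPE's exact published parameterization; per the description above, what sets DOPE apart is its reference state of noninteracting atoms in a homogeneous sphere.

```latex
E_{ab}(r) = -k_B T \, \ln \frac{P^{\mathrm{obs}}_{ab}(r)}{P^{\mathrm{ref}}_{ab}(r)},
\qquad
E_{\mathrm{model}} = \sum_{i<j} E_{t_i t_j}\left(r_{ij}\right)
```

Here t_i is the atom type of atom i and r_ij is the distance between atoms i and j. A lower (more negative) total indicates a more native-like model.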

Normalization:

https://wiki.vcu.edu/download/attachments/62460064/Doofinshmirtz_Draft_60.B99990004.pdb?version=1&modificationDate=1463255373670&api=v2

https://wiki.vcu.edu/download/attachments/62460064/1hw4_1.fasta?version=1&modificationDate=1463255396428&api=v2

https://wiki.vcu.edu/download/attachments/62460064/1hw4_2.fasta?version=1&modificationDate=1463255429053&api=v2

https://wiki.vcu.edu/download/attachments/62460064/4hrz_extract_A.pdb?version=1&modificationDate=1463255498306&api=v2

https://wiki.vcu.edu/download/attachments/62460064/ChannelFever_103.B99990003.pdb?version=2&modificationDate=1463256842549&api=v2

https://wiki.vcu.edu/download/attachments/62460064/ChannelFever.fasta?version=1&modificationDate=1463255557239&api=v2

https://wiki.vcu.edu/download/attachments/62460064/4hrz_extract_A.fasta?version=1&modificationDate=1463255582149&api=v2

https://wiki.vcu.edu/download/attachments/62460064/Nemo_Draft_66.B99990046.pdb?version=1&modificationDate=1463258533590&api=v2

https://wiki.vcu.edu/download/attachments/62460064/riley_draft.B99990001.pdb?version=1&modificationDate=1463261473298&api=v2

https://en.wikipedia.org/wiki/Statistical_potential

https://en.wikipedia.org/wiki/Homology_modeling

https://en.wikipedia.org/wiki/Protein_structure_prediction


Standardization (z-scores) is the most commonly used method. It converts all indicators to a common scale with a mean of zero and a standard deviation of one.

A mean of zero avoids introducing aggregation distortions stemming from differences in the indicators’ means. The scaling factor is the standard deviation of the indicator across, for instance, the countries, companies or blogs being ranked. Thus, an indicator with extreme values will intrinsically have a greater effect on the composite indicator.
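The standardization described above is z = (x - mean) / standard deviation. A minimal Python sketch using only the standard library; the function name `z_scores` is hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption, since some sources divide by n - 1 instead. MODELLER reports its normalized DOPE for a single model (as in the scores recorded here), so it must standardize against its own built-in reference statistics rather than across your set of models; this sketch only illustrates the general formula.

```python
import statistics

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD; statistics.stdev divides by n - 1
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mean) / sd for v in values]
```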

http://howto.commetrics.com/methodology/statistics/normalization/

The output of MODELLER's normalized DOPE assessment is a z-score:

"This command assesses the quality of the model using the normalized DOPE method. This is a Z-score; positive scores are likely to be poor models, while scores lower than -1 or so are likely to be native-like." https://salilab.org/modeller/9v6/manual/node189.html

Only two of the models have a negative normalized DOPE z-score. Ahh! Not good; I should look for other models. Will do more research.

It may be worth considering the proteins with the highest percent identity.

DOPE results per pham (blank = not recorded):

Pham  Model                                  DOPE score      Normalized DOPE z-score  Notes
72    riley_draft.B99990001.pdb              -12049.214844   1.142                    YYYAAASSSSSSSSSS
3514                                         -50436.531250   1.161
664   SageFayge_89.B99990004.pdb             -28444.425781
287   Marcel_Draft_43.B99990081.pdb          -22397.707031   0.745
3201
735
269   Nemo.B99990033.pdb
1152
393   Doofinshmirtz_Draft_60.B99990004.pdb   -30562.261719   -1.051                   96% matrix identity; nice DOPE score
491   Pegasus_Draft_gp110.B99990094.pdb      -44425.906250
401   ChannelFever_103.B99990003.pdb         -10373.522461   -0.288                   weakest of the good (negative) DOPE scores, still nice
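To choose the best model from the scores recorded above, a small Python sketch that sorts the models with a normalized DOPE z-score from lowest to highest and labels them using the rough guidance from the MODELLER manual quoted earlier (positive: likely poor; below about -1: likely native-like). The dictionary just re-types the recorded values; `classify`, `rank_by_zscore` and the "intermediate" label for the range in between are illustrative additions, not MODELLER output.

```python
# Normalized DOPE z-scores recorded above: pham -> (model file, z-score).
# The model file for pham 3514 was not recorded, so it is None here.
RECORDED = {
    72: ("riley_draft.B99990001.pdb", 1.142),
    3514: (None, 1.161),
    287: ("Marcel_Draft_43.B99990081.pdb", 0.745),
    393: ("Doofinshmirtz_Draft_60.B99990004.pdb", -1.051),
    401: ("ChannelFever_103.B99990003.pdb", -0.288),
}

def classify(z):
    """Rough reading of the MODELLER manual's guidance on normalized DOPE."""
    if z > 0:
        return "likely poor"
    if z < -1:
        return "likely native-like"
    return "intermediate"  # illustrative label for the range the manual leaves open

def rank_by_zscore(recorded):
    """Return (pham, model, z) tuples sorted from lowest (best) to highest z-score."""
    return sorted(((pham, model, z) for pham, (model, z) in recorded.items()),
                  key=lambda row: row[2])

if __name__ == "__main__":
    for pham, model, z in rank_by_zscore(RECORDED):
        print(f"{pham:>5}  {model or '(not recorded)':<38} {z:+.3f}  {classify(z)}")
```

This agrees with the notes above: 393 is the only model below -1, and 401 is the weaker of the two negative scores.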

4/22/16

I have modelled enough proteins. Next I need to normalize the DOPE scores, choose the best model, and generate the Ramachandran plot.

72 lipoprotein

401 tail sheath protein

287 portal protein

4/15/16

1152 RNA polymerase sigma-E factor

735 unknown

3201 zinc finger

269 Single-stranded DNA binding protein

4/8/16

735 unknown

491

T4 UvsW:

The uvsWXY system is implicated in the replication and repair of the bacteriophage T4 genome. Whereas the roles of the recombinase (UvsX) and the recombination mediator protein (UvsY) are known, the precise role of UvsW is unclear.

393 Thymidylate synthase https://en.wikipedia.org/wiki/Thymidylate_synthase

"Most organisms, including humans, use the thyA- or TYMS-encoded classic thymidylate synthase, whereas some bacteria use the similar flavin-dependent thymidylate synthase (FDTS) instead." [1]

https://en.wikipedia.org/wiki/Thymidylate_synthase_(FAD)

Model for 393 (transferase) made with MODELLER.

4/4/16

Models for one branch are completed. Talked to Phil. He said that once the models are done, I can assess their DOPE scores to find the "best" of the 100 models that were made.

I have to look into this more, because according to Phil there are many ways to evaluate protein models and DOPE is only one of them. I can also do a Ramachandran plot.
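Phil's suggestion of using DOPE to pick the "best" of the 100 models could be scripted along these lines. This is a hedged sketch: it assumes the scores have been collected as whitespace-separated rows of the form `<model>.pdb <molpdf> <DOPE score>` (for example, copied from the model summary MODELLER prints at the end of a run, if yours prints one; check the column order in your own log before relying on `dope_column`). Raw DOPE scores are only comparable between models of the same sequence, and lower is better. The function name and the example rows are hypothetical.

```python
def best_model_by_dope(rows, dope_column=2):
    """Return (model_file, dope) for the row with the lowest (best) raw DOPE score.

    rows: iterable of text lines. Only lines whose first field ends in '.pdb'
    and whose dope_column field parses as a number are considered;
    column 0 is the file name. Returns None if no row qualifies.
    """
    best = None
    for line in rows:
        fields = line.split()
        if len(fields) <= dope_column or not fields[0].endswith(".pdb"):
            continue  # header, separator, or unrelated log line
        try:
            dope = float(fields[dope_column])
        except ValueError:
            continue
        if best is None or dope < best[1]:
            best = (fields[0], dope)
    return best

if __name__ == "__main__":
    # Hypothetical rows with made-up numbers, only to show the call.
    example = [
        "Filename  molpdf  DOPE score",
        "target.B99990001.pdb  1523.4  -30012.8",
        "target.B99990002.pdb  1498.1  -30562.3",
    ]
    print(best_model_by_dope(example))  # -> ('target.B99990002.pdb', -30562.3)
```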

4/1/16

Models done for 664 and 2718. Next will be 401, then I will move on to the hypotheticals.


https://en.wikipedia.org/wiki/TYMS


https://en.wikipedia.org/wiki/Thymidylate_synthase_(FAD)#cite_note-1


3/25/16

We're making models!

Met with Hardick, and MODELLER is working. Next I will run more BLAST and HHpred searches to make sure I choose the most accurate PDB template sequence.

3/23/16

Highest Clustal percent identity (PDB template sequences and pham sequences from Phamerator were aligned together using ClustalX):

72 - Riley_Draft 30.58%

248 - B4_0230 21.58%

393 - Doofinshmirtz_Draft_60 96.33%

401 - ChannelFever_103 20%

491 - Pegasus_Draft_gp110 24.78%

664 - SageFage_89 21.70%

735 - MightyMouseDraft_202 20.73%

850 - JPB9_Draft_gp64 22.22%

1647 - Eyuki_Draft_62 22.04%

1732 - Salinjah_Draft_103 31.64%

1833 - Eyunki_Draft_98 28%

2718 - IceQueen_Draft_129 29.91%

3514 - MoonBeam_34 20%

735 - SageFayge_202 20.73%
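A percent identity like the ones listed above can be recomputed from a pairwise alignment. Minimal Python sketch, with the caveat that tools differ in the denominator: this version counts identical residues over the columns where neither sequence has a gap, which may not match ClustalX's number exactly. The aligned strings are assumed to be equal-length rows copied from the .aln file (gaps as '-'), and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)
```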

3/16/16

Small setback with the alignment files: half of the .ali file was done correctly and half was not.

Spoke with Dr. Moiser. First, every chain of each protein must be extracted by importing the .pdb file into Sybyl; each extracted chain is then saved as a separate .pdb and .fasta file.
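Sybyl was used for the chain extraction. As a hedged alternative, here is a standard-library Python sketch that groups ATOM records by the chain-ID column of the PDB format (column 22) and turns each chain into a one-letter sequence for a .fasta file. It ignores HETATM records and alternate locations and writes non-standard residues as X, so it is a simplification rather than a replacement for Sybyl. All function names are hypothetical; the output file naming mirrors 4hrz_extract_A.pdb.

```python
# Standard amino acids, three-letter -> one-letter code.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def split_chains(pdb_lines):
    """Group ATOM records by chain ID (PDB column 22), preserving file order."""
    chains = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chains.setdefault(line[21], []).append(line.rstrip("\n"))
    return chains

def chain_sequence(atom_lines):
    """One-letter sequence of one chain; a new residue starts whenever the
    residue number plus insertion code (PDB columns 23-27) changes."""
    seq, previous = [], None
    for line in atom_lines:
        key = line[22:27]
        if key != previous:
            seq.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
            previous = key
    return "".join(seq)

def to_fasta(name, seq, width=60):
    """FASTA text with the sequence wrapped at `width` characters."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"

def write_extracts(pdb_path, code):
    """Write <code>_extract_<chain>.pdb and .fasta for every chain in pdb_path.

    A blank chain ID would end up as a space in the file name.
    """
    with open(pdb_path) as fh:
        chains = split_chains(fh)
    for chain_id, lines in chains.items():
        stem = f"{code}_extract_{chain_id}"
        with open(stem + ".pdb", "w") as out:
            out.write("\n".join(lines) + "\nEND\n")
        with open(stem + ".fasta", "w") as out:
            out.write(to_fasta(stem, chain_sequence(lines)))
    return sorted(chains)
```

For example, `write_extracts("4hrz.pdb", "4hrz")` would write 4hrz_extract_A.pdb and 4hrz_extract_A.fasta for chain A.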

After that, the .pdb file from the database and your extracts are compared to each other and aligned in ClustalX.

Then any breaks in the database .pdb file are noted, and those breaks are reproduced in the alignment file built from the phams and the database .pdb file.
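Breaks in a database .pdb file usually show up as jumps in residue numbering within a chain. A hedged Python sketch that lists such jumps for one chain's ATOM records. It only looks at residue numbers (PDB columns 23-26) and ignores insertion codes; checking for CA-CA distances above roughly 4.2 Å would be a more robust test and is not done here. The function name is hypothetical.

```python
def find_breaks(atom_lines):
    """Return (last_residue, next_residue) pairs where the residue number of one
    chain's ATOM records jumps by more than 1, i.e. residues missing from the file."""
    numbers = []
    for line in atom_lines:
        if not line.startswith("ATOM"):
            continue
        n = int(line[22:26])  # residue sequence number, PDB columns 23-26
        if not numbers or numbers[-1] != n:
            numbers.append(n)
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]
```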

So I will start with proteins that have only one chain to make my life easier; once I am more familiar with the process, I will move on to proteins with more than one chain.

Proteins with one chain: 1833, 401, 491, 393, 2718, 664, 1732 (I know these are pham numbers, but I am using them to denote proteins because it is easier for me to read).

Proteins with one chain and no breaks: 401, 2718, 664 <-- will start with these.

Phams and PDB codes. Columns: pham and PDB code; no. of phages with pham; location/synteny; BLASTp conserved domain?; BLASTp hit; HHpred hit; alignment files (ClustalX 2.0 used); PDB (file.pdb has the coordinates; Sybyl input, chains saved into FASTA); FASTA (file from the .pdb chain(s)).

401 (4HRZ): 54 phages; Nemo gp113; conserved domain: yes; BLASTp hit: baseplate protein; HHpred hit: tail lysozyme, E-value 2e-22 (HHpred hits and image); files: 401.aln, 4HRZ.PDB, 4HRZ.fasta

2718 (3HR8): 52 phages; Nemo gp181; conserved domain: yes; BLASTp hit: recombinase A; HHpred hit: recombinase, E-value 5.4e-54; files: 2718.aln, 3HR8.PDB, 3HR8.fasta

664 (2G8L): 53 phages; Nemo gp89; conserved domain: no; BLASTp hit: major capsid protein; HHpred hit: major capsid protein, E-value 2.7e-05; files: 664.aln, 2G8L.PDB, 2G8L.fasta

https://wiki.vcu.edu/download/attachments/62460064/401?version=1&modificationDate=1456594219984&api=v2

http://toolkit.tuebingen.mpg.de/hhpred/results/7012543

https://wiki.vcu.edu/download/attachments/62460064/gp113%20hhpred.png?version=1&modificationDate=1458317060574&api=v2


https://wiki.vcu.edu/download/attachments/62460064/401.aln?version=1&modificationDate=1457037930218&api=v2

https://wiki.vcu.edu/download/attachments/62460064/4hrz.pdb?version=2&modificationDate=1463256890429&api=v2

https://wiki.vcu.edu/download/attachments/62460064/4hrz.fasta.txt?version=1&modificationDate=1457379848192&api=v2



https://wiki.vcu.edu/download/attachments/62460064/3hr8.pdb?version=1&modificationDate=1457381396104&api=v2

https://wiki.vcu.edu/download/attachments/62460064/3hr8.fasta.txt?version=1&modificationDate=1457381425085&api=v2



https://wiki.vcu.edu/download/attachments/62460064/2g8l.pdb?version=1&modificationDate=1457380623700&api=v2

https://wiki.vcu.edu/download/attachments/62460064/2g8l.fasta.txt?version=1&modificationDate=1457380652034&api=v2

3/13/16

Made an alignment file of the pham and PDB sequences using ClustalX.

Then copied and pasted the Nemo ClustalX alignment (PDB template and target sequence) into the .ali file.

This is step two: instead of having the program make this file for me, I did it manually.

3/12/16

Made an alignment file (.ali) in NBRF/PIR format from the .pir file and the PDB file.
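The .ali file MODELLER reads is in PIR format: each entry is a `>P1;` name line, a description line of ten colon-separated fields, and the aligned sequence terminated by `*`. A minimal Python sketch that writes one template/target pair, assuming a single-chain template (chain A by default) and using MODELLER's FIRST/LAST residue keywords. The function name and the example sequences are hypothetical, the aligned rows are assumed to come from ClustalX with '-' for gaps, and the template's name, source, resolution and R-factor fields are simply left blank here.

```python
def write_ali(template_code, template_aln, target_name, target_aln, chain="A", width=75):
    """Return PIR-format alignment text (MODELLER .ali) for one template and one target.

    template_code should match the template .pdb file name without the extension
    (e.g. 4hrz_extract_A). template_aln / target_aln: equal-length aligned sequences.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")

    def entry(name, fields, aln):
        seq = aln + "*"  # PIR sequences are terminated with '*'
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        return "\n".join([f">P1;{name}", ":".join(fields)] + lines)

    # Ten fields: type, code, first residue, chain, last residue, chain,
    # protein name, source, resolution, R-factor.
    template_fields = ["structureX", template_code, "FIRST", chain, "LAST", chain, "", "", "", ""]
    target_fields = ["sequence", target_name, "", "", "", "", "", "", "0.00", "0.00"]
    return (entry(template_code, template_fields, template_aln) + "\n\n"
            + entry(target_name, target_fields, target_aln) + "\n")

if __name__ == "__main__":
    # Made-up aligned sequences, only to show the output layout.
    print(write_ali("4hrz_extract_A", "MKV-LG", "ChannelFever_103", "MRVQLG"))
```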

3/7/16

Download PDB / FASTA files from: http://www.rcsb.org/pdb/home/home.do

The four-character code is the PDB ID, found on the PDB website.
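Given a PDB ID, the coordinate file can also be fetched by script. A hedged Python sketch using only the standard library; it assumes RCSB's file-download pattern https://files.rcsb.org/download/<ID>.pdb rather than the home page linked above, and it only accepts classic four-character IDs. The function names are hypothetical, and `fetch_pdb` needs network access.

```python
import re
import urllib.request

# Classic PDB IDs: a digit 1-9 followed by three alphanumeric characters.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")

def pdb_download_url(pdb_id):
    """URL of the .pdb-format file for a PDB ID (assumed RCSB download pattern)."""
    if not PDB_ID.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"

def fetch_pdb(pdb_id, path=None):
    """Download the .pdb file (requires network access) and return the local path."""
    path = path or f"{pdb_id.lower()}.pdb"
    urllib.request.urlretrieve(pdb_download_url(pdb_id), path)
    return path
```

For example, `fetch_pdb("4hrz")` would save 4hrz.pdb in the working directory.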

3/3/16

Multiple sequence alignment done with ClustalX.

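Worthy phams. Columns: pham and PDB code; no. of phages with pham; location/synteny; BLASTp conserved domain?; interesting BLAST matches; HHpred information; alignment files (ClustalX 2.0 used); PDB (file.pdb has the coordinates; Sybyl input, chains saved into FASTA); FASTA (file from the .pdb chain(s)).

1732 (4Q2W): 51 phages; Nemo gp104; conserved domain: yes; BLAST match: tail lysin, E-value 0; HHpred: Aureus Autolysin E in complex, 1.1e-19, 28% identical to Nemo; files: 1732.aln, 4Q2W (PDB and protein record), 4Q2W FASTA

1833 (1ML8): 51 phages; Nemo gp97; conserved domain: yes; BLAST match: tail sheath protein; HHpred: tail sheath protein, 9.8e-63; files: 1833.aln, 1ML8.PDB, 1ML8.fasta

1647 (3VPB): 53 phages; Nemo gp61; conserved domain: yes; BLAST match: adenylate kinase; HHpred: adenylate kinase, 1.7e-18; files: 1647.aln, 3VPB.PDB, 3VPB.fasta

850 (4ZC0): 51 phages; Nemo gp122; conserved domain: yes; BLAST match: DNA helicase; HHpred: replicative DNA helicase, E-value 1.2e-53; files: 850.aln, 4ZC0.PDB, 4ZC0.fasta

491 (2OCA): 52 phages; Nemo gp118; conserved domain: yes; BLAST match: DNA helicase; HHpred: transcription elongation/DNA repair protein, 6.3e-54; files: 491.aln, 2OCA.PDB, 2OCA.fasta

393 (1HW4): 54 phages; Nemo gp64; conserved domain: yes; BLAST match: thymidylate synthase; HHpred: thymidylate synthase, 8e-101; files: 393.aln, 1HW4.PDB, 1HW4.fasta

248 (3QG5): 52 phages; Nemo gp124; conserved domain: yes; BLAST match: exonuclease; HHpred: exonuclease, 2.4e-41; files: 248.aln, 3QG5.PDB, 3QG5.fasta

72 (2K1G): 51 phages; Nemo gp105; conserved domain: yes; BLAST match: tail lysin; HHpred: lipoprotein, 2.5e-30; files: 72.aln, 2K1G.PDB, 2K1G.fasta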

http://www.rcsb.org/pdb/home/home.do



https://wiki.vcu.edu/download/attachments/62460064/4q2w.pdb?version=1&modificationDate=1457379286314&api=v2

http://www.rcsb.org/pdb/explore.do?structureId=4Q2W

https://wiki.vcu.edu/download/attachments/62460064/4q2w.fasta.txt?version=1&modificationDate=1457379324759&api=v2

https://wiki.vcu.edu/download/attachments/62460064/1833%20%281%29?version=1&modificationDate=1456243025765&api=v2


https://wiki.vcu.edu/download/attachments/62460064/1ml8.pdb?version=1&modificationDate=1457381771768&api=v2

https://wiki.vcu.edu/download/attachments/62460064/1ml8.fasta.txt?version=1&modificationDate=1457381799100&api=v2



https://wiki.vcu.edu/download/attachments/62460064/3vpb.pdb?version=1&modificationDate=1457380102919&api=v2

https://wiki.vcu.edu/download/attachments/62460064/3vpb.fasta.txt?version=1&modificationDate=1457380136337&api=v2



https://wiki.vcu.edu/download/attachments/62460064/4zc0.pdb?version=1&modificationDate=1457380276024&api=v2

https://wiki.vcu.edu/download/attachments/62460064/4zc0.fasta.txt?version=1&modificationDate=1457380304646&api=v2



https://wiki.vcu.edu/download/attachments/62460064/2oca.pdb?version=1&modificationDate=1457380791479&api=v2

https://wiki.vcu.edu/download/attachments/62460064/2oca.fasta.txt?version=1&modificationDate=1457380824508&api=v2



https://wiki.vcu.edu/download/attachments/62460064/1hw4.pdb?version=1&modificationDate=1457380954190&api=v2

https://wiki.vcu.edu/download/attachments/62460064/1hw4.fasta.txt?version=1&modificationDate=1457380980506&api=v2



https://wiki.vcu.edu/download/attachments/62460064/3qg5.pdb?version=1&modificationDate=1457381097250&api=v2

https://wiki.vcu.edu/download/attachments/62460064/3qg5.fasta.txt?version=1&modificationDate=1457381126805&api=v2



https://wiki.vcu.edu/download/attachments/62460064/2k1g.pdb?version=1&modificationDate=1457381243310&api=v2

https://wiki.vcu.edu/download/attachments/62460064/2k1g.fasta.txt?version=1&modificationDate=1457381268341&api=v2

3514 (2VXO): 56 phages; Nemo gp76; conserved domain: yes; BLAST match: terminase large subunit; HHpred: terminase, 6.4e-19; files: 3514.aln, 2VXO.PDB, 2VXO.fasta

Hypothetical phams. Columns: pham; no. of phages; location; BLASTp conserved domain; BLASTp; HHpred; PDB; FASTA.

3118 43 gp 174 of Nemo no all hypothetical

n/a

1224 51 nemo 78 no hypothetical CI repressor, high e value 2.0

2846 43 gp 93 of Nemo no all hypothetical mitochondrial import protein/chaperone/ Prob 78.0 E value 1.6, not impressed

1471 42 end of tail proteins (gp 110 in Sage/111 in Nemo)

no nothing, all hypothetical protein

1957 45 beginning of tail proteins (gp 98 in Nemo)

no annotations include structural protein, tail protein, and tail tube subunit (enterococcus phage EFDG1, TmpA for Staphylococcus phage A3R)

nothing interesting

2853 52 nemo, 95 no Hypothetical hypothetical

2879 27 nemo 151 no hypothetical DNA binding protein, high e value 2.3

2846 nemo 93 no hypothetical no

1205 51 nemo 218 no hypothetical no (stopped here)

1152 51 nemo 184 no RNA polymerase sigma no 1OR7

1252 52 nemo 87 yes prohead no

269 51 nemo 180 no hypothetical no 2A1K

58 51 nemo 82 no hypothetical tail needle protein

3201

check 4x9j

44 gp 65 of Sage

no no 5 high quality zinc finger hits, probably DNA binding/transcription factor

Prob 99.4 E value 1.8E-14

no results found

2LVH

735

51 nemo 204 no hypothetical protein of unknown function e-value 3D

NX

287

N/A

50 nemo 86 yes hypothetical portal protein 7.5 e-42 3KDR

n/a

1164 48 nemo 197 no hypothetical no



https://wiki.vcu.edu/download/attachments/62460064/2vxo.pdb?version=1&modificationDate=1457381544682&api=v2

https://wiki.vcu.edu/download/attachments/62460064/2vxo.fasta.txt?version=1&modificationDate=1457381572933&api=v2















http://www.rcsb.org/pdb/explore.do?structureId=4X9J

http://www.rcsb.org/pdb/explore.do?structureId=2LVH



