brijesh-singh-yadav
This report provides background knowledge about the hepatitis B virus.
Introduction
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins. Bioinformatics is limited to sequence,
structural, and functional analysis of genes and genomes and their corresponding
products, and is often considered computational molecular biology. It consists of
two subfields: the development of computational tools and databases, and the application
of these tools and databases in generating biological knowledge to better understand
living systems. These tools are used in three areas of genomic and molecular biological
research: molecular sequence analysis, molecular structural analysis, and molecular
functional analysis. The areas of sequence analysis include sequence alignment, sequence
database searching, motif and pattern discovery, gene and promoter finding,
reconstruction of evolutionary relationships, and genome assembly and comparison.
Structural analyses include protein and nucleic acid structure analysis, comparison,
classification, and prediction. Functional analysis includes gene expression profiling,
protein-protein interaction prediction, protein subcellular localization prediction, and
metabolic pathway reconstruction and simulation.
The three aspects of bioinformatics analysis are not isolated but often interact to produce
integrated results. For example, protein structure prediction depends on sequence
alignment data, and clustering of gene expression profiles requires the use of phylogenetic
tree construction methods derived in sequence analysis. Sequence-based prediction is in
turn related to the functional analysis of co-expressed genes.
The first major bioinformatics project was undertaken by Margaret Dayhoff in 1965, who
developed the first protein sequence database, called the Atlas of Protein Sequence and
Structure. Subsequently, in the early 1970s, the Brookhaven National Laboratory
established the Protein Data Bank for archiving three-dimensional protein structures. At
its onset the database stored fewer than a dozen protein structures, compared to more
than 30,000 structures today. The first sequence alignment algorithm was developed by
Needleman and Wunsch in 1970. This was a fundamental step in the development of the
field of bioinformatics, which paved the way for the routine sequence comparisons and
database searching practiced by modern biologists [10].
A recent advance in bioinformatics is molecular modeling, which aims at understanding
structure-function and structure-property relationships in physico-chemical processes
and pharmaceuticals, and has thus become increasingly important for finding and
designing new drugs. Computers now play an important role in drug discovery and
drug design.
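To make the idea of a global sequence alignment concrete, the following is a minimal sketch in the spirit of the Needleman-Wunsch dynamic programming approach mentioned in the introduction. The scoring values (match = 1, mismatch = -1, gap = -1) and the function name are illustrative assumptions, not parameters taken from this report or from the original publication.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring scheme is an assumption chosen only for illustration.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Real alignment tools use substitution matrices and more elaborate gap penalties; the sketch only shows the table-filling and traceback steps.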
HEPATITIS
Hepatitis (plural hepatitides) implies injury to the liver characterized by the presence of
inflammatory cells in the liver tissue. Etymologically, the word comes from the ancient
Greek hepar or hepato-, meaning 'liver', and the suffix -itis, denoting 'inflammation'. The
condition can be self-limiting, healing on its own, or can progress to scarring of the liver.
Hepatitis is acute when it lasts less than 6 months and chronic when it persists longer. A
group of viruses known as the hepatitis viruses cause most cases of liver damage
worldwide. Hepatitis can also be due to toxins (notably alcohol), other infections, or
autoimmune processes.
It may run a subclinical course, in which the affected person may not feel ill. The patient
becomes unwell and symptomatic when the disease impairs liver functions, which include,
among other things, screening of harmful substances, regulation of blood composition,
and production of bile to help digestion.
Causes
Acute hepatitis
Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes simplex,
cytomegalovirus, Epstein-Barr, yellow fever virus, adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline, and many others
Ischemic hepatitis (circulatory insufficiency) (1)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C
(hepatitis A and E do not lead to chronic disease)
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic
hepatitis [4]
Viral hepatitis
A virus is a particle that is smaller than a bacterium and contains genetic information in
the form of DNA or RNA. This genetic material allows the virus to infect bacteria or living
cells and set up the machinery to reproduce itself, leading to destruction of the cell in
which it resides. To date, five viruses, labeled A through E, have been identified which
appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C, and D are transmitted by direct injection
into the bloodstream (through any method of injection under the skin). The term viral
hepatitis describes any one of the illnesses caused by the five viruses mentioned, and
consists of an infection of liver cells which leads to damage of the liver, over days in
some cases but over many years in others. Thirty years ago none of the hepatitis viruses
had been identified. In the 1960s transfusion-related viral hepatitis was extremely
common, with 30% of patients receiving blood products becoming infected. By 1970 a
blood test for the Australia antigen was developed, which appeared to identify those
infected with one hepatitis virus, which we now call hepatitis B. The investigator who
discovered the Australia antigen, the protein which makes up the coat of the virus and
which is now called the hepatitis B surface antigen (HBsAg), was awarded the Nobel
Prize. Our understanding of viral hepatitis has grown tremendously since the discovery
of the Australia antigen.
Currently 11 viruses are recognized as causing hepatitis. Two are herpesviruses
(cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are hepatotropic viruses.
EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent hepatic
damage. Both viruses cause the typical infectious mononucleosis syndrome of fatigue,
nausea, and malaise.
Of the nine human hepatotropic viruses, only five are well characterized; hepatitis G and
TTV (transfusion-transmitted virus) are newly discovered viruses. Hepatitis A (sometimes
called infectious hepatitis) and hepatitis E (formerly called enterically transmitted NANB
hepatitis) are transmitted by fecal-oral contamination. The most important types include
hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called non-A, non-B
hepatitis), and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common, especially in
children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the disease.
Fulminant hepatitis is rare: 0.1% of cases.
The virus enters via the gut, replicates in the alimentary tract, and spreads to infect the
liver, where it multiplies in hepatocytes. Viraemia is transient. Virus is excreted in the
stools for two weeks preceding the onset of symptoms.
Worldwide distribution, endemic in most countries. The incidence in first world
countries is declining. There is an especially high incidence in developing countries and
rural areas. In rural areas of South Africa the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years.
Fulminant hepatitis in pregnant women: the mortality rate is high (up to 40%).
Similar to hepatitis A, the virus replicates in the gut initially before invading the liver,
and virus is shed in the stool prior to the onset of symptoms. Viraemia is transient. A
large inoculum of virus is needed to establish infection.
Little is known yet. The incidence of infection appears to be low in first world countries.
Hepatitis C
Putative togavirus related to the flaviviruses and pestiviruses; thus probably enveloped.
Has a ssRNA genome.
Does not grow in cell culture but can infect chimpanzees.
Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B, but 50% of individuals
develop chronic infection following exposure, which may lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic worldwide; high incidence in Japan, Italy, and Spain.
In South Africa 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with hepatitis B.
Increased severity of liver disease in hepatitis B carriers.
Virus particle 36 nm in diameter, encapsulated with HBsAg derived from HBV.
Delta antigen is associated with virus particles.
ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has
been classified as a flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells and other cells in the body once it gains access to the bloodstream.
One of the most interesting features of the hepatitis B virus is that the virus itself does not
damage the liver; the damage is caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis, and occasionally liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation) do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly
identical have been identified in Eastern woodchucks, ground squirrels, and Peking
ducks. The members of this virus family, termed the hepadnaviruses, have life cycles
similar to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus
(e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus
particles as well as excess viral surface protein are shed in large amounts into the blood.
Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies and young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis: the virus persists, but there is minimal liver damage.
Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure.
Patients who become persistently infected are at risk of developing hepatocellular
carcinoma (HCC). HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission: perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV.
Fig. Hepatitis B virus in serum
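The marker-by-marker reading described in the serology section can be summarised as a small lookup. The sketch below simply encodes the statements listed there; the class name, field names, and wording of the returned notes are assumptions made for illustration, and the sketch does not model titres, timing, or combinations of markers.

```python
from dataclasses import dataclass


@dataclass
class HBVSerology:
    """Presence (True) or absence (False) of each serum marker.

    Field names are hypothetical; they mirror the markers listed in the text.
    """
    hbsag: bool = False
    hbeag: bool = False
    anti_hbs: bool = False
    anti_hbe: bool = False
    core_igm: bool = False
    core_igg: bool = False


def interpret(s: HBVSerology) -> list:
    """Return one note per positive marker, following the list in the text."""
    notes = []
    if s.hbsag:
        notes.append("HBsAg: virus replication is occurring in the liver")
    if s.hbeag:
        notes.append("HBeAg: high level of viral replication")
    if s.anti_hbs:
        notes.append("anti-HBs: immunity following infection")
    if s.anti_hbe:
        notes.append("anti-HBe: low infectivity")
    if s.core_igm:
        notes.append("core IgM: recent infection")
    if s.core_igg:
        notes.append("core IgG: exposure to HBV")
    return notes or ["no HBV marker detected"]


if __name__ == "__main__":
    resolved = HBVSerology(anti_hbs=True, core_igg=True)
    for note in interpret(resolved):
        print(note)
```

Keeping each marker as an independent rule mirrors the way the text presents them; a fuller model would consider the markers jointly and over time.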
Prevention
1) Active immunization
Two types of vaccine are available:
Serum derived: prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg: made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10, and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue, and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure, may not be aware
of the infection. These individuals may also overcome the infection completely and
develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status of the
person's immune system at the time of exposure. Most chronic carriers or those with
chronic hepatitis B are not aware of their ongoing infection, although some have
persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, and double stranded. It has a compact organization, with
four overlapping reading frames running in one direction and no noncoding regions. The
minus strand is unit length and has a protein covalently attached to the 5' end. The other
strand, the plus strand, is variable in length but shorter than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is
maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames
(ORFs) in the genome are responsible for the transcription and expression of seven
different hepatitis B proteins. The transcription and translation of these proteins occurs
through the use of multiple in-frame start codons. The HBV genome also contains
elements that regulate transcription, determine the site of polyadenylation, and mark a
specific transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
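The combination of a circular genome, reading frames running in one direction, and multiple in-frame start codons can be illustrated with a toy ORF scan. This is a minimal sketch under stated assumptions: the function name, the `min_codons` parameter, and the short example sequences are hypothetical and are not HBV sequence data. Each ATG is treated as a possible start, so two in-frame starts that share a stop codon are reported as two nested ORFs, and an ORF is allowed to wrap around the end of the sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(genome, min_codons=2):
    """List ORFs on the given strand of a circular DNA sequence.

    Returns tuples (start_index, end_index_exclusive_mod_n, orf_dna).
    min_codons counts coding codons including ATG, excluding the stop.
    """
    n = len(genome)
    doubled = genome + genome  # lets a slice run past the origin
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, but never further than one full turn
        for pos in range(start + 3, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, (pos + 3) % n, doubled[start:pos + 3]))
                break
    return orfs


if __name__ == "__main__":
    # ORF that wraps around the origin of a 13-base circle
    print(circular_orfs("AATAACCCCATGA"))
    # Two in-frame start codons sharing one stop codon
    print(circular_orfs("ATGCCCATGGGGTAACC"))
```

In the second example the shorter ORF lies entirely inside the longer one, which is the kind of nesting that lets one reading frame encode more than one protein.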
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to be able to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known that the
virion initially attaches to a susceptible hepatocyte through recognition of a cell surface
receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it
is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer than
genome length RNA called the pregenome and of shorter subgenomic transcripts, all of which serve
as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the viral
envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a
specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs.
At the same time as capsid formation, the RNA-P protein complex is packaged and reverse
transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where the process is
repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral
mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in
an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present and usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to as a group named surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase, and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found during the
acute phase of the infection. Since the non-infectious particles present the same sites as
the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the bloodstream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome.
Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg), and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering the immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185 amino acid protein, it is expressed in the cytoplasm of infected cells
and is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus
of complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct
antigenic particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle, and tubular or filamentous particles that vary in
length. The Dane particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles,
or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid, and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV infection and
reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed, with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. Homology modeling figures heavily as a rationale for
structural genomics initiatives, under the stated assumption that accurate models can be
built for query sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements in the quality of homology
models [5,6].
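The 30% rule discussed above depends on how sequence identity is measured. The sketch below computes percent identity from a pairwise alignment that has already been produced, and compares it with a threshold. The function names are hypothetical, and the choice of denominator (columns where neither sequence has a gap) is an assumption; other conventions, such as dividing by the length of the shorter sequence, give different values.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of identical residues over columns where neither row has a gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != gap and t != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def passes_identity_rule(aligned_query, aligned_template, threshold=30.0):
    """True when identity exceeds the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    query = "ACDE-G"
    template = "ACNEFG"
    print(percent_identity(query, template))      # 4 of 5 aligned columns match
    print(passes_identity_rule(query, template))
```

A template that passes this check is only a candidate; as the text notes, the quality of the alignment itself still determines how good the resulting model can be.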
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the orientation of the ligand during sampling.
Direct approaches can be further divided into protein-based, mapping-based, and
ligand-based approaches, to reflect the source used to derive the features capturing the
chemical information inside the protein cavity. Within each category, a representative list
of docking approaches is discussed. In view of the limitations of current scoring
functions, it was generally found that making optimal use of chemical information
represents an efficient knowledge-based strategy for improving binding affinity
estimations, ligand binding-mode predictions, and virtual screening enrichments obtained
from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domains structure, post-translational modifications,
variants, etc.), a minimal level of redundancy, and a high level of
integration with other databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
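The optimal local alignment that SSEARCH computes can be illustrated with a short dynamic-programming sketch. This is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the function name and the default scores are illustrative), whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Simplified scoring (assumed for illustration): fixed match/mismatch
    values and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACACACTA", "AGCACACA"))  # prints 12
```

Only two rows of the matrix are kept because the sketch returns the score alone; recovering the alignment itself would require the full matrix and a traceback.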
4. BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences, and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
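As an illustration of one of these parameters, the aliphatic index is defined by ProtParam as X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X is the mole percent of each residue and a = 2.9, b = 3.9. The sketch below applies that formula to the amino acid counts reported for GLCTK_HUMAN in the results of this report; the function name is illustrative, and the value it returns is computed here rather than taken from the ProtParam output.

```python
def aliphatic_index(counts, a=2.9, b=3.9):
    """Aliphatic index from one-letter amino acid counts (mole-percent formula)."""
    total = sum(counts.values())

    def mole_percent(residue):
        return 100.0 * counts.get(residue, 0) / total

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Amino acid counts for GLCTK_HUMAN (523 residues) as listed in the results.
GLCTK_COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28, "G": 51,
    "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10, "P": 28, "S": 27,
    "T": 22, "W": 4, "Y": 5, "V": 38,
}

if __name__ == "__main__":
    print(round(aliphatic_index(GLCTK_COUNTS), 2))
```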
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet, and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
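The success rates quoted above are per-residue accuracies: the percentage of residues whose predicted state matches the observed one. A minimal sketch of that measure follows; the function name and the two ten-residue strings are hypothetical and serve only to show the arithmetic.

```python
def per_residue_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not predicted:
        raise ValueError("empty prediction")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


if __name__ == "__main__":
    # Hypothetical three-state strings (h = helix, e = strand, c = coil).
    predicted = "hhhhcceeec"
    observed = "hhhcccceec"
    print(per_residue_accuracy(predicted, observed))  # prints 80.0
```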
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure by BLAST similarity search (PDB ID: 2B8N).
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
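The sequence retrieved in the second step is a FASTA record: a header line starting with ">" followed by one or more lines of residues. A minimal reader is sketched below; the header text in the example is illustrative rather than the exact Swiss-Prot header, and the sequence is limited to the first 60 residues of GLCTK_HUMAN as they appear in the results of this report.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


# Illustrative record: header wording is an assumption, residues are from the report.
EXAMPLE = """>Q8IVS8 GLCTK_HUMAN (first 60 residues only)
MAAALQVLPRLARAPLHPLLWRGSVARLAS
SMALAEQARQLFESAVGAVLPGPMLHRALS
"""

records = list(read_fasta(io.StringIO(EXAMPLE)))

if __name__ == "__main__":
    for header, sequence in records:
        print(header, len(sequence))
```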
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
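Since expected model quality is tied to sequence identity, it helps to be explicit about how identity is computed from an alignment. Several conventions exist; the sketch below assumes one of them (identical residues divided by the number of columns in which neither sequence has a gap). The function name is illustrative, and the two fragments are taken from the HBVA4 and HBVA5 rows of the multiple alignment in the results.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    hbva4 = "LPSDFFPSVRDLLDTASALYREALES"
    hbva5 = "LPSDFFPSVRDLXDTASALYREALES"
    # 25 of 26 aligned positions are identical.
    print(round(percent_identity(hbva4, hbva5), 2))
```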
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
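The E-value criterion can be expressed as a simple filter over the hit list. The sketch below uses the hits from the similarity search reported in the results; the cutoff of 1e-5 is an assumed, illustrative threshold (the report does not state one), the names are hypothetical, and the E-values of the last two hits are read as 6.0 on the assumption that the decimal point was lost in the listing.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    pdb_id: str
    score: float
    e_value: float


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)


# Hits as listed in the results; 6.0 for the last two is an assumed reading.
HITS = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 6.0),
    Hit("1AW9-A", 27, 6.0),
]

if __name__ == "__main__":
    for hit in select_templates(HITS):
        print(hit.pdb_id, hit.e_value)
```

Under either reading of the last two E-values, only the three capsid structures pass the cutoff.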
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step, and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints and
thus the interactions are said to be rigid.
Fig: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can either have a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. The molecules move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
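The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be made concrete with a small coordinate transform. This is only a sketch of how a rigid pose might be parameterized, not the search procedure of any particular docking program; the function name, the rotation order (x, then y, then z), and the example coordinates are assumptions.

```python
import math


def rigid_transform(points, rx=0.0, ry=0.0, rz=0.0, translation=(0.0, 0.0, 0.0)):
    """Rotate points about the x, y, then z axes (angles in radians), then translate."""
    tx, ty, tz = translation
    moved = []
    for x, y, z in points:
        # Rotation about the x axis.
        y, z = y * math.cos(rx) - z * math.sin(rx), y * math.sin(rx) + z * math.cos(rx)
        # Rotation about the y axis.
        x, z = x * math.cos(ry) + z * math.sin(ry), -x * math.sin(ry) + z * math.cos(ry)
        # Rotation about the z axis.
        x, y = x * math.cos(rz) - y * math.sin(rz), x * math.sin(rz) + y * math.cos(rz)
        moved.append((x + tx, y + ty, z + tz))
    return moved


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]  # hypothetical atom coordinates
    pose = rigid_transform(ligand, rx=0.3, ry=-0.7, rz=1.1, translation=(5.0, -2.0, 1.0))
    # Internal distances are unchanged by a rigid-body move.
    print(math.dist(*ligand), math.dist(*pose))
```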
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
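One way to use such preference values is as a lookup table, for example to average them over the residues lining a candidate site. The sketch below is a hypothetical use, not a method from the cited analysis; the values are those listed above, read with the decimal point after the leading digit, and the function name and example residue sets are illustrative.

```python
# Preference values as listed above (decimal point assumed after the leading digit).
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def mean_preference(residues):
    """Average preference value over a list of three-letter residue names."""
    if not residues:
        raise ValueError("no residues given")
    return sum(PREFERENCE[name] for name in residues) / len(residues)


if __name__ == "__main__":
    print(mean_preference(["His", "Cys", "Ser"]))  # residues with positive values
    print(mean_preference(["Pro", "Leu", "Trp"]))  # residues with negative values
```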
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with
dimensions corresponding to their van der Waals radii, and the angles which cause
spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
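Each point on the plot is a pair of torsion angles, and a torsion angle is computed from four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for phi, and N(i), Cα(i), C(i), N(i+1) for psi. The sketch below shows the geometric calculation in plain Python; the function names are illustrative, and the coordinates in the example are made-up points chosen so that the answer is easy to check by hand.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


if __name__ == "__main__":
    # Made-up coordinates: the fourth point is rotated a quarter turn about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```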
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles, and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
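The idea of gradient optimization can be shown on the smallest possible case: two atoms interacting through a Lennard-Jones (van der Waals) term. The sketch below is a toy steepest-descent loop, not the minimizer of any modelling package; the reduced units (epsilon = sigma = 1), step size, and tolerance are assumptions chosen for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent: move downhill along the gradient until the force is small."""
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        r -= step * gradient
    return r


if __name__ == "__main__":
    start = 1.0  # atoms too close: the repulsive term dominates
    r_min = minimize(start)
    print(lj_energy(start), r_min, lj_energy(r_min))
```

Starting from a separation where the atoms are too close, the loop relaxes to the nearby minimum at r = 2^(1/6) sigma, where the force vanishes and the energy equals minus epsilon.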
Result and discussion
1. Swiss-Prot entry: Protein sequence Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (HBcAg) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700, Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450, Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
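These two values can be reproduced from the composition. ProtParam estimates the molar extinction coefficient at 280 nm as a weighted sum over tryptophan, tyrosine, and cystine, with per-residue values of 5500, 1490, and 125 M-1 cm-1 respectively; the absorbance of a 1 g/l solution is that coefficient divided by the molecular weight. The sketch below applies this to the counts reported above (Trp 4, Tyr 5, Cys 5, molecular weight 55252.6); the function names are illustrative.

```python
# Molar extinction values at 280 nm in water (M-1 cm-1), as used by ProtParam.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimated molar extinction coefficient at 280 nm."""
    # Two cysteines form one cystine; an odd cysteine is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    for paired in (True, False):
        ext = extinction_coefficient(4, 5, 5, all_cys_paired=paired)
        print(ext, round(absorbance_1_g_per_l(ext, 55252.6), 3))
    # prints 29700 0.538, then 29450 0.533
```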
Secondary structure prediction
SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
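The percentages in the SOPMA summary are simply the share of each state letter in the prediction string. A minimal sketch of that tally follows; the function name is illustrative, and rather than retyping the full 523-residue prediction the example rebuilds a string with the state counts reported above (235 h, 80 e, 36 t, 172 c).

```python
from collections import Counter


def state_percentages(prediction):
    """Percentage of each state letter in a secondary structure prediction string."""
    if not prediction:
        raise ValueError("empty prediction")
    counts = Counter(prediction)
    return {state: round(100.0 * n / len(prediction), 2) for state, n in counts.items()}


if __name__ == "__main__":
    # Stand-in string with the reported state counts; the order of states is irrelevant here.
    stand_in = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
    print(state_percentages(stand_in))
```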
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
49
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 205 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
50
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
51
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
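A guide tree like the one above is conventionally written in Newick format (nested parentheses, with a branch length after each colon), so it can be read by a short program. The following is only a minimal sketch of such a reader, not the parser used by the alignment tool; the function names `parse_newick` and `leaf_depths` and the three-leaf example tree are made up for illustration.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def read_until(stops):
        # Consume characters up to (not including) the next structural symbol.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # step over "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                  # step over ","
                children.append(parse_node())
            pos += 1                      # step over ")"
        name = read_until(",():;")        # may be empty for internal nodes
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return (name, length, children)

    return parse_node()


def leaf_depths(node, depth=0.0):
    """Return {leaf name: summed branch length from the root to that leaf}."""
    name, length, children = node
    here = depth + length
    if not children:
        return {name: here}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, here))
    return depths


if __name__ == "__main__":
    # Hypothetical tree, used only to demonstrate the functions.
    tree = parse_newick("((A:0.1,B:0.2):0.3,C:0.5);")
    for leaf, depth in sorted(leaf_depths(tree).items()):
        print(f"{leaf}\t{depth:.5f}")
```

Sequence labels that contain a `|` character, as the accession-style names here do, pass through unchanged because only `(`, `)`, `,`, `:` and `;` are treated as structural symbols.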
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C (1QGT-C), was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
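A percent identity figure such as the one quoted for the template can be computed from any pair of aligned rows. The sketch below shows one common convention, counting identical residues over the columns in which neither sequence has a gap; other tools divide by the full alignment length or by the shorter sequence, so the number depends on the convention. The function name `percent_identity` is illustrative, and the two example rows are short segments copied from the HBVA4 and ASHV rows of the multiple alignment.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity of two aligned rows, ignoring columns that contain a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only the columns where both rows hold a residue.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    hbva4 = "--LPSDFFPSVRDLLDTASALYREALES"
    ashv = "--LPLDFFPELNALVDTATALYEEELTG"
    print(f"{percent_identity(hbva4, ashv):.1f}% identity over ungapped columns")
```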
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file).
Choose Swiss model - load raw sequence to load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (PDB).
Choose File - save - project (PDB).
Choose Swiss model - submit modeling request. A new browser window opens; load the PDB file and give the e-mail ID at which the modelled structure is to be received.
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and then iterative fit, both provided under FIT, to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: Request submission form
Please fill these fields:
Your email address: Gunjan300gmailcom (MUST be correct)
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ angles of each residue and shows which combinations are sterically possible for a polypeptide, so residues with unfavourable backbone conformations stand out.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using Deep View and Modeler is given as input to SAVS.
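Each point on a Ramachandran plot is a pair of dihedral angles: φ is defined by the backbone atoms C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The sketch below shows how one such angle can be computed from four atomic coordinates; it is a generic geometric calculation, not the code used by Procheck or SAVS, and the function names and the test points are illustrative assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points (x, y, z)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)          # central bond p1 -> p2
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Idealised points (not real atom coordinates): a quarter turn about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to every residue of a model, the resulting (φ, ψ) pairs are the points that the Ramachandran plot sorts into favoured, allowed and disallowed regions.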
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                         Score   %-age
Residues in most favoured regions [A,B,L]                 990   85.6%
Residues in additional allowed regions [a,b,l,p]          104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                             51    4.4%
                                                         ----  ------
Number of non-glycine and non-proline residues           1156  100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
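The percentages in the Procheck summary follow directly from the residue counts, which gives a quick consistency check on the table. The sketch below recomputes them, reading the four region counts as 990, 104, 11 and 51 out of 1156 non-glycine, non-proline residues, and compares the most favoured fraction with the 90% guideline quoted in the Procheck note. The variable names are illustrative.

```python
# Residue counts per Ramachandran region, as read from the Procheck summary.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Non-glycine, non-proline residues: the denominator Procheck uses.
total = sum(region_counts.values())

percentages = {
    region: round(100.0 * count / total, 1)
    for region, count in region_counts.items()
}

GOOD_MODEL_THRESHOLD = 90.0  # guideline quoted in the Procheck note
meets_guideline = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

if __name__ == "__main__":
    for region, pct in percentages.items():
        print(f"{region:>20}: {region_counts[region]:5d}  {pct:5.1f}%")
    print("meets 90% guideline:", meets_guideline)
```

Under this reading the most favoured fraction is below the 90% guideline, which would normally prompt refinement of the model or a closer look at the residues in disallowed regions.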
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 0.0 0.0 0.0 0.0 0.0 0.0 -1 -1.00
---------------------------------------------------------------------------
1 1 000000 0.0 0.0 0.0 0.0 0.0 0.0 -1 -1.00
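The size of the search reported in the log can be checked with a few lines of arithmetic. The sketch below assumes the bracketed values read Iterate[27,812,1] and FFT[64,24,48], and that the distance range runs from 0.00 to 19.50 in steps of 0.75; these readings are an interpretation of the log, and the variable names are illustrative.

```python
from math import prod

# Intermolecular distances sampled by the search: 0.00, 0.75, ..., 19.50.
STEP = 0.75
distance_steps = [round(i * STEP, 2) for i in range(27)]

# Assumed reading of "Iterate[...] x FFT[...]" in the log.
iterate_dims = (27, 812, 1)   # distance steps x receptor rotations x ligand rotations
fft_dims = (64, 24, 48)       # grid of the 3D FFT evaluated at each iteration

fft_count = prod(iterate_dims)                # number of 3D FFTs performed
total_orientations = fft_count * prod(fft_dims)

if __name__ == "__main__":
    print("distance steps:", len(distance_steps), "up to", distance_steps[-1])
    print("3D FFTs:", fft_count)
    print("orientations:", total_orientations)
```

With these readings the product reproduces both the 21924 FFTs and the 1616412672 orientations that the log reports, and the 27 distance steps match the first factor of the iteration count.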
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different host species. It would be interesting to look more closely at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw firm conclusions about the true picture of its evolution in the near future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we carried out homology modelling and put forward a theoretical model built with available online modelling tools.
As we studied, the HBeAg protein is one of the secondary causes of the pathogenicity of HBV. The structure modelled and docked in this work was glycerate kinase (target Q8IVS8, template 2B8N), which is a different protein from HBeAg. We tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists to develop drugs more efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Introduction
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins. Bioinformatics is limited to sequence,
structural, and functional analysis of genes and genomes and their corresponding
products, and is often considered computational molecular biology. It consists of
two subfields: the development of computational tools and databases, and the application
of these tools and databases in generating biological knowledge to better understand
living systems. These tools are used in three areas of genomic and molecular biological
research: molecular sequence analysis, molecular structural analysis, and molecular
functional analysis. The areas of sequence analysis include sequence alignment, sequence
database searching, motif and pattern discovery, gene and promoter finding,
reconstruction of evolutionary relationships, and genome assembly and comparison.
Structural analyses include protein and nucleic acid structure analysis, comparison,
classification, and prediction. The functional analysis includes gene expression profiling,
protein-protein interaction prediction, protein subcellular localization prediction,
metabolic pathway reconstruction, and simulation. The three aspects of bioinformatics
analysis are not isolated but often interact to produce integrated results. For example,
protein structure prediction depends on sequence alignment data; clustering of gene
expression profiles requires the use of phylogenetic tree construction methods derived
in sequence analysis. Sequence-based prediction is related to functional analysis of
co-expressed genes. The first major bioinformatics project was undertaken by Margaret
Dayhoff in 1965, who developed the first protein sequence database, called the Atlas of Protein
Sequence and Structure. Subsequently, in the early 1970s, the Brookhaven National
Laboratory established the Protein Data Bank for archiving three-dimensional protein
structures. At its onset, the database stored fewer than a dozen protein structures, compared
to more than 30,000 structures today. The first sequence alignment algorithm was
developed by Needleman and Wunsch in 1970. This was a fundamental step in the
development of the field of bioinformatics, which paved the way for the routine sequence
comparisons and database searching practiced by modern biologists.
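As an illustration of the kind of algorithm referred to above, the following is a minimal sketch of global alignment by dynamic programming in the style of Needleman and Wunsch. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for the example; they are not taken from this report, and real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not taken from the report.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GCATGCU"))
```

The table is filled in time proportional to the product of the two sequence lengths, which is why heuristic database search programs are preferred for large-scale searching.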
A recent advance in bioinformatics is molecular modeling, which is aimed at
understanding structure-function and structure-property relationships in physico-chemical
processes and pharmaceuticals, and has thus become increasingly important for finding and
designing new drugs. In fact, computers are playing an important role in new drug
discovery and drug design.
HEPATITIS
Hepatitis (plural hepatitides) implies injury to the liver, characterized by the presence of
inflammatory cells in the liver tissue. Etymologically, the term comes from the ancient Greek
hepar or hepato-, meaning liver, and the suffix -itis, denoting inflammation. The condition can
be self-limiting, healing on its own, or can progress to scarring of the liver.
Hepatitis is acute when it lasts less than 6 months and chronic when it persists longer. A group
of viruses known as the hepatitis viruses cause most cases of liver damage worldwide.
Hepatitis can also be due to toxins (notably alcohol), other infections, or an autoimmune
process.
It may run a subclinical course, in which the affected person may not feel ill. The patient
becomes unwell and symptomatic when the disease impairs liver functions that include,
among other things, screening of harmful substances, regulation of blood composition, and
production of bile to help digestion.
Causes
Acute hepatitis
Viral hepatitis: Hepatitis A to E (more than 95% of viral causes), Herpes simplex, Cytomegalovirus, Epstein-Barr virus, Yellow fever virus, Adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline, and many others
Ischemic hepatitis (circulatory insufficiency) (1)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
Viral hepatitis: Hepatitis B with or without hepatitis D, hepatitis C
(Hepatitis A and E do not lead to chronic disease)
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis [4]
Viral hepatitis
A virus is a particle which is smaller than bacteria and contains complex genetic
information in the form of DNA or RNA. This genetic material allows the virus to infect bacteria
or living cells and set up the machinery to reproduce itself, leading to destruction of the cell
in which it resides. To date, five viruses, labeled A through E, have been identified which
appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C, and D are transmitted by direct injection
into the bloodstream (through any method of injection under the skin). The term viral
hepatitis describes any one of the illnesses caused by the five viruses mentioned and
consists of an infection of liver cells which leads to damage of the liver, over days in
some cases but over many years in others. Thirty years ago, none of the hepatitis viruses
had been identified. In the 1960s, transfusion-related viral hepatitis was extremely
common, with 30% of patients receiving blood products becoming infected. By 1970, a
blood test for the Australia antigen was developed, which appeared to identify those
infected with one hepatitis virus, which we now call hepatitis B. The
investigator who discovered the Australia antigen, the protein which makes up the coat of
the virus and which is now called the hepatitis B surface antigen (HBsAg), was awarded
the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the
discovery of the Australia antigen.
Currently 11 viruses are recognized as causing hepatitis. Two are
herpes viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are
hepatotropic viruses.
EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent
hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of
fatigue, nausea, and malaise.
Of the nine human hepatotropic viruses, only five are well characterized.
Hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called
enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The
most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C
(formerly called non-A, non-B hepatitis), and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common, especially in
children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the disease.
Fulminant hepatitis is rare: 0.1% of cases. The virus enters via the gut, replicates in the
alimentary tract, and spreads to infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset
of symptoms.
Worldwide distribution; endemic in most countries. The incidence in first world
countries is declining. There is an especially high incidence in developing countries and
rural areas. In rural areas of South Africa the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women;
the mortality rate is high (up to 40%). Similar to hepatitis A, the virus replicates in the gut
initially, before invading the liver, and virus is shed in the stool prior to the onset of
symptoms. Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet; the incidence of infection appears to be low in first world
countries.
Hepatitis C
Putative Togavirus related to the Flavi and Pesti viruses.
Thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees. Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B.
But 50% of individuals develop chronic infection following exposure, which can result in:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic worldwide; high incidence in Japan, Italy, and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with hepatitis B.
Increased severity of liver disease in hepatitis B carriers. The virus particle is 36 nm in
diameter, encapsulated with HBsAg derived from HBV.
Delta antigen is associated with virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has
been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells and other cells in the body once it gains access to the bloodstream.
One of the most interesting features of the hepatitis B virus is that the virus itself does not
damage the liver, the damage being caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis, and occasionally liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation) do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly
identical have been identified in Eastern woodchucks, ground squirrels, and Peking
ducks. The members of this virus family, termed the Hepadnaviruses, have similar life
cycles to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g.
duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus
particles, as well as excess viral surface protein, are shed in large amounts into the blood.
Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma
Patients who become persistently infected are at risk of developing hepatocellular
carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV.
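The marker interpretations listed above can be collected into a small lookup table. The following is a minimal sketch for illustration only: the marker labels and the function name `interpret_serology` are assumptions introduced here, the meanings are copied from the list above, and it is not a diagnostic tool.

```python
# Meanings are taken from the serology list in this section; the labels used
# as dictionary keys are illustrative assumptions.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV",
}


def interpret_serology(positive_markers):
    """Return one 'marker: meaning' line per positive marker, in table order."""
    unknown = set(positive_markers) - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"unknown marker(s): {sorted(unknown)}")
    return [
        f"{marker}: {meaning}"
        for marker, meaning in MARKER_MEANING.items()
        if marker in positive_markers
    ]


if __name__ == "__main__":
    # Hypothetical profile of a person who has cleared the infection.
    for line in interpret_serology(["anti-HBs", "core IgG"]):
        print(line)
```

HBcAg is deliberately absent from the table, since the core protein is not found in blood.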
Fig. Hepatitis B virus in serum
Prevention
1) Active immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10, and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue, and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers, or
those with chronic hepatitis B, are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
Fig. Hepatitis B virus genome
The genome is circular, double stranded, and 3.2 kb in size. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5' end. The other strand, the plus strand, is variable in length but is less than unit
length and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is through the use of multiple in-frame start codons. The HBV genome also contains
parts that regulate transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
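To make the idea of open reading frames and in-frame start codons concrete, the following is a minimal sketch that scans the three forward reading frames of a DNA string. It is an illustration under stated assumptions: the function names and the toy sequence are hypothetical, the sequence is treated as linear (the HBV genome is circular, so a real scan would need to wrap around the origin), and only the standard ATG start and TAA/TAG/TGA stop codons are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ATG...stop ORF on the forward strand.

    start is the index of the first ATG; end is one past the stop codon.
    min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None  # look for the next ORF in this frame
    return orfs


def in_frame_starts(dna, start, end):
    """List every ATG inside an ORF that is in frame with its first ATG."""
    dna = dna.upper()
    return [p for p in range(start, end - 3, 3) if dna[p:p + 3] == "ATG"]


if __name__ == "__main__":
    toy = "ATGATGAAATAA"  # hypothetical sequence, not HBV
    for frame, start, end in find_orfs(toy):
        print(frame, start, end, in_frame_starts(toy, start, end))
```

An ORF with several in-frame ATG codons can give rise to proteins of different lengths that share a common C-terminal part, which is the situation described above for the HBV proteins.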
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to be able to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after the infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al. 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, which usually outnumber the
virions in the ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to as a group named surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase, and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the bloodstream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome.
Hepatitis B Surface Antigen (HBsAg) - There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg), and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering an immune
response. Regardless of the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185 amino acid protein, it is expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains 3 distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles, or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid, and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed, with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements
in the quality of homology models [5, 6].
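The 30% rule can be checked for a given query-template alignment with a few lines of code. The following is a minimal sketch; the function names are hypothetical, and it assumes one particular definition of sequence identity (matches divided by the number of aligned columns that contain no gap). Other definitions, for example dividing by the length of the shorter sequence, give somewhat different values.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over the columns where neither sequence has a gap.

    Both arguments must come from the same pairwise alignment and therefore
    have equal length. The choice of denominator is an assumption.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != gap and t != gap
    ]
    if not columns:
        return 0.0
    matches = sum(q == t for q, t in columns)
    return 100.0 * matches / len(columns)


def above_identity_threshold(aligned_query, aligned_template, threshold=30.0):
    """True if the pair exceeds the identity threshold quoted in the text."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment, not real protein data.
    print(round(percent_identity("ACDE-FG", "ACDQKFG"), 1))
    print(above_identity_threshold("ACDE-FG", "ACDQKFG"))
```

A pair that falls below the threshold is not necessarily unusable, but the alignment, and hence any model built from it, deserves closer inspection.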
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of annotation
(such as the description of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level of redundancy and a
high level of integration with other databases.
2 Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
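The Smith-Waterman algorithm mentioned above can be illustrated with a short sketch. This is not the SSEARCH implementation: the scoring values (match +2, mismatch -1, linear gap -2) and the function name smith_waterman are assumptions chosen for readability, whereas real searches use substitution matrices such as BLOSUM and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b).

    Illustrative scoring only (assumed values, linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; local alignment never goes below 0.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("AAALPSDFFGGG", "TTLPSDFFCC"))
```

Because every cell is computed, the method is optimal but slower than the heuristic FASTA and BLAST searches, which is why it is usually reserved for confirming a small number of candidate hits.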
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences, and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If
you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two
boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates, among others, the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
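Two of these sequence-derived quantities are simple enough to sketch by hand. The example below computes the amino acid composition and the aliphatic index, using the commonly cited definition (mole percent of Ala, plus 2.9 times that of Val, plus 3.9 times that of Ile and Leu). The function names are our own and the code is an illustration, not ProtParam's implementation.

```python
from collections import Counter


def clean_sequence(raw):
    """Keep letters only, mirroring the rule that white space and numbers are ignored."""
    return "".join(ch for ch in raw.upper() if ch.isalpha())


def composition(raw):
    """Return {residue: (count, mole percent)} for a raw one-letter sequence."""
    seq = clean_sequence(raw)
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: (n, 100.0 * n / len(seq)) for aa, n in counts.items()}


def aliphatic_index(raw):
    """Aliphatic index = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    comp = composition(raw)

    def pct(aa):
        return comp.get(aa, (0, 0.0))[1]

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


if __name__ == "__main__":
    fragment = "MAAALQVLPR LARAPLHPLL 20"  # spaces and digits are dropped
    print(composition(fragment)["A"])
    print(round(aliphatic_index(fragment), 2))
```

A higher aliphatic index is generally read as a sign of greater thermostability for globular proteins.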
Using SOPMA for secondary structure analysis
SOPMA builds on the self-optimized prediction method (SOPM), which was described to
improve the success rate in the prediction of the secondary structure of proteins. The
SOPMA authors report improvements brought about by predicting all the sequences of a
set of aligned proteins belonging to the same family. This improved SOPM method
(SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Identified the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified our model through different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
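Since expected model quality is tied to sequence identity, it can help to see how that figure is obtained from a pairwise alignment. The sketch below is an assumption-laden illustration: it counts identical residues over columns where neither sequence has a gap, which is only one of several denominators in use, and the function name percent_identity is ours.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Other conventions divide by the full alignment length or by the shorter
    sequence; the choice made here is an assumption for illustration.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Two ten-residue fragments that differ at the first two positions.
    print(percent_identity("QAILCWGELM", "ETILCWGELM"))
```

Under the rule of thumb quoted above, a value near 70% would suggest a model close to the template backbone, while a value near 25% signals that larger deviations should be expected.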
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying
all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
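The E-value filter described above can be sketched in a few lines. Everything here is illustrative: the Hit record, the select_templates helper and the 1e-5 cutoff are assumptions, not part of any BLAST interface, and the example hits are the five database matches listed in the results of this report (the two weak hits are read as E-value 6.0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    template_id: str   # PDB ID and chain, e.g. "1QGT-C"
    bit_score: float
    evalue: float


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first.

    The cutoff is an assumed, commonly used order of magnitude; marginal
    cases still need the manual checks discussed in the text.
    """
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: (h.evalue, -h.bit_score))


if __name__ == "__main__":
    hits = [
        Hit("1TA3-B", 27, 6.0),
        Hit("2G33-C", 192, 6e-50),
        Hit("1QGT-C", 206, 6e-54),
        Hit("1AW9-A", 27, 6.0),
        Hit("2QIJ-C", 197, 2e-51),
    ]
    for hit in select_templates(hits):
        print(hit.template_id, hit.bit_score, hit.evalue)
```

A filter like this only narrows the candidate list; coverage of the query and the plausibility of the resulting model still decide the final choice.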
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-protein docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the
interfaces of these interactions do not have the ability to alter their conformation in
order to improve binding and ease movement. Conformational changes are limited by steric
constraints and thus are said to be rigid.
Fig. Protein-protein docking
Protein receptor-ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock
and key mechanism. There is both high specificity and induced fit within these interfaces,
with specificity increasing with rigidity. Protein receptor-ligand docking can involve
either a rigid ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one another in a
direction perpendicular to the interface. This allows for binding of a receptor with a
larger than usual ligand. Normally, when there is ligand overlap in the docking interface,
energy penalties are incurred. If the van der Waals forces can be decreased, energy loss
in the system will be minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than would be allowed
for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between
the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible
drug targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule
within or on the surface of a cell to which a substance (such as a hormone or a drug)
selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds
through the interaction of many weak noncovalent bonds formed to the binding site of a
protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed
amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making
and breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers
the ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in
mind, below are preferences for the 20 amino acids to lie within functional regions on
proteins. These were worked out by considering how often particular amino acids were in
contact with bound non-protein atoms in protein three-dimensional structures. Positive
values mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the
torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide correspond to sterically disallowed conformations of
the polypeptide backbone.
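Each point on such a plot comes from two torsion angles computed from backbone atom coordinates: φ from the atoms C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The sketch below shows one standard way to compute a torsion angle from four points using only the standard library. The function names and the toy coordinates are assumptions for illustration, not real protein data.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (range -180..180) about the p1-p2 bond.

    For phi pass (C of residue i-1, N, CA, C); for psi pass (N, CA, C, N of
    residue i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)      # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the outer atoms are a quarter turn apart about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Validation tools compute these two angles for every residue and then check what fraction of the points fall in the sterically allowed regions.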
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as
the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy
minimization on a conformation to find the best nearby conformation. Energy minimization
is usually performed by gradient optimization: atoms are moved so as to reduce the net
forces on them. The minimized structure has small forces on each atom and therefore serves
as an excellent starting point for molecular dynamics simulations.
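The gradient idea can be shown on the smallest possible case: two atoms that start too close together and interact through a Lennard-Jones potential. This is a toy sketch in reduced units (epsilon = sigma = 1), not a force field; the step size, the cap on the move per iteration and the function names are all assumptions chosen for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr; the force on the separation is the negative of this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent on the interatomic distance r.

    Each move is capped at max_move so that the huge repulsive force at
    short range cannot throw the atoms far apart in a single step.
    """
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:          # forces are small: converged
            break
        move = max(-max_move, min(max_move, -step * grad))
        r += move
    return r


if __name__ == "__main__":
    start = 0.95                      # atoms too near each other
    r_min = minimize(start)
    print(start, lj_energy(start))
    print(r_min, lj_energy(r_min))
```

The search only ever moves downhill, so in a real molecule with many coupled coordinates it settles in the nearest minimum rather than the global one, as noted in the text.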
Results and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
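The two extinction coefficients can be cross-checked from the residue counts reported above (5 Tyr, 4 Trp, 5 Cys). The sketch assumes the widely used per-residue molar values at 280 nm (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1) and reads the molecular weight as 55252.6 Da; the function names are ours and this is an illustration rather than the ProtParam code.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Molar extinction coefficient at 280 nm (M-1 cm-1).

    Per-residue values are assumed: Tyr 1490, Trp 5500, cystine 125.
    If all Cys are taken as half cystines, floor(n_cys / 2) cystines form.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    mw = 55252.6  # Da, as read from the ProtParam output above
    for paired in (True, False):
        ext = extinction_coefficient(5, 4, 5, paired)
        print(paired, ext, round(absorbance_1_g_per_l(ext, mw), 3))
```

With these inputs the sketch gives 29700 and 29450 M-1 cm-1, and absorbances of 0.538 and 0.533, in agreement with the reported values.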
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
310 helix (Gg):          0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states:        0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
49
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
52
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
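The guide tree can be read programmatically to see how far each sequence sits from the root. The sketch below is illustrative only: it assumes the tree is a Newick string in which the extraction dropped the `:`, `,` and decimal-point characters (so `059519` is read as `:0.59519`), and the names `GUIDE_TREE`, `parse_newick` and `tip_distances` are hypothetical helpers, not part of any alignment package.

```python
# Guide tree as a Newick string; the punctuation is restored by assumption.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())  # drop any whitespace or line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]  # empty for internal nodes
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def tip_distances(tree, so_far=0.0):
    """Return {leaf name: summed branch length from the root to that leaf}."""
    name, length, children = tree
    total = so_far + length
    if not children:
        return {name: total}
    result = {}
    for child in children:
        result.update(tip_distances(child, total))
    return result


distances = tip_distances(parse_newick(GUIDE_TREE))
for leaf, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{leaf:20s} {dist:.5f}")
```

Sorting by root-to-tip distance puts the human HBV entries together at the short end and the human glycerate kinase sequence at the long end, which matches the grouping visible in the alignment.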
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (pdb file), then choose icon - Swiss model - load the raw target sequence.
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - File - save - layer (pdb).
Choose icon - File - save - project (pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose icon - File - save - layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: Request submission form
Please fill these fields:
Your Email address: Gunjan300@gmail.com (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Click on the model bars.
Fig: Structure of the template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
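The sequence identity reported for the model is the share of aligned positions at which target and template carry the same residue. As a rough illustration, the sketch below computes it for one 50-column block of the alignment (target residues 155 to 204 against template residues 101 to 150). The function name `percent_identity` and the choice to skip gap columns are assumptions made for this example; SWISS-MODEL may count identity differently, and a single block will not reproduce the figure quoted for the whole model.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned, non-gap columns in which both residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a.upper(), seq_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# One block from the TARGET / 2b8nA alignment, with the spacing removed.
TARGET_BLOCK = "ALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAA"
TEMPLATE_BLOCK = "trrvlelvdqlnendtvlfllsgggsslfelplegvsleeiqkltsallk"

block_identity = percent_identity(TARGET_BLOCK, TEMPLATE_BLOCK)
print(f"Identity over this block: {block_identity:.1f}%")
```

Running the same function over every block of the alignment, concatenated, would give a whole-alignment value to compare with the figure reported by the server.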
Model Validation
INTRODUCTION
The Structural Analysis and Verification Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible for a polypeptide. The plot therefore describes, residue by residue, the rotation about the two backbone bonds on either side of each alpha carbon atom (N-Cα for φ and Cα-C for ψ).
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
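Each point on a Ramachandran plot comes from two dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute such an angle from four coordinates. It is a generic geometric illustration with made-up coordinates, not the routine used by PROCHECK or SAVS, and the function name `dihedral` is hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points in 3D space."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)  # central bond, the axis of rotation
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from the model): the central bond lies on the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```

For a real structure, the four points would be the atom coordinates read from the ATOM records of the PDB file for consecutive residues.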
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                            Score    %age
Residues in most favoured regions [A,B,L]                    990   85.6%
Residues in additional allowed regions [a,b,l,p]             104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11    1.0%
Residues in disallowed regions                                51    4.4%
                                                            ----  ------
Number of non-glycine and non-proline residues              1156  100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
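The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% guideline can be checked the same way. The short sketch below recomputes them; the dictionary keys and the helper name `meets_guideline` are illustrative choices, and the counts are those listed in the summary above.

```python
# Residue counts per Ramachandran region, as listed in the PROCHECK summary.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total_scored = sum(region_counts.values())

region_percent = {
    region: round(100.0 * count / total_scored, 1)
    for region, count in region_counts.items()
}


def meets_guideline(percent_most_favoured, threshold=90.0):
    """True if the share of residues in most favoured regions exceeds the threshold."""
    return percent_most_favoured > threshold


for region, percent in region_percent.items():
    print(f"{region:20s} {region_counts[region]:5d} {percent:5.1f}%")
print("Over 90% in most favoured regions:", meets_guideline(region_percent["most favoured"]))
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so the residues in the disallowed regions are the natural place to start any refinement.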
Docking result (Hex software)
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look more closely at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of a protein present in the different species, we carried out homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases, so that possible therapeutic sites can be identified in them, and we can look forward to a healthier world.
The work presented is just a small part of a large problem, and much work remains to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
79
BIBLIOGRAPHY
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
80
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Developed by Needleman and Wunsch in 1970. This was a fundamental step in the
development of the field of bioinformatics, which paved the way for the routine sequence
comparisons and database searching practiced by modern biologists.
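As an illustration of the kind of sequence comparison that Needleman and Wunsch introduced, the following is a minimal sketch of global alignment scoring by dynamic programming. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example; they are not taken from this report or from the 1970 paper.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, with a simple linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))  # three matches and one gap
```

The whole table is filled because a global alignment must account for every residue of both sequences, from the first to the last.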
A recent advance in bioinformatics is molecular modeling, which is aimed at
understanding structure-function and structure-property relationships in physico-chemical
processes and pharmaceuticals, and thus has become increasingly important for finding and
designing new drugs. In fact, computers are playing an important role in new drug
discovery and drug design.
HEPATITIS
Hepatitis (plural: hepatitides) implies injury
to the liver characterized by the presence of inflammatory cells in the
liver tissue. Etymologically, the word comes from the ancient Greek hepar or hepato-, meaning
'liver', and the suffix -itis, denoting 'inflammation'. The condition can be self-limiting,
healing on its own, or can progress to scarring of the liver.
Hepatitis is acute when it lasts less than 6 months
and chronic when it persists longer. A group of viruses known as the
hepatitis viruses cause most cases of liver damage worldwide.
Hepatitis can also be due to toxins (notably alcohol), other infections or
an autoimmune process.
It may run a subclinical course, in which
the affected person may not feel ill. The patient becomes unwell and
symptomatic when the disease impairs liver functions that include,
among other things, screening of harmful substances, regulation of
blood composition and production of bile to help digestion.
Causes
Acute hepatitis
Viral hepatitis: Hepatitis A to E (more than 95% of viral
causes), Herpes simplex, Cytomegalovirus, Epstein-Barr, Yellow fever
virus, Adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky
Mountain spotted fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon
tetrachloride, asafetida
Drugs: Paracetamol, Amoxicillin, antituberculosis
medicines, Minocycline and many others
Ischemic hepatitis (circulatory insufficiency) (1)
Pregnancy
Autoimmune conditions, e.g. Systemic Lupus
Erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
Viral hepatitis: Hepatitis B with or without hepatitis D, hepatitis C
(hepatitis A and E do not lead to chronic disease)
Autoimmune: Autoimmune hepatitis
Alcohol
Drugs: Methyldopa, Nitrofurantoin, Isoniazid, Ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing
cholangitis occasionally mimic chronic hepatitis [4]
Viral hepatitis
A virus is a particle that is smaller than a bacterium and carries its genetic
information as DNA or RNA. This genetic material allows the virus to infect bacteria
or living cells and set up the machinery to reproduce itself, leading to destruction of the cell
in which it resides. To date, five viruses, labeled A through E, have been identified which
appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C and D are transmitted by direct injection
into the bloodstream (through any method of injection under the skin). The term viral
hepatitis describes any one of the illnesses caused by the five viruses mentioned and
consists of an infection of liver cells which leads to damage of the liver, over days in
some cases but over many years in others. Thirty years ago none of the hepatitis viruses
had been identified. In the 1960s transfusion-related viral hepatitis was extremely
common, with 30% of patients receiving blood products becoming infected. By 1970 a
blood test for the Australia antigen had been developed, which appeared to identify those
infected with one hepatitis virus, which we now call hepatitis B. The
investigator who discovered the Australia antigen, the protein which makes up the coat of
the virus and which is now called the hepatitis B surface antigen (HBsAg), was awarded
the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the
discovery of the Australia antigen.
Currently 11 viruses are recognized as causing hepatitis. Two are
herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are
hepatotropic viruses.
EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent
hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of
fatigue, nausea and malaise.
Of the nine human hepatotropic viruses, only five are well characterized;
hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called
enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The
most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C
(formerly called non-A, non-B hepatitis) and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common, especially in
children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the disease.
Fulminant hepatitis is rare: 0.1% of cases. The virus enters via the gut, replicates in the
alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset
of symptoms.
World-wide distribution; endemic in most countries. The incidence in first world
countries is declining. There is an especially high incidence in developing countries and
rural areas. In rural areas of South Africa the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women;
the mortality rate is high (up to 40%). Similar to hepatitis A, the virus replicates in the gut
initially, before invading the liver, and virus is shed in the stool prior to the onset of
symptoms. Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet. The incidence of infection appears to be low in first world
countries.
Hepatitis C
Putative Togavirus related to the Flavi and Pesti viruses,
thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees. Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure, which may lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with hepatitis B.
Increased severity of liver disease in hepatitis B carriers.
Virus particle: 36 nm in diameter, encapsulated with HBsAg derived from HBV.
Delta antigen is associated with virus particles; ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has
been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells and other cells in the body once it gains access to the bloodstream.
One of the most interesting features of the hepatitis B virus is that the virus itself does not
damage the liver; the damage is caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation) do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly
identical have been identified in Eastern woodchucks, ground squirrels and Peking
ducks. The members of this virus family, termed the hepadnaviruses, have life
cycles similar to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g.
duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig: Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus
particles as well as excess viral surface protein are shed in large amounts into the blood.
Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma (HCC)
Patients who become persistently infected are at
risk of developing hepatocellular carcinoma.
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children: families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV, including in the
chronic carrier.
Fig: Hepatitis B virus in serum
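The marker interpretations listed above can be restated as a small lookup table. The sketch below only repeats what the text says each detectable marker indicates; the dictionary, the function name and the marker spellings used as keys are assumptions made for illustration, and the code is not a diagnostic procedure.

```python
# Meanings copied from the serology list in the text; key spellings are assumed.
MARKER_MEANINGS = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV",
}


def summarize_serology(detected):
    """Return one 'marker: meaning' line per detected marker, in table order."""
    unknown = set(detected) - set(MARKER_MEANINGS)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    return [f"{m}: {MARKER_MEANINGS[m]}" for m in MARKER_MEANINGS if m in detected]


if __name__ == "__main__":
    for line in summarize_serology({"HBsAg", "core IgM"}):
        print(line)
```

HBcAg is left out of the table because, as stated above, the core protein is not found in blood.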
Prevention
1) Active immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers and
those with chronic hepatitis B are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, double stranded. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5' end. The other strand, the plus strand, is variable in length but less than unit
length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is through the use of multiple in-frame start codons. The HBV genome also contains
parts that regulate transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig: Hepatitis B virus genome
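To make the idea of reading frames and multiple in-frame start codons concrete, the following is a minimal sketch that scans the three forward frames of a DNA string for ATG-to-stop ORFs and then lists every ATG lying in frame within one of them. The function names, the toy sequence and the `min_codons` parameter are assumptions for illustration; the sketch treats the sequence as linear, whereas the HBV genome is circular, and it is not the ORF-finding tool used in this work.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG..stop ORF on the forward strand.

    The sequence is scanned linearly; ORFs that wrap around a circular
    genome are not detected by this sketch.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if codon == "ATG" and start is None:
                start = pos                      # first start codon opens the ORF
            elif codon in STOP_CODONS and start is not None:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


def in_frame_starts(dna, start, end):
    """List positions of every ATG in frame with an ORF spanning [start, end)."""
    dna = dna.upper()
    return [p for p in range(start, end - 3, 3) if dna[p:p + 3] == "ATG"]


if __name__ == "__main__":
    toy = "ATGAAAATGCCCTAA"  # hypothetical sequence, not HBV
    for s, e, f in find_orfs(toy):
        print(f"frame {f}: {s}-{e}, in-frame starts at {in_frame_starts(toy, s, e)}")
```

Each in-frame ATG is a possible alternative translation start, which is how one reading frame can give rise to proteins of different lengths that share a common C-terminal part.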
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al., 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, and usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to collectively as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the bloodstream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering the immune
response. Regardless of the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185-amino-acid protein, it is expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an Orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles, or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed, with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV, including in the
chronic carrier [4].
Homology, or comparative, modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements
in the quality of homology models [5, 6].
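The 30% rule discussed above can be checked for any query-template pair once the two sequences have been aligned. The sketch below computes percent identity from two gapped, equal-length alignment strings. The function names are hypothetical, and the choice of denominator (aligned columns without gaps) is an assumption, since several conventions for percent identity are in use.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of gap-free aligned columns where both residues are identical.

    Both arguments are rows of one pairwise alignment, with '-' marking gaps.
    The denominator (gap-free columns) is one convention among several.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def passes_30_percent_rule(aligned_query, aligned_template, threshold=30.0):
    """True when identity exceeds the threshold assumed for reliable modeling."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    query, template = "ACDE-F", "ACDQGF"  # toy alignment, not real data
    print(percent_identity(query, template), passes_30_percent_rule(query, template))
```

A template that falls below the threshold is not necessarily useless, but by the argument above its alignment to the query, and therefore the resulting model, is expected to be much less reliable.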
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over recent years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
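To make the Smith-Waterman idea concrete, the following is a minimal Python sketch of local alignment. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty, whereas real tools use substitution matrices and affine gap penalties. The function name and the scoring values are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b).

    Simplified local alignment: fixed match/mismatch scores and a
    linear gap penalty (assumed values, not those used by SSEARCH).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; cells never drop below zero,
    # which is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling smith_waterman("XXACGTXX", "YYACGTYY") picks out the shared core "ACGT" and ignores the unrelated flanks, which is the behaviour that distinguishes local from global alignment.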
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
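As an illustration of how one of these parameters is obtained from the sequence alone, the sketch below computes the aliphatic index using the commonly cited definition (mole percent of Ala, plus 2.9 times that of Val, plus 3.9 times that of Ile and Leu). The function name is an assumption for this example, and the code is not ProtParam's own implementation.

```python
def aliphatic_index(sequence):
    """Aliphatic index from a raw one-letter protein sequence.

    Uses AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue. White space and
    digits are skipped, mirroring the input handling described above.
    """
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    total = len(residues)

    def mole_percent(aa):
        return 100.0 * residues.count(aa) / total

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A sequence made only of alanine gives 100, and a sequence with no aliphatic side chains gives 0, which is a quick sanity check on the formula.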
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described to improve the success rate
in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet, and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the HBeAg protein from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model using the different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
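Since expected model quality is tied to sequence identity, it can help to see how identity is measured from a target-template alignment. The sketch below is one possible convention (identical residues divided by columns where neither sequence has a gap); other tools divide by the alignment length or the shorter sequence length, so the numbers are only comparable within one convention. The function name is illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two rows of a pairwise alignment.

    Both strings must have equal length and use '-' for gaps.
    Columns containing a gap in either row are skipped (an assumed
    convention; other definitions of identity exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

For example, percent_identity("ACDE", "ACDF") gives 75.0, because three of the four aligned positions are identical.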
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and
side-chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
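The E-value rule of thumb described above can be sketched as a small filter over BLAST hits. The cutoff of 1e-5 and the hit values in the usage note are illustrative assumptions, not thresholds or results taken from any particular tool; the function name is hypothetical.

```python
def pick_templates(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below the cutoff, best first.

    hits is a list of (template_id, bit_score, evalue) tuples.
    The default cutoff is an assumed value for illustration only.
    """
    good = [hit for hit in hits if hit[2] <= max_evalue]
    # Lower E-value means a more significant match.
    return sorted(good, key=lambda hit: hit[2])
```

With a hypothetical hit list such as [("T1", 206, 6e-54), ("T2", 27, 6.0), ("T3", 197, 2e-51)], the function returns T1 and T3 in that order and drops T2, whose E-value suggests a chance match.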
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig.: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig.: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
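The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be sketched as a single coordinate transform. This is only an illustration of the idea: the order of rotations (about x, then y, then z) and the function name are assumptions, and docking programs may parameterize orientation differently.

```python
import math


def rigid_transform(coords, rx, ry, rz, tx, ty, tz):
    """Apply a rigid-body move to a list of (x, y, z) coordinates.

    Rotates about the x, y, and z axes in that order (angles in
    radians), then translates by (tx, ty, tz). Inter-atomic distances
    are preserved, so the molecule's internal geometry is unchanged.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    moved = []
    for x, y, z in coords:
        # Rotation about the x axis.
        y, z = y * cx - z * sx, y * sx + z * cx
        # Rotation about the y axis.
        x, z = x * cy + z * sy, -x * sy + z * cy
        # Rotation about the z axis.
        x, y = x * cz - y * sz, x * sz + y * cz
        moved.append((x + tx, y + ty, z + tz))
    return moved
```

A rigid docking search can be thought of as scanning these six parameters for the ligand while scoring each resulting pose against the receptor.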
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. This plot is drawn between the torsion
angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with
dimensions corresponding to their van der Waals radii. The angles which cause
spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
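Each point on a Ramachandran plot is a pair of torsion angles, and a torsion angle is defined by four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The following sketch computes such an angle from four 3D points in plain Python. The helper names are illustrative assumptions; structure-analysis libraries provide equivalent routines.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) defined by four points.

    The angle is measured about the p1-p2 bond, between the planes
    (p0, p1, p2) and (p1, p2, p3).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Applying this to every residue of a model yields the (φ, ψ) pairs that validation tools then compare against the sterically allowed regions.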
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles, and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good for releasing local constraints on a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
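Gradient optimization can be illustrated on the smallest possible system: two atoms interacting through a 12-6 Lennard-Jones potential, a common model of van der Waals interactions. The sketch below is a toy steepest-descent minimizer in reduced units; the potential form, step size, and tolerances are assumptions for illustration, and real force fields sum many more terms.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy of two atoms at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0, step=0.01, tol=1e-10, max_iter=100000):
    """Steepest descent on the inter-atomic distance.

    Moves r against the gradient (that is, along the force) until
    the force is smaller than tol or max_iter steps have been taken.
    """
    r = r0
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        r -= step * gradient
    return r
```

Starting from r = 1.5, the distance relaxes to about 1.1225 (the analytical minimum at 2^(1/6) times sigma), where the force vanishes. The method only walks downhill, which is why minimization ends in the nearest local minimum rather than crossing energy barriers.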
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue    Count   Percent
Ala (A)     74     14.1%
Arg (R)     33      6.3%
Asn (N)     11      2.1%
Asp (D)     21      4.0%
Cys (C)      5      1.0%
Gln (Q)     32      6.1%
Glu (E)     28      5.4%
Gly (G)     51      9.8%
His (H)     16      3.1%
Ile (I)     15      2.9%
Leu (L)     81     15.5%
Lys (K)     10      1.9%
Met (M)     12      2.3%
Phe (F)     10      1.9%
Pro (P)     28      5.4%
Ser (S)     27      5.2%
Thr (T)     22      4.2%
Trp (W)      4      0.8%
Tyr (Y)      5      1.0%
Val (V)     38      7.3%
Pyl (O)      0      0.0%
Sec (U)      0      0.0%
(B)          0      0.0%
(Z)          0      0.0%
(X)          0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
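The two extinction coefficients above can be checked by hand from the composition (4 Trp, 5 Tyr, 5 Cys). The sketch below assumes the per-residue molar absorbances commonly used for this estimate at 280 nm (Trp 5500, Tyr 1490, cystine 125) and reads the reported molecular weight as 55252.6; the function names are illustrative.

```python
def extinction_coefficients(n_trp, n_tyr, n_cys):
    """Estimated molar extinction coefficients at 280 nm (M-1 cm-1).

    Returns (all Cys paired as cystines, no cystines). Each cystine
    is a pair of Cys residues, so an odd Cys is left unpaired.
    """
    base = 5500 * n_trp + 1490 * n_tyr
    cystines = n_cys // 2
    return base + 125 * cystines, base


def absorbance_per_gram_per_litre(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight
```

For 4 Trp, 5 Tyr, and 5 Cys this gives 29700 and 29450, and dividing by 55252.6 gives 0.538 and 0.533 after rounding, in agreement with the values listed above.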
Secondary structure prediction
SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
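The state percentages reported by SOPMA are simple counts over the per-residue prediction string. A minimal sketch of that tally is given below, assuming the four-state alphabet used in this result (h, e, t, c); the function and dictionary names are illustrative.

```python
from collections import Counter

# One-letter codes used in the four-state prediction string.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_summary(prediction):
    """Return {state name: (count, percent)} for a prediction string."""
    total = len(prediction)
    if total == 0:
        raise ValueError("empty prediction string")
    counts = Counter(prediction.lower())
    summary = {}
    for code, name in STATE_NAMES.items():
        n = counts.get(code, 0)
        summary[name] = (n, round(100.0 * n / total, 2))
    return summary
```

As a check on the figures above, 235 helical residues out of 523 is 44.93%, and the four non-zero counts (235, 80, 36, 172) add up to the sequence length of 523.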
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
49
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4   ------------------------------------------------------------
Q91C37|HBEAG_HBVA6   ------------------------------------------------------------
P0C692|HBEAG_HBVA2   ------------------------------------------------------------
P0C625|HBEAG_HBVA3   ------------------------------------------------------------
Q81105|HBEAG_HBVA5   ------------------------------------------------------------
Q64896|HBEAG_ASHV    ------------------------------------------------------------
P03153|HBEAG_GSHV    ------------------------------------------------------------
P03154|HBEAG_DHBV1   ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3   ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN   MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5   ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV    ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV    ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1   ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3   ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN   LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4   DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6   DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2   DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3   DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5   DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV    DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV    DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1   DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3   DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN   ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5   --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV    --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV    --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1   --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3   --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN   ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4   QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6   ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2   QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3   QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5   QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV    QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV    QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1   QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3   QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN   LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4   -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6   -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2   -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3   -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5   -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV    -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV    -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1   TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3   TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN   CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV    NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV    NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1   DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3   DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN   PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4   RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6   RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2   RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3   RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5   RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV    RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV    RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1   RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3   RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN   GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4   -------------------------------------------
Q91C37|HBEAG_HBVA6   -------------------------------------------
P0C692|HBEAG_HBVA2   -------------------------------------------
P0C625|HBEAG_HBVA3   -------------------------------------------
Q81105|HBEAG_HBVA5   -------------------------------------------
Q64896|HBEAG_ASHV    -------------------------------------------
P03153|HBEAG_GSHV    -------------------------------------------
P03154|HBEAG_DHBV1   -------------------------------------------
P0C6J9|HBEAG_DHBV3   -------------------------------------------
Q8IVS8|GLCTK_HUMAN   ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
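The guide tree above is written in Newick notation, so the terminal branch length of each sequence can be read out programmatically. The following is a minimal sketch, not part of the original workflow. It assumes that the separators lost in extraction (`:`, `,`, `.` and the closing `;`) are restored in the way ClustalW normally writes them, with each six-digit run read as a branch length such as 0.59519. The helper names are hypothetical.

```python
import re

# Guide tree from the report. The ':' ',' '.' and ';' characters are restored
# under the assumption that ClustalW wrote standard Newick with 5 decimals.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def is_balanced(newick):
    """Check that every '(' in the Newick string has a matching ')'."""
    depth = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def leaf_branch_lengths(newick):
    """Return {leaf name: terminal branch length}.

    A leaf is a name that directly follows '(' or ',' and is followed by
    ':length'. Internal branch lengths follow ')' and are skipped.
    """
    pattern = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


leaves = leaf_branch_lengths(GUIDE_TREE)
for name, length in sorted(leaves.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {length:.5f}")
```

Sorting by branch length puts the human glycerate kinase sequence first, which matches its position as the most distant member of the guide tree.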
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
open the template structure from file (pdb file)
choose icon - Swiss model - load the raw target sequence
choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
choose icon - File - Save - Layer (pdb)
choose icon - File - Save - Project (pdb)
choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the email ID for receiving the modeled structure)
open the new structure (received by email) and remove the template by selecting the target
choose icon - File - Save - Layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Please fill in these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET 83   LV GFGKAVLGMA AAAEELLGQH
2b8nA  4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
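The sequence identity reported for a target-template pair can be checked by hand on any block of the alignment above. The sketch below counts identical residues over the aligned, gap-free columns of two rows. It is an illustration only: the function name is hypothetical, and SWISS-MODEL may use a different denominator (for example the full alignment length) when it reports its own figure.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned, gap-free columns that hold the same residue.

    Spaces used to group residues in tens are ignored, and the comparison
    is case-insensitive because the template row is printed in lower case.
    """
    a = seq_a.replace(" ", "").upper()
    b = seq_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either row has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# One 50-column block copied from the alignment (TARGET 155 / 2b8nA 101).
target_block = "ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA"
template_block = "trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk"
print(f"{percent_identity(target_block, template_block):.1f}% identity in this block")
```

A single block can sit well above or below the overall value, so the figure for the whole model should be computed over all blocks joined together.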
Model Validation
INTRODUCTION
The Structure Analysis and Verification Server (SAVS) simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure by plotting ψ against φ, and it shows which combinations of φ and ψ are possible for a polypeptide. Because it is built only from these backbone angles, the plot reports on local backbone geometry: residues that fall outside the allowed regions point to strained or incorrectly modeled conformations.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
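The φ and ψ values plotted in a Ramachandran plot are ordinary dihedral angles over four consecutive backbone atoms: φ over C(i-1), N(i), CA(i), C(i) and ψ over N(i), CA(i), C(i), N(i+1). As a minimal sketch (this is not the PROCHECK implementation), the function below computes a signed dihedral angle from four 3D points. The coordinates in the usage lines are made-up test points, not atoms taken from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for the atom chain p0-p1-p2-p3."""
    b0 = _sub(p0, p1)          # bond pointing from p1 back to p0
    b1 = _sub(p2, p1)          # central bond, p1 -> p2
    b2 = _sub(p3, p2)          # bond pointing from p2 to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: the last atom is rotated a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `dihedral` to the C, N, CA, C and N, CA, C, N atoms of each residue gives the (φ, ψ) pair that the plot places on its axes.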
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                        Score   Percentage
Residues in most favoured regions [A,B,L]                 990     85.6%
Residues in additional allowed regions [a,b,l,p]          104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11      1.0%
Residues in disallowed regions                             51      4.4%
                                                         ----    ------
Number of non-glycine and non-proline residues           1156    100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
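The percentages in the PROCHECK summary above follow directly from the residue counts, and the same arithmetic shows how the model compares with the 90% guideline. The sketch below assumes the four counts read 990, 104, 11 and 51, as printed in the report; the variable names are illustrative.

```python
# Residue counts per Ramachandran region, as printed in the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# The denominator is the number of non-glycine, non-proline residues.
total = sum(counts.values())

percentages = {
    region: round(100.0 * n / total, 1) for region, n in counts.items()
}

# PROCHECK's rule of thumb: a good model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, which is worth keeping in mind when reading the docking results.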
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
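The search-space figures in the Hex log above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the punctuation stripped during extraction originally read "distance range = 0.00 to 19.50 with steps of 0.75", "Iterate[27,812,1]" and "FFT[64,24,48]". These readings are assumptions, chosen because they reproduce the totals printed in the log (21924 3D FFTs and 1616412672 orientations); the meaning attached to each number in the comments is likewise an interpretation.

```python
# Assumed readings of log values whose punctuation was lost in extraction.
r_start, r_stop, r_step = 0.00, 19.50, 0.75   # intermolecular distance range
iterate = (27, 812, 1)    # distance steps, receptor rotations, ligand rotations
fft_grid = (64, 24, 48)   # samples along the three FFT dimensions

# Number of distance values R = 0.00, 0.75, ..., 19.50 listed in the log.
n_distance_steps = int(round((r_stop - r_start) / r_step)) + 1

# One 3D FFT is run per combination of the iterated dimensions.
n_ffts = iterate[0] * iterate[1] * iterate[2]

# Each FFT scores every point of the FFT grid, giving the full 6D search size.
n_orientations = n_ffts * fft_grid[0] * fft_grid[1] * fft_grid[2]

print(n_distance_steps)   # 27
print(n_ffts)             # 21924
print(n_orientations)     # 1616412672
```

That the three numbers agree with the log supports the assumed readings, and shows that the run scanned roughly 1.6 billion trial orientations even though only one solution was reported.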
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting
to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope that in the near
future it will be possible to draw firm conclusions about the true picture of its evolution, and that
the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To find the unknown structure of a protein present in the different species, we carried out
homology modelling and took a first step by presenting a theoretical model built with available
online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein coded by the gene is a secondary
contributor to the pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries. The
present work may be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs
more efficiently and effectively in future by analysing the modelled structure
of a protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking may be useful in the search for a cure for the infectious
disease bird flu, and may open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] - C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] - PubMed
[5] - Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] - Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408. [PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] - Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] - Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
84
85
86
among other things, screening of harmful substances, regulation of
blood composition and production of bile to help digestion
Causes
Acute hepatitis
Viral hepatitis: Hepatitis A to E (more than 95% of viral
causes), Herpes simplex, Cytomegalovirus, Epstein-Barr, Yellow fever
virus, Adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky
Mountain spotted fever
Alcohol
Toxins: Amanita toxin in mushrooms, Carbon
tetrachloride, Asafetida
Drugs: Paracetamol, Amoxicillin, Antituberculosis
medicines, Minocycline and many others
Ischemic hepatitis (circulatory insufficiency)(1)
Pregnancy
Autoimmune conditions, e.g. Systemic Lupus
Erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
Viral hepatitis: Hepatitis B with or without hepatitis D, hepatitis C
(Hepatitis A and E do not lead to chronic disease)
Autoimmune: Autoimmune hepatitis
Alcohol
Drugs: Methyldopa, Nitrofurantoin, Isoniazid, Ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing
cholangitis occasionally mimic chronic hepatitis [4]
Viral hepatitis
A virus is a particle that is smaller than a bacterium and carries its genetic
information as DNA or RNA. This genetic material allows the virus to infect bacteria
or living cells and set up the machinery to reproduce itself, leading to destruction of the cell
in which it resides. To date, five viruses, labeled A through E, have been identified which
appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C and D are transmitted by direct injection
into the bloodstream (through any method of injection under the skin). The term viral
hepatitis describes any one of the illnesses caused by the five viruses mentioned and
consists of an infection of liver cells which leads to damage of the liver, over days in
some cases but over many years in others. Thirty years ago none of the hepatitis viruses
had been identified. In the 1960s transfusion-related viral hepatitis was extremely
common, with 30% of patients receiving blood products becoming infected. By 1970 a
blood test for the Australia antigen had been developed, which appeared to identify those
infected with one hepatitis virus, which we now call hepatitis B. The
investigator who discovered the Australia antigen, the protein which makes up the coat of
the virus and which is now called the hepatitis B surface antigen (HBsAg), was awarded
the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the
discovery of the Australia antigen.
Currently 11 viruses are recognized as causing hepatitis. Two are
herpes viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are
hepatotropic viruses.
EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent
hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of
fatigue, nausea and malaise.
Of the nine human hepatotropic viruses only five are well characterized;
hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called
enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The
most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly
called non-A non-B hepatitis) and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than Hepatitis B; asymptomatic infections are very common, especially in
children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the disease.
Fulminant hepatitis is rare: 0.1% of cases. Virus enters via the gut, replicates in the
alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset
of symptoms.
World-wide distribution; endemic in most countries. The incidence in first world
countries is declining. There is an especially high incidence in developing countries and
rural areas. In rural areas of South Africa the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis in pregnant women:
mortality rate is high (up to 40%). Similar to hepatitis A, the virus replicates in the gut
initially before invading the liver, and virus is shed in the stool prior to the onset of
symptoms. Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet. The incidence of infection appears to be low in first world
countries.
Hepatitis C
Putative Togavirus related to the Flavi- and Pestiviruses.
Thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture but can infect chimpanzees. Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure, leading to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain.
In South Africa 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires Hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with Hepatitis
B. Increased severity of liver disease in Hepatitis B carriers. Virus particle 36 nm in
diameter, encapsulated with HBsAg derived from HBV;
delta antigen is associated with virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called Hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis but is no longer believed to be a major agent of liver disease. It has
been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells and other cells in the body once it gains access to the blood stream.
One of the most interesting features of the hepatitis B virus is that the virus itself does not
damage the liver; the damage is caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation) do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly
identical have been identified in Eastern woodchucks, ground squirrels and Peking
ducks. The members of this virus family, termed the hepadnaviruses, have similar life
cycles to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g.
duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus
particles, as well as excess viral surface protein, are shed in large amounts into the blood.
Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma
Patients who become persistently infected are at
risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV.
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers and
those with chronic hepatitis B are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5' end. The other strand, the plus strand, is variable in length but is less than unit
length and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is through the use of multiple in-frame start codons. The HBV genome also contains
parts that regulate transcription, determine the site of polyadenylation and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
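The idea that one reading frame can give rise to several proteins through multiple in-frame start codons can be illustrated with a short Python sketch. The function name `in_frame_starts` and the toy sequence are hypothetical and chosen only for illustration; the sequence is not taken from the HBV genome, and the sketch ignores the circular topology of the real genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_starts(orf: str) -> list[int]:
    """Return 0-based offsets of every ATG that is in frame with the
    first codon, scanning codon by codon until the first stop codon."""
    orf = orf.upper()
    starts = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in STOP_CODONS:
            break  # the reading frame ends here
        if codon == "ATG":
            starts.append(i)
    return starts


# Toy reading frame (hypothetical): three in-frame ATG codons, then a stop.
toy_orf = "ATGGCCATGAAAGGGATGTTTTAA"
for offset in in_frame_starts(toy_orf):
    # Each start yields a product that shares its C-terminal end with the others.
    print(offset, toy_orf[offset:])
```

Each reported offset marks a possible translation start, so the products differ only in how much N-terminal sequence they carry.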
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present and usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to collectively as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the blood stream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering the immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears largely oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185 amino acid protein, it is expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains 3 distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops) and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing improvements
in the quality of homology models [5,6].
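The 30% rule can be checked on a given query-template alignment with a few lines of Python. This is a minimal sketch under stated assumptions: the function names `percent_identity` and `passes_identity_rule` are hypothetical, identity is counted over aligned (non-gap) columns only, which is one of several conventions in use, and the two aligned strings are invented for illustration.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # gap columns are excluded from the denominator (assumption)
        aligned += 1
        if q == t:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


def passes_identity_rule(identity: float, threshold: float = 30.0) -> bool:
    """True if the identity exceeds the threshold discussed in the text."""
    return identity > threshold


# Invented pairwise alignment, for illustration only.
query = "MKT-AYIAK"
template = "MKTGAY-SK"
pid = percent_identity(query, template)
print(f"{pid:.1f}% identity, usable template: {passes_identity_rule(pid)}")
```

Because the denominator convention changes the reported value, the same convention should be used whenever identities are compared against the threshold.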
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science It involves the technology that uses computers for storage
retrieval manipulation and distribution of information related to biological
macromolecules such as DNA RNA and proteins
Bioinformatics is limited to sequence structural and functional analysis of genes and
genomes and their corresponding products and is often considered computational
molecular biology It consists of two subfields the development of computational tools
and databases and the application of these tools and databases in generating biological
knowledge to better understand living systems These tools are used in three areas of
genomic and molecular biological research molecular sequence analysis
molecular structural analysis and molecular functional analysis
1 NCBI-
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot-
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domains structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2 Protein sequence-
Glycerate kinase (HBeAg-binding protein 4), primary accession number Q8IVS8, EC 2.7.1.31, from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J Lipman and William R Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts) and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu
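The Smith-Waterman local alignment mentioned above can be sketched in Python as follows. This is a simplified illustration and not the SSEARCH implementation: it assumes a fixed match/mismatch score and a linear gap penalty (the parameter values are arbitrary), whereas production tools typically use substitution matrices and affine gap penalties. The function name `smith_waterman` is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)

    # Fill: every cell is the best of "restart at 0", diagonal, up or left.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTACGTT", "GGACGGG")
print(score)
print(top)
print(bottom)
```

Resetting negative cells to zero is what makes the alignment local: a poor region cannot drag down a well-matching segment found elsewhere in the two sequences.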
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
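Two of the listed parameters follow directly from amino acid composition and can be sketched in Python. The sketch assumes the commonly published formulas: the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent, and the molar extinction coefficient at 280 nm as 1490 per Tyr, 5500 per Trp and 125 per cystine. The function names and the test peptide are hypothetical; half-life and instability index need lookup tables and are left out.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq: str, cystines: bool = True) -> int:
    """Extinction coefficient at 280 nm (M^-1 cm^-1).

    With cystines=True every pair of Cys is assumed to form a disulfide bond;
    with cystines=False all Cys are assumed to be reduced.
    """
    seq = seq.upper()
    value = 1490 * seq.count("Y") + 5500 * seq.count("W")
    if cystines:
        value += 125 * (seq.count("C") // 2)
    return value


peptide = "AVILYWCC"  # invented sequence, for illustration only
print(aliphatic_index(peptide))
print(extinction_coefficient(peptide), extinction_coefficient(peptide, cystines=False))
```

Reporting the extinction coefficient under both cysteine assumptions is useful because the sequence alone does not say which cysteines are paired.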
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by Email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
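The success rates quoted above are per-residue accuracies for a three-state description. A minimal Python sketch of how such a figure can be computed is given below; the function name and the one-letter state strings are assumptions for illustration, not part of SOPMA itself.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Both arguments are equal-length strings with one state letter per
    residue, for example 'h' (helix), 'e' (strand) and 'c' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)
```

For example, a prediction that agrees with the observed structure at 8 of 10 positions scores 80%.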
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
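The E-value screen described above can be sketched in Python as follows. This is only an illustration: the `Hit` record, the function name and the cutoff of 1e-5 are assumptions, since the text does not state a specific threshold, and the example hits are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template_id: str   # identifier of the candidate template (hypothetical)
    score: float       # alignment score reported by the search
    e_value: float     # expected number of chance hits with this score


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest) first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Hypothetical search results used only to exercise the function.
example_hits = [
    Hit("tmplA", 206, 6e-54),
    Hit("tmplB", 27, 6.0),
    Hit("tmplC", 192, 6e-50),
]
candidates = select_templates(example_hits)
```

Hits that fail the cutoff are simply dropped here; in practice they would be passed on to fold-recognition or meta-servers, as the paragraph above recommends.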
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step, and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide the main chain
N-Cα and Cα-C bonds are relatively free to rotate. This plot is drawn between the torsion
angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
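Each point on such a plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). The Python sketch below shows one standard way to compute a torsion angle from four 3D points; the function names are hypothetical and coordinates are plain (x, y, z) tuples.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _scale(u, s):
    return (u[0] * s, u[1] * s, u[2] * s)


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Applying this function to every residue of a model and plotting ψ against φ gives the scatter that validation tools compare with the sterically allowed regions.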
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
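The idea of moving atoms along the force until the net force is small can be shown on the simplest possible case: two atoms interacting through a Lennard-Jones (van der Waals) potential. The Python sketch below is a toy steepest-descent minimizer, not the procedure of any particular modeling package; the function names, step size, displacement cap and reduced units (epsilon = sigma = 1) are all assumptions.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r, step=0.01, max_move=0.05, tol=1e-10, max_iter=100000):
    """Steepest descent on the interatomic distance r.

    Each iteration moves r against the gradient (that is, along the force).
    The move is capped at max_move so that the huge repulsive force of two
    overlapping atoms cannot throw them far apart in a single step.
    """
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        move = -step * g
        move = max(-max_move, min(max_move, move))
        r += move
    return r
```

Starting from an overlapping pair (r below sigma) or from a stretched pair, the distance relaxes to the nearest minimum of the potential, which for this function lies at r = 2^(1/6) sigma. A real molecule has many such minima, which is why minimization alone only finds the best nearby conformation.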
Result and discussion
1 Swiss-Prot entry: Protein sequence Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (HBcAg), Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
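The two extinction coefficients reported above can be reproduced from the residue counts with a short Python sketch. It assumes the molar extinction values commonly used for this calculation at 280 nm (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function and constant names are hypothetical.

```python
# Assumed molar extinction coefficients at 280 nm, in M-1 cm-1.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, cystines_formed=True):
    """Molar extinction coefficient estimated from residue counts.

    If cystines_formed is True, every pair of Cys residues is counted as
    one cystine; an unpaired Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if cystines_formed else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight


# Counts and molecular weight taken from the ProtParam output above.
ext_all = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5)
ext_none = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cystines_formed=False)
abs_all = round(absorbance_at_1_g_per_l(ext_all, 55252.6), 3)
abs_none = round(absorbance_at_1_g_per_l(ext_none, 55252.6), 3)
```

With 4 Trp, 5 Tyr and 5 Cys residues, the sketch gives the same values as the listing, which is a useful consistency check on the reported composition and molecular weight.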
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3_10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
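The summary percentages in a listing of this kind are simple state counts divided by the sequence length. A minimal Python sketch of that tally is shown below; the function name is hypothetical and the short example string is made up for illustration.

```python
from collections import Counter


def state_percentages(states):
    """Map each state letter to (count, percent of sequence length)."""
    if not states:
        raise ValueError("state string must not be empty")
    total = len(states)
    counts = Counter(states)
    return {s: (n, round(100.0 * n / total, 2)) for s, n in counts.items()}


# Toy prediction string: 4 helix, 2 coil, 2 strand and 2 turn residues.
summary = state_percentages("hhhhcceett")
```

The same arithmetic applied to the reported counts (for example 235 helical residues out of 523) gives the percentages shown in the summary.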
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
49
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
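Pairwise scores of this kind are percent-identity values between two sequences. The Python sketch below shows one simple way to compute such a value from two already-aligned rows; it is an assumption that identities are counted over columns where neither row has a gap, and the exact definition used by the alignment program may differ. The function name and the toy rows are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns without a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

Read this way, scores near 98 among the HBV HBeAg entries indicate nearly identical sequences, while the scores of 2 to 3 against GLCTK_HUMAN indicate essentially no sequence similarity.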
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the Email ID for receiving the modeled structure).
Open the new structure (received by Email) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Click on model bars.
Fig. Structure of the template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, with one point for each residue in the protein. Because φ and ψ are the torsion angles about the two backbone bonds on either side of each alpha carbon atom, the plot shows which residues of the protein of interest adopt sterically favourable backbone conformations.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
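Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows the underlying geometry only. It is not the PROCHECK implementation, the function name is hypothetical, and the coordinates are made-up test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Hypothetical coordinates: the first and last atoms lie in perpendicular planes.
angle = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(f"torsion = {angle:.1f} degrees")
```

Applying `dihedral` to the backbone atoms of every residue and plotting φ on the horizontal axis against ψ on the vertical axis gives the scatter that a Ramachandran plot displays.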
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           SCORE    %age
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
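The percentages in the plot statistics follow directly from the residue counts, and recomputing them is a quick sanity check on the table. The sketch below assumes the two number columns are counts and percentages (so that "856" reads as 85.6%); the variable names are illustrative.

```python
# Residue counts per Ramachandran region, taken from the plot statistics above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Denominator: non-glycine, non-proline residues.
total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# PROCHECK guideline quoted above: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("meets 90% guideline:", meets_guideline)
```

The counts sum to 1156 and the recomputed percentages agree with the table. With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline quoted in the PROCHECK summary.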
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40 Charge = -0.52
  ASP CA  Radius = 1.50 Charge = 0.25
  ASP C   Radius = 1.40 Charge = 0.53
  ASP O   Radius = 1.50 Charge = -0.50
  ASP CB  Radius = 1.70 Charge = -0.21
  ASP CG  Radius = 1.40 Charge = 0.62
  ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40 Charge = -0.52
  ASP CA  Radius = 1.50 Charge = 0.25
  ASP C   Radius = 1.40 Charge = 0.53
  ASP O   Radius = 1.50 Charge = -0.50
  ASP CB  Radius = 1.70 Charge = -0.21
  ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40 Charge = -0.52
  THR CA  Radius = 1.50 Charge = 0.27
  THR C   Radius = 1.40 Charge = 0.53
  THR O   Radius = 1.50 Charge = -0.50
  THR CB  Radius = 1.50 Charge = 0.21
  THR H   Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N 2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
-------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
---------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
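The "Fractional charge" warnings in the log can be checked by summing the per-atom partial charges listed under each warning. The sketch below does this for residue A 52 MSE. It assumes the printed values carry two decimals (for example "-052" read as -0.52), and the 0.05 tolerance used to decide whether a net charge is non-integer is a hypothetical choice, not a documented Hex setting.

```python
# Partial charges listed in the log for residue A 52 MSE (two-decimal reading assumed).
mse_charges = {
    "N": -0.52, "CA": 0.14, "C": 0.53, "O": -0.50, "CB": 0.04,
    "CG": 0.09, "SE": 0.32, "CE": 0.01, "H": 0.25,
}

net = sum(mse_charges.values())

# Hypothetical tolerance: flag residues whose net charge is not close to an integer.
is_fractional = abs(net - round(net)) > 0.05

print(f"net charge = {net:.2f}, fractional: {is_fractional}")
```

The listed values sum to about 0.36, close to the 0.35 reported in the warning; the small difference is presumably rounding in the printed per-atom values. A non-integer net charge is what would be expected for a residue with missing atoms, which fits the "incomplete residue" warnings earlier in the same log.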
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting
to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible
in the near future to draw firm conclusions about the true picture of evolution, and to identify the
genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
In order to determine the unknown structure of a protein present in the different species, we
performed homology modelling. We took a step forward by presenting a theoretical model built
with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries. The
present work may be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will
also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro - An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing
cholangitis occasionally mimic chronic hepatitis [4].
Viral hepatitis
A virus is a particle that is smaller than a bacterium and carries its genetic
information as DNA or RNA. This genetic material allows the virus to infect bacteria
or living cells and set up the machinery to reproduce itself, leading to destruction of the cell
in which it resides. To date, five viruses, labeled A through E, have been identified that
appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C and D are transmitted by direct entry
into the bloodstream (through any method of injection under the skin). The term viral
hepatitis describes any one of the illnesses caused by the five viruses mentioned; it
consists of an infection of liver cells that leads to damage of the liver over days in
some cases, but over many years in others. Thirty years ago none of the hepatitis viruses
had been identified. In the 1960s, transfusion-related viral hepatitis was extremely
common, with 30% of patients receiving blood products becoming infected. By 1970 a
blood test for what was called the Australia antigen had been developed, which appeared to
identify those infected with one hepatitis virus, which we now call hepatitis B. The
investigator who discovered the Australia antigen, the protein that makes up the coat of
the virus and is now called the hepatitis B surface antigen (HBsAg), was awarded
the Nobel prize. Our understanding of viral hepatitis has grown tremendously since the
discovery of the Australia antigen.
Currently 11 viruses are recognized as causing hepatitis. Two are
herpes viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and nine are
hepatotropic viruses.
EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent
hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of
fatigue, nausea and malaise.
Of the nine human hepatotropic viruses, only five are well characterized;
hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called
enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The
most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C
(formerly called non-A, non-B hepatitis) and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common, especially in
children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the disease.
Fulminant hepatitis is rare: 0.1% of cases. The virus enters via the gut, replicates in
the alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset
of symptoms.
World-wide distribution; endemic in most countries. The incidence in first world
countries is declining. There is an especially high incidence in developing countries and
rural areas. In rural areas of South Africa the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant
women, in whom the mortality rate is high (up to 40%). Similar to hepatitis A, the virus
replicates in the gut initially before invading the liver, and virus is shed in the stool
prior to the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed
to establish infection. Little is known yet. The incidence of infection appears to be low
in first world countries.
Hepatitis C
Originally described as a putative togavirus related to the flaviviruses and
pestiviruses; it is now classified in the family Flaviviridae.
Thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees. Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure, which may lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with hepatitis B.
Increased severity of liver disease in hepatitis B carriers.
Virus particle 36 nm in diameter, encapsulated with HBsAg derived from HBV.
Delta antigen is associated with virus particles; ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It
has been classified as a flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells and other cells in the body once it gains access to the bloodstream.
One of the most interesting features of the hepatitis B virus is that the virus itself
does not damage the liver; the damage is caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation) do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly
identical have been identified in Eastern woodchucks, ground squirrels and Peking
ducks. The members of this virus family, termed the hepadnaviruses, have life cycles
similar to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g.
duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus
particles, as well as excess viral surface protein, are shed in large amounts into the
blood. Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies and young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma
Patients who become persistently infected are at risk of developing hepatocellular
carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide, there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection
is much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the
liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers
and in those who clear the infection. Its presence indicates exposure to HBV.
Fig. Hepatitis B virus in serum
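The marker readings listed in the serology notes can be collected in a small lookup table. The following Python sketch is only an illustration: the names MARKER_READINGS and interpret_markers are hypothetical, the readings are paraphrased from the notes above, and the table is not a diagnostic tool.

```python
# Marker name -> reading, paraphrased from the serology notes in this report.
# MARKER_READINGS and interpret_markers are illustrative names (assumptions).
MARKER_READINGS = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV (chronic carrier or cleared infection)",
}


def interpret_markers(detected):
    """Return one reading per detected serum marker, in table order."""
    unknown = set(detected) - set(MARKER_READINGS)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    return [f"{name}: {reading}"
            for name, reading in MARKER_READINGS.items()
            if name in detected]


if __name__ == "__main__":
    # Pattern described for a person who has cleared the infection.
    for line in interpret_markers({"anti-HBs", "core IgG"}):
        print(line)
```

A flat table is used deliberately: combining markers into a clinical picture requires judgment that is outside the scope of this sketch.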
Prevention
1) Active immunization
Two types of vaccine are available:
Serum-derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
a) Health care workers
b) Sexual partners of chronic carriers
c) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B
resolves completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers,
or those with chronic hepatitis B, are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5′ end. The other strand, the plus strand, is variable in length but less than unit
length, and has an RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these
proteins is through the use of multiple in-frame start codons. The HBV genome also contains
elements that regulate transcription, determine the site of polyadenylation and mark a
specific transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable
of supporting its replication. Although hepatocytes are known to be the most effective
cell type for replicating HBV, other types of cells in the human body have been found to
be able to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition
of a cell surface receptor that has yet to be identified (Garces HBVP). The DNA then enters
the nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated
by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are
destined to become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3′ end of its own transcript, where viral DNA synthesis
eventually occurs. At the same time as capsid formation, the RNA-P protein complex is
packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of
cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, usually outnumbering the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to together as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the bloodstream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome.
Hepatitis B surface antigen (HBsAg) - There are three different hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering the immune
response. Regardless of the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185-amino-acid protein, it is expressed in the cytoplasm of infected cells
and is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative
of complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct
antigen particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle, and tubular or filamentous particles that vary in
length. The Dane particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles
or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for
poor responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still
elusive. The state of knowledge in this very active field is therefore reviewed, with an
emphasis on past accomplishments as well as goals for the future [3].
Antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the
liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers
and in those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are
likely to have different conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that
accurate models can be built for query sequences that have greater than 30% sequence
identity with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements
in the quality of homology models [5,6].
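The 30% rule can be checked on any pairwise alignment by counting identical residues. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, gaps are written as "-", and identity is taken over columns where both sequences have a residue (other denominators, such as the full alignment length, are also in use and give different values).

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over aligned (non-gap) columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_30_percent_rule(aligned_query, aligned_template, threshold=30.0):
    """True if the template is above the identity threshold discussed in the text."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    query = "ACDE-FG"
    template = "ACDQWFG"
    print(round(percent_identity(query, template), 1))
    print(passes_30_percent_rule(query, template))
```

The threshold is a parameter rather than a constant because the text describes 30% as an approximate boundary, not a sharp cutoff.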
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed in recent years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use
of chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for
storage, retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4) from human. Primary accession number: Q8IVS8.
EC 2.7.1.31. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate =
ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA and
protein:translated DNA (with frameshifts) searches, and for ordered or unordered peptide
searches. Recent versions of the FASTA package include special translated search
algorithms that correctly handle frameshift errors (which six-frame-translated searches do
not handle very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
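To make the Smith-Waterman algorithm mentioned above concrete, the Python sketch below computes the optimal local alignment score of two sequences by dynamic programming. It is a simplified illustration and not the SSEARCH implementation: the function name is hypothetical, and it assumes a flat match/mismatch score with a linear gap penalty, whereas real searches normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (Smith-Waterman)."""
    previous = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]  # first column is always 0 in local alignment
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                               # start a new local alignment
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the matrix are kept because the sketch returns the score alone; recovering the alignment itself would require storing the full matrix and tracing back from the best cell.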
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
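Two of the listed parameters can be computed directly from a raw sequence. The Python sketch below is an illustration, not ProtParam's own code: the function names are hypothetical, the aliphatic index is assumed to follow Ikai's formula (mole percent of Ala + 2.9 x Val + 3.9 x (Ile + Leu)), and the extinction coefficient at 280 nm is assumed to use 5500, 1490 and 125 M^-1 cm^-1 per Trp, Tyr and cystine respectively.

```python
def clean_sequence(raw):
    """Drop white space and numbers, as described for raw-sequence input."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def aliphatic_index(sequence):
    """Aliphatic index (assumed formula: A + 2.9*V + 3.9*(I + L), mole percent)."""
    seq = clean_sequence(sequence)
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(sequence, cystines=0):
    """Molar extinction coefficient at 280 nm from Trp, Tyr and cystine counts.

    The number of disulfide-bonded cystines cannot be read from the sequence
    alone, so it is passed in explicitly (default: all cysteines reduced).
    """
    seq = clean_sequence(sequence)
    return seq.count("W") * 5500 + seq.count("Y") * 1490 + cystines * 125


if __name__ == "__main__":
    example = "1 MAVIL WYYC 10"  # made-up fragment, for illustration only
    print(clean_sequence(example))
    print(round(aliphatic_index(example), 1))
    print(extinction_coefficient(example))
```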
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM) was
described to improve the success rate in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
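The accuracy figures quoted for SOPMA are three-state per-residue scores. The Python sketch below shows how such a score can be computed from a predicted and an observed secondary structure string. The function name and the one-letter state codes (H for helix, E for sheet, C for coil) are assumptions made for illustration.

```python
VALID_STATES = frozenset("HEC")  # assumed codes: helix, sheet (strand), coil


def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("empty input")
    if not (set(predicted) | set(observed)) <= VALID_STATES:
        raise ValueError("states must be H, E or C")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Made-up example strings, not real SOPMA output.
    print(three_state_accuracy("HHHEECCC", "HHHEEECC"))
```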
PROTOCOL FOLLOWED
Identified the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence in SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model depends on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
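The E-value screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, the template identifiers, the scores and the 1e-5 cutoff are all hypothetical and are not taken from this study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str   # hypothetical template identifier (PDB code and chain)
    bit_score: float
    e_value: float


def select_templates(hits, max_e_value=1e-5):
    """Keep hits whose E-value is at or below the cutoff, lowest E-value first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Hypothetical search results, for illustration only.
hits = [
    Hit("tmplA_A", 27.0, 6.0),
    Hit("tmplB_C", 206.0, 6e-54),
    Hit("tmplC_C", 197.0, 2e-51),
]

candidates = select_templates(hits)
for h in candidates:
    print(h.template_id, h.bit_score, h.e_value)
```

A hit that fails the cutoff is simply dropped here; in practice such a sequence would be passed to a fold-recognition server rather than modeled on a poor template.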
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig. Protein-protein docking
Protein Receptor–Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can either have a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement, which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
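The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be illustrated with a small Python sketch. It is not the procedure used by any particular docking program; the function names, the x-then-y-then-z rotation order and the toy coordinates are assumptions made for this example.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rigid_transform(coords, rotation, translation):
    """Apply the same rotation and translation to every atom position."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
            for i in range(3)
        ))
    return moved


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Toy two-atom "ligand" (hypothetical coordinates in angstroms).
ligand = [(1.0, 0.0, 0.0), (2.5, 1.0, -0.5)]
pose = rigid_transform(ligand, rotation_matrix(0.3, -1.1, 2.0), (4.0, -2.0, 7.5))

# A rigid-body move changes position and orientation but not internal geometry.
print(distance(*ligand), distance(*pose))
```

Because only these six parameters vary, a rigid-body search never alters bond lengths or angles; any conformational adaptation has to be modeled separately.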
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. This plot is drawn between the torsion
angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary phi and psi with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles that cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
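Each point on the plot is a pair of torsion angles computed from four consecutive backbone atoms: phi from C(i-1), N(i), Cα(i), C(i), and psi from N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a signed torsion angle from four points in plain Python. The function name and the toy coordinates are assumptions for illustration and are not taken from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond, leaving the parts
    # of b0 and b2 that lie in the plane perpendicular to it.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates with the central bond along the z axis.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

Applying the same function to the backbone atoms of every residue yields the (phi, psi) pairs that are then plotted against the allowed and disallowed regions.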
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file, which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing.
PROCHECK generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles, and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
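The idea of gradient optimization can be shown on the smallest possible case: two atoms interacting through a Lennard-Jones (van der Waals) term. The sketch below is an illustration only and is not the minimizer used by any modeling package. The reduced units (epsilon = sigma = 1), the starting distance, the step size and the function names are all assumptions chosen for the example.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.005, tol=1e-10, max_iter=10000):
    """Steepest descent: move against the gradient until the force is negligible."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


r_start = 1.0                 # atoms too close: strong repulsion
r_min = minimize(r_start)
print(r_start, lj_energy(r_start))
print(r_min, lj_energy(r_min))
```

The minimizer slides downhill to the nearest minimum of this one-dimensional curve. In a real protein the energy surface has many minima, which is why minimization finds only the best nearby conformation.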
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I               27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
    (B)    0    0.0%
    (Z)    0    0.0%
    (X)    0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M^-1 cm^-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
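The two extinction coefficients above follow from the Trp, Tyr and Cys counts in the composition table (4, 5 and 5). The sketch below recomputes them using the per-residue molar absorbances commonly used for this estimate (5500 for Trp, 1490 for Tyr, 125 per cystine, in M^-1 cm^-1). The function name is an assumption, and the molecular weight is taken as 55252.6 Da.

```python
def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


n_trp, n_tyr, n_cys = 4, 5, 5      # residue counts from the composition table
mol_weight = 55252.6               # molecular weight in Da

# Case 1: all Cys paired into cystines (5 Cys give 2 cystines).
eps_all_cystine = extinction_coefficient(n_trp, n_tyr, n_cys // 2)
# Case 2: no cystines at all.
eps_no_cystine = extinction_coefficient(n_trp, n_tyr, 0)

# Absorbance of a 1 g/l (0.1%) solution is the coefficient divided by the molecular weight.
abs_all = eps_all_cystine / mol_weight
abs_none = eps_no_cystine / mol_weight

print(eps_all_cystine, round(abs_all, 3))
print(eps_no_cystine, round(abs_none, 3))
```

With five cysteines only two cystines can form, which is why the two cases differ by just 250 M^-1 cm^-1.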
Secondary structure prediction
By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3-10 helix       (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states (?) :   0 is  0.00%
Other states         :   0 is  0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
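The percentages in the SOPMA summary are simply the state counts divided by the sequence length. A minimal Python sketch of that bookkeeping is given below. The function name is an assumption, and the input is a synthetic string built from the reported counts (235 h, 80 e, 36 t, 172 c) rather than the actual per-residue prediction.

```python
from collections import Counter

STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_percentages(states):
    """Return {state: (count, percent)} for a string of per-residue state codes."""
    counts = Counter(states.lower())
    total = sum(counts.values())
    return {s: (counts[s], round(100.0 * counts[s] / total, 2)) for s in STATE_NAMES}


# Synthetic state string with the reported counts; the residue order is not real.
synthetic = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
summary = state_percentages(synthetic)

for code, (count, percent) in summary.items():
    print(STATE_NAMES[code], count, percent)
```

The same function can be applied directly to the concatenated state lines of a prediction to check that the four states account for all 523 residues.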
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out
Steps involved in SPDBV
Open the template structure from file (PDB file)
Choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - Save - Layer (PDB)
Choose File - Save - Project (PDB)
Choose Swiss model - submit modeling request (a new browser window opens; load the PDB file and give the email ID for receiving the modeled structure)
Open the new structure (received by email) and remove the template by selecting the target
Choose File - Save - Layer (PDB)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment

TARGET   83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA     4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET         hh sss sssss hhh
2b8nA          hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET         hhhhhhhhh ssssss sssss hhhhh
2b8nA          hhhhhhhhh ssssss sssss hhhhh

TARGET  155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET         h hhhhhh hhh sss hhhhhh sssssss
2b8nA          h hhhhhh hhh sss hhhhhh sssssss

TARGET  255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET         h hhhhhhhhhh hh hhhh sss sssss
2b8nA          h hhhhhhhhhh hhh hhhh ss

TARGET  305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET         hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA          hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET         sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA          sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394    ksgallitgp tgtnvndlii gliv-
TARGET         h sss sssss ssss
2b8nA          sss sssss ssss
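Sequence identity over an alignment such as the one above can be computed column by column. The sketch below uses one common definition (identical residues divided by the number of columns in which neither sequence has a gap); other definitions exist and give slightly different values. The function name is an assumption, and the input is a 20-residue stretch copied from the third alignment block (TARGET 175-194 against 2b8nA 121-140), so the result describes that stretch only, not the whole model.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue          # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


# Stretch of the target/template alignment (case differs only by convention).
target = "ISGGGSALLPAPIPPVTLEE"
template = "lsgggsslfelplegvslee"

identity = percent_identity(target, template)
print(identity)
```

Locally conserved stretches like this glycine-rich segment can show much higher identity than the 34% reported for the full alignment.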
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, and displays the psi and phi backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
59
SAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS
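To make the φ and ψ angles concrete, the sketch below computes a dihedral angle from four atom positions and applies it to the backbone atoms around one residue. It is an illustration only and not how SAVS or PROCHECK is implemented: the function names `dihedral` and `phi_psi` are hypothetical, and the coordinates in the usage example are made up rather than taken from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees about the p1-p2 bond (range -180..180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))  # assumes p1 and p2 are distinct points
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = tuple(b0[i] - _dot(b0, b1) * b1[i] for i in range(3))
    w = tuple(b2[i] - _dot(b2, b1) * b1[i] for i in range(3))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Made-up coordinates (in Angstroms) for illustration only
phi, psi = phi_psi(
    prev_c=(-0.7, 1.2, 0.3),
    n=(0.0, 0.0, 0.0),
    ca=(1.46, 0.0, 0.0),
    c=(2.0, 1.4, 0.2),
    next_n=(3.3, 1.5, 0.4),
)
print(f"phi = {phi:.1f}, psi = {psi:.1f}")
```

Plotting the (φ, ψ) pair of every residue on a single pair of axes gives the Ramachandran plot used for validation.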
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                        Count      %
Residues in most favoured regions [A,B,L]                 990   85.6
Residues in additional allowed regions [a,b,l,p]          104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0
Residues in disallowed regions                             51    4.4
                                                         ----  -----
Number of non-glycine and non-proline residues           1156  100.0
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
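The percentages in the PROCHECK summary follow directly from the residue counts, and they can be checked against the 90% guideline with a few lines of code. This is a minimal sketch using the four counts reported in the plot statistics; the helper name `region_percentages` is hypothetical.

```python
# Residue counts per Ramachandran region, as reported in the plot statistics
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Percentage of non-glycine, non-proline residues in each region."""
    total = sum(region_counts.values())
    return {name: 100.0 * n / total for name, n in region_counts.items()}


pct = region_percentages(counts)
for name, value in pct.items():
    print(f"{name:20s} {value:5.1f}%")

# PROCHECK guideline: a good quality model has over 90% in the most favoured regions
meets_guideline = pct["most favoured"] > 90.0
print("meets 90% guideline:", meets_guideline)
```

With 990 of 1156 residues in the most favoured regions, the model sits at about 85.6%, below the 90% guideline, so the residues in the disallowed regions deserve a closer look.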
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
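The size of the search reported in the Hex log can be reproduced with simple arithmetic, which is a useful sanity check when reading such a log. The sketch below is not part of Hex. It assumes that the "Total 6D space" line, whose punctuation was lost, reads Iterate[27,812,1] x FFT[64,24,48]; that split is an assumption, chosen because it reproduces both the 21924 FFTs and the 1616412672 orientations stated in the log.

```python
import math

# Distance range from the log: 0.00 to 19.50 in steps of 0.75
step = 0.75
distance_steps = [round(i * step, 2) for i in range(27)]

# Assumed split of the "Total 6D space" line (see lead-in)
iterate = (27, 812, 1)   # distance steps x receptor rotations x ligand rotations
fft = (64, 24, 48)       # FFT grid dimensions

n_ffts = math.prod(iterate)
total_orientations = n_ffts * math.prod(fft)

print("distance steps:", len(distance_steps), "last R =", distance_steps[-1])
print("3D FFTs:", n_ffts)
print("orientations:", total_orientations)
```

The 27 distance steps match the 27 "R =" lines printed during the 3D search, and the 812 receptor rotations match the "Receptor 812" entry in the initial rotational increments.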
Conclusion
After analysing the protein sequences of hepatitis B virus, we conclude that although the strains
are all closely related, their proteins play an important role in survival in different species. It
would be interesting to look more closely at the matter at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To find the unknown structure of a protein present in the
different species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
this gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report may serve as a stepping stone for further discoveries; it is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as interpreting the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and more effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases,
and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work remains to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
Currently 11 viruses are recognized as causing hepatitis. Two are
herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and nine are
hepatotropic viruses.
EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent
hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of
fatigue, nausea and malaise.
Of the nine human hepatotropic viruses, only five are well characterized;
hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called
enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The
most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly
called non-A non-B hepatitis) and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common, especially in
children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the disease.
Fulminant hepatitis is rare: 0.1% of cases. The virus enters via the gut, replicates in the
alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset
of symptoms.
World-wide distribution; endemic in most countries. The incidence in first world
countries is declining. There is an especially high incidence in developing countries and
rural areas. In rural areas of South Africa the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women, in whom the
mortality rate is high (up to 40%). As with hepatitis A, the virus replicates in the gut
initially before invading the liver, and virus is shed in the stool prior to the onset of
symptoms. Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet about its epidemiology; the incidence of infection appears to be low in first world
countries.
Hepatitis C
Putative togavirus related to the flaviviruses and pestiviruses.
Thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees. Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure, which can lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with hepatitis B.
Increased severity of liver disease in hepatitis B carriers.
Virus particle 36 nm in diameter, encapsulated with HBsAg derived from HBV.
Delta antigen is associated with virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has
been classified as a flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells and other cells in the body once it gains access to the blood stream.
One of the most interesting features of the hepatitis B virus is that the virus itself does not
damage the liver; the damage is caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation) do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly
identical have been identified in Eastern woodchucks, ground squirrels and Peking
ducks. The members of this virus family, termed the hepadnaviruses, have life
cycles similar to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g.
duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus
particles as well as excess viral surface protein are shed in large amounts into the blood.
Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure. Patients who become persistently infected are at
risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers
and those who clear the infection. Its presence indicates exposure to HBV, including in the
chronic carrier.
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers or
those with chronic hepatitis B are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5' end. The other strand, the plus strand, is variable in length but less than unit
length and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is through the use of multiple in-frame start codons. The HBV genome also contains
parts that regulate transcription, determine the site of polyadenylation and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to be able to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after the infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al. 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present and usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to as a group named surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the blood stream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering the immune
response. Regardless of the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. It is a 185-amino-acid protein expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. Of these, the
Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed, with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers
and those who clear the infection. Its presence indicates exposure to HBV, including in the
chronic carrier [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have a greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5,6].
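As an illustration of the 30% rule discussed above, the following is a minimal Python sketch of how percent sequence identity might be computed from a pairwise alignment. The function names and the convention of counting only columns without gaps are assumptions made for this example; alignment programs differ in how they define the denominator.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the aligned (gap-free) columns of a pairwise alignment.

    Both strings must already be aligned, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True when identity exceeds the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

A template that fails this check is not necessarily unusable, but the alignment underlying the model should then be treated with more caution.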
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of annotation
(such as the description of the function of a protein, its domains structure,
post-translational modifications, variants, etc.), a minimal level of redundancy and a
high level of integration with other databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number: Q8IVS8
EC 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts) and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu
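To make the idea of an optimal local alignment concrete, the following is a minimal Python sketch of the Smith-Waterman scoring recurrence. It is not the SSEARCH implementation: the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas real tools use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the matrix are kept because the sketch returns the score alone; recovering the alignment itself would require the full matrix and a traceback.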
4. BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
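As an example of how one of these parameters can be derived from sequence alone, the following Python sketch computes the aliphatic index using the published formula of Ikai (1980): the mole percent of alanine, plus 2.9 times the mole percent of valine, plus 3.9 times the combined mole percent of isoleucine and leucine. The function name is hypothetical and the code is an illustration, not ProtParam's own implementation; dropping non-letter characters mirrors the statement above that white space and numbers are ignored.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)).

    X(...) is the mole percent of each residue in the sequence.
    """
    # Keep letters only, so white space and digits in pasted input are ignored
    seq = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not seq:
        raise ValueError("sequence contains no residues")
    n = len(seq)

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

Higher values indicate a larger relative volume occupied by aliphatic side chains.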
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by Email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure using a BLAST similarity search (PDB ID: 2B8N)
Loaded the target sequence in SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
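The sequence retrieved in the second step is a FASTA-format text record. As a minimal illustration of how such a record can be read before it is passed to BLAST or SWISS-MODEL, the Python sketch below parses FASTA text into a dictionary. The function name and the example records are hypothetical, and this is not part of the protocol actually followed.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {header: sequence}.

    Header lines start with '>'; the following lines up to the next
    header are joined into one sequence string.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record input, for illustration only
example = ">seq1 demo\nACDE\nFGHI\n>seq2\nKLMN\n"
print(parse_fasta(example))
```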
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule or to
foster association with another protein or nucleic acid.
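The agreement between matched Cα atoms quoted above is commonly expressed as a root-mean-square deviation (RMSD). The Python sketch below shows that calculation for two equally long lists of Cα coordinates. It assumes the two structures have already been superimposed and that atoms are paired by position in the lists; the function name and the coordinates used in testing are illustrative only.

```python
import math


def ca_rmsd(model_ca, reference_ca) -> float:
    """RMSD in the input units (e.g. angstroms) between matched C-alpha atoms.

    Each argument is a sequence of (x, y, z) tuples; atom i of the model is
    compared with atom i of the reference. No superposition is performed.
    """
    if len(model_ca) != len(reference_ca) or not model_ca:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model_ca, reference_ca):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model_ca))
```

In practice an optimal superposition (for example by the Kabsch algorithm) is applied first, so a value computed this way on unaligned coordinates would overstate the disagreement.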
Figure. First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
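The E-value screening step described above can be sketched in Python as follows. The `Hit` record, its field names, the template identifiers and the cutoff of 1e-5 are all assumptions made for this illustration; they do not come from BLAST itself, and a suitable cutoff depends on the database and the search.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database search hit (hypothetical record for illustration)."""
    template_id: str
    e_value: float
    identity_pct: float
    coverage_pct: float


def select_templates(hits, max_e_value: float = 1e-5):
    """Keep hits at or below the E-value cutoff, best candidates first.

    Hits are ranked by ascending E-value; ties are broken by higher
    query coverage.
    """
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: (h.e_value, -h.coverage_pct))


# Made-up hits, for illustration only
hits = [
    Hit("tmplA", 1e-40, 45.0, 90.0),
    Hit("tmplB", 2.0, 22.0, 40.0),
    Hit("tmplC", 1e-40, 44.0, 95.0),
]
print([h.template_id for h in select_templates(hits)])
```

Ranking by coverage as a tie-breaker is one possible choice; function and secondary structure agreement, also mentioned in the text, would need additional fields.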
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aimed at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a
lock-and-key mechanism. There is both high specificity and induced fit within these
interfaces, with specificity increasing with rigidity. A protein receptor-ligand pair can
have either a rigid ligand and a flexible receptor or a flexible ligand and a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger than usual ligand.
Normally, when there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement, which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the
torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles that cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
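The torsion angles plotted in a Ramachandran plot can be computed directly from atomic coordinates. The following is a minimal sketch, not taken from the original report: the function name `dihedral` and the example coordinates are illustrative assumptions. For residue i, φ is conventionally the torsion over C(i-1), N(i), Cα(i), C(i), and ψ the torsion over N(i), Cα(i), C(i), N(i+1).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) defined by four points in 3D.

    The sign follows the usual convention: looking along the central
    bond p1 -> p2, a clockwise rotation of the far bond relative to the
    near bond gives a positive angle.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)       # first bond vector
    b2 = sub(p2, p1)       # central bond vector
    b3 = sub(p3, p2)       # last bond vector
    n1 = cross(b1, b2)     # normal of the plane (p0, p1, p2)
    n2 = cross(b2, b3)     # normal of the plane (p1, p2, p3)
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates, chosen only to exercise the function:
# the four points below form a right-angle (about +90 degree) torsion.
angle = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(round(angle, 1))
```

Applying such a function to every residue of a structure and plotting the (φ, ψ) pairs gives the scatter that validation tools overlay on the allowed regions.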
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as
the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
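The idea of gradient optimization can be illustrated with a deliberately small example. The sketch below is an assumption-laden toy, not the procedure used in this project: it treats two atoms interacting through a Lennard-Jones potential in reduced units (epsilon = sigma = 1), starts them too close together (a steric clash with positive energy), and moves them downhill along the energy gradient until the force vanishes. The function names, step size and displacement cap are all illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, max_move=0.05, n_steps=500, tol=1e-10):
    """Steepest descent on the separation r.

    Each move is capped at max_move so that the very large forces of a
    steric clash do not throw the atoms far apart in a single step.
    """
    for _ in range(n_steps):
        g = lj_gradient(r)
        if abs(g) < tol:          # forces are (numerically) zero: converged
            break
        move = -step * g          # move against the gradient
        move = max(-max_move, min(max_move, move))
        r += move
    return r


r_start = 0.9                     # atoms too close: repulsive, E > 0
r_min = minimize(r_start)
print(lj_energy(r_start), r_min, lj_energy(r_min))
```

The minimizer settles at the nearest minimum (r = 2^(1/6) sigma for this potential). On a real molecular energy surface with many minima the same downhill procedure stops in whichever local minimum is closest to the starting conformation, which is the limitation described above.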
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5            206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina            197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                  192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I             27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
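The two extinction coefficients reported above can be cross-checked from the amino acid composition. The sketch below assumes the per-residue molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and reads the reported molecular weight as 55252.6 Da; the function and variable names are illustrative and are not ProtParam's own code.

```python
# Molar extinction contributions at 280 nm (M^-1 cm^-1), as documented
# for ProtParam; treated here as an assumption of this sketch.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Estimated molar extinction coefficient of the native protein in water."""
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


# Counts taken from the amino acid composition of GLCTK_HUMAN above.
n_trp, n_tyr, n_cys = 4, 5, 5
mol_weight = 55252.6  # Da, assuming the decimal point belongs before the last digit

# "ALL Cys as half cystines": 5 Cys can form at most 2 disulfide pairs.
ext_all = extinction_coefficient(n_trp, n_tyr, n_cys // 2)
# "NO Cys as half cystines": no disulfide contribution.
ext_none = extinction_coefficient(n_trp, n_tyr, 0)

# Absorbance of a 1 g/l (0.1%) solution = extinction coefficient / molecular weight.
abs_all = round(ext_all / mol_weight, 3)
abs_none = round(ext_none / mol_weight, 3)

print(ext_all, abs_all)    # 29700 0.538
print(ext_none, abs_none)  # 29450 0.533
```

Both pairs of values agree with the ProtParam output, which supports reading the molecular weight as 55252.6.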
Secondary structure prediction
By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
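The state percentages in the SOPMA summary are simply the counts of each one-letter state divided by the sequence length. A minimal sketch of that bookkeeping follows; the function name `state_composition` is an illustrative assumption, and the state string is rebuilt from the reported counts rather than copied residue by residue, since the order of states does not affect the composition.

```python
from collections import Counter

# One-letter SOPMA states used in the four-state prediction above.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_composition(states):
    """Return {state: (count, percentage)} for a predicted state string."""
    counts = Counter(states)
    total = len(states)
    return {
        s: (counts.get(s, 0), round(100.0 * counts.get(s, 0) / total, 2))
        for s in STATE_NAMES
    }


# State string rebuilt from the reported counts (235 h, 80 e, 36 t, 172 c).
states = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
for state, (count, percent) in state_composition(states).items():
    print(f"{STATE_NAMES[state]:<16} {count:>4} is {percent:6.2f}%")
```

Passing the actual predicted string (the lower-case lines of the SOPMA output, concatenated) to the same function should reproduce the summary table.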
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
==============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
==============================================================================
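The pairwise scores in this table behave like percent identities: near 100 for the closely related HBV strains and only 2 to 3 between glycerate kinase and any HBeAg sequence. As a hedged illustration of how such a score can be computed, the sketch below counts identical residues over the aligned, non-gap positions of two sequences. The function name is an assumption, and the two fragments are taken from the multiple alignment in this report rather than from the pairwise alignments ClustalW2 used for the table, so the number it prints is not one of the tabulated scores.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue              # gap columns are not compared
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# N-terminal fragments of P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5
# as they appear in the multiple alignment (one V/F substitution).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(round(percent_identity(hbva4, hbva5), 2))
```

Excluding gap columns from the denominator is one common convention; other tools divide by the length of the shorter sequence or of the full alignment, which gives different numbers for the same alignment.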
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target
sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (PDB file).
2) Choose icon: Swiss model - Load the raw target sequence.
3) Choose icon: Fit - Fit raw sequence, then Magic fit, then Iterative fit.
4) Choose icon: File - Save - Layer (pdb).
5) Choose icon: File - Save - Project (pdb).
6) Choose icon: Swiss model - Submit modeling request. (A new browser window will be opened
   for loading the PDB file; give the e-mail ID for receiving the modeled structure.)
7) Open the new structure (received by e-mail) and remove the template by selecting the target.
8) Choose icon: File - Save - Layer (pdb).
Open Swiss model and select the "Load raw sequence" option to load the target molecule.
Perform Magic fit and Iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "Submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: Request submission form
Please fill these fields:
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044
Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET  83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA   4     peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET  hh sss sssss hhh
2b8nA   hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET  hhhhhhhhh ssssss sssss hhhhh
2b8nA   hhhhhhhhh ssssss sssss hhhhh

TARGET  155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET  hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA   hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET  h hhhhhh hhh sss hhhhhh sssssss
2b8nA   h hhhhhh hhh sss hhhhhh sssssss

TARGET  255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET  h hhhhhhhhhh hh hhhh sss sssss
2b8nA   h hhhhhhhhhh hhh hhhh ss

TARGET  305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET  sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA   sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET  hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA   hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET  sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA   sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394   ksgallitgp tgtnvndlii gliv-
TARGET  h sss sssss ssss
2b8nA   sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of
the molecular structure and sequence of proteins. The stereochemical validation of model
structures of proteins is an important part of the comparative molecular modeling process.
The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid
residues in a protein structure. It shows the possible conformations of the φ and ψ angles
for a polypeptide, displaying the ψ and φ backbone conformational angles for each residue
in a protein. The distance between two successive alpha-carbon atoms in the backbone chain
and the angles between the two bonds of such atoms in the desired protein can be determined
using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller
is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE    %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and
R-factor no greater than 20%, a good quality model would be expected to have over 90%
in the most favoured regions.
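The plot statistics can be checked against the PROCHECK rule of thumb quoted above with a few lines of arithmetic. The sketch below is illustrative only: the function name and the dictionary keys are assumptions, and the counts are read from the plot statistics with the percentages interpreted as 85.6, 9.0, 1.0 and 4.4.

```python
def ramachandran_summary(most_favoured, additional, generous, disallowed,
                         threshold=90.0):
    """Percentages per region and whether the 'over 90%' criterion is met.

    Percentages are relative to the non-glycine, non-proline residues,
    i.e. the sum of the four region counts.
    """
    counts = {
        "most favoured": most_favoured,
        "additional allowed": additional,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    total = sum(counts.values())
    percentages = {name: round(100.0 * n / total, 1) for name, n in counts.items()}
    meets_criterion = percentages["most favoured"] > threshold
    return total, percentages, meets_criterion


total, percentages, meets_criterion = ramachandran_summary(990, 104, 11, 51)
print(total)            # 1156
print(percentages)
print(meets_criterion)  # False
```

With 85.6% of residues in the most favoured regions, the submitted structure falls short of the 90% figure that PROCHECK associates with a good quality model, and 51 residues lie in disallowed regions.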
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analysing the protein sequences of Hepatitis B virus, we conclude that, although they are all
closely related, they play an important role in survival in different species. It is worth taking a
closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains; this suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope that it will soon be
possible to draw firm conclusions about the true picture of evolution, and that the genes
responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To determine the unknown structure of a protein present in the different species, we carried out
homology modelling. As a step forward, we present a theoretical model built using available
online modelling tools.
Our study suggests that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report might be just a stepping stone towards any such discoveries. The
present work might be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
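As a rough illustration of the kind of data analysis phylogenetics relies on, the sketch below computes a pairwise p-distance matrix (the fraction of aligned positions that differ) from pre-aligned sequences. The strain names and the short toy sequences are hypothetical and are not taken from this report; a real analysis would use a proper alignment and a dedicated package such as PHYLIP.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return mismatches / compared


def distance_matrix(aligned: dict) -> dict:
    """Symmetric matrix of p-distances keyed by sequence name."""
    names = sorted(aligned)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = p_distance(aligned[x], aligned[y])
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "strain_A": "MQLFHLCLII",
    "strain_B": "MQLFHLCLIV",
    "strain_C": "MQLF-LCLVV",
}
matrix = distance_matrix(toy_alignment)
```

A matrix of this kind is the usual input to distance-based tree-building methods such as neighbour joining; the p-distance is the simplest possible measure and ignores multiple substitutions at the same site.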
The purpose of modelling is to help drug developers and biotechnologists to develop drugs
more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases, so that
possible therapeutic sites can be identified in them, and we can look forward to a better,
disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
World-wide distribution; endemic in most countries. The incidence in first world
countries is declining. There is an especially high incidence in developing countries and
rural areas. In rural areas of South Africa, the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women;
the mortality rate is high (up to 40%). Similar to hepatitis A, the virus replicates in the gut
initially before invading the liver, and virus is shed in the stool prior to the onset of
symptoms. Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known as yet. The incidence of infection appears to be low in first world
countries.
Hepatitis C
Putative Togavirus, related to the Flavi and Pesti viruses,
and thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture but can infect chimpanzees. Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure.
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires Hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with Hepatitis B.
Increased severity of liver disease in Hepatitis B carriers. Virus particle 36 nm in
diameter, encapsulated with HBsAg derived from HBV;
delta antigen is associated with virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called Hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has
been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells, and other cells in the body, once it gains access to the blood stream.
One of the most interesting features of the hepatitis B virus is that the virus itself does not
damage the liver; the damage is caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that, although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis and, occasionally, liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation) do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition,
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly
identical have been identified in Eastern woodchucks, ground squirrels and Peking
ducks. The members of this virus family, termed the hepadnaviruses, have life
cycles similar to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g.
duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig: Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus
particles, as well as excess viral surface protein, are shed in large amounts into the blood.
Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and on-going liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure.
Patients who become persistently infected are at
risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in
those who clear the infection. Its presence indicates exposure to HBV.
Fig: Hepatitis B virus in serum
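The marker interpretations listed above can be summarised as a simple lookup. The sketch below is only an illustration of that list: the marker labels (for example "anti-HBc IgM" for core IgM) and the function name are our own assumptions, and the output is not a clinical diagnosis.

```python
# Meaning of each positive serological marker, paraphrased from the list above.
MARKER_MEANINGS = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "viral replication has fallen; low infectivity in a carrier",
    "anti-HBc IgM": "recent infection",
    "anti-HBc IgG": "exposure to HBV (past or persisting)",
}


def interpret(positive_markers):
    """Return one line of interpretation per positive marker, in a fixed order."""
    unknown = set(positive_markers) - set(MARKER_MEANINGS)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    return [
        f"{marker}: {meaning}"
        for marker, meaning in MARKER_MEANINGS.items()
        if marker in positive_markers
    ]


# Hypothetical serum profile, for illustration only.
report = interpret({"HBsAg", "anti-HBc IgM"})
```

Each marker is interpreted independently here; in practice the combination of markers and their time course is what distinguishes acute, resolved and chronic infection.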
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
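The infant schedule quoted above (doses at 6, 10 and 14 weeks of age) is simple date arithmetic. The following minimal sketch shows the calculation; the function name and the birth date are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta

# Ages, in weeks, at which the three infant doses are given (from the text above).
DOSE_WEEKS = (6, 10, 14)


def dose_dates(birth: date) -> list:
    """Return the dates on which each of the three doses falls due."""
    return [birth + timedelta(weeks=weeks) for weeks in DOSE_WEEKS]


# Hypothetical birth date, for illustration only.
schedule = dose_dates(date(2024, 1, 1))
```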
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but they frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers, or
those with chronic hepatitis B, are not aware of their on-going infection, although some
have persistent fatigue.
Molecular virology
Genome: circular, double stranded and 3.2 kb in size. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5' end. The other strand, the plus strand, is variable in length but less than unit
length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is through the use of multiple in-frame start codons. The HBV genome also contains
parts that regulate transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig: Hepatitis B virus genome
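To illustrate what open reading frames with multiple in-frame start codons look like computationally, the minimal sketch below scans the three forward frames of a linear DNA string and records every ATG that lies in frame before each stop codon. The toy sequence is hypothetical, not HBV sequence, and the scan does not handle the circular genome or reading frames that wrap around its origin.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Scan the three forward frames of a linear sequence.

    Returns tuples (frame, start, end, in_frame_starts), where start is the
    first ATG of the ORF, end is the index just past the stop codon, and
    in_frame_starts lists every ATG in that frame before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        starts = []  # in-frame ATG positions seen since the last stop codon
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if codon == "ATG":
                starts.append(pos)
            elif codon in STOP_CODONS:
                if starts and (pos - starts[0]) // 3 >= min_codons:
                    orfs.append((frame, starts[0], pos + 3, list(starts)))
                starts = []
    return orfs


# Hypothetical toy sequence with two in-frame start codons before one stop.
orfs = find_orfs("ATGAAAATGCCCTAA")
```

In the toy result, translation could begin at either ATG and would end at the same stop codon, giving a longer and a shorter protein that share a C-terminus; this is the general idea behind producing several proteins from a single reading frame.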
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
Early after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of
cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, and these usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to collectively as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the bloodstream (Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering the immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus,
this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. Of these,
the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by
blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5, 6].
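The 30% threshold discussed above refers to pairwise sequence identity measured over an
alignment. As a minimal sketch of how that figure can be computed, the Python function below
assumes the query and template have already been aligned and that gaps are written as "-".
The function name and the choice to divide by the number of gap-free columns are illustrative
assumptions; other tools normalise by alignment length or by the shorter sequence.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # gap columns are skipped (an assumed convention)
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment, not real HBV data: 4 identical residues in 5 gap-free columns.
    print(percent_identity("ACDE-G", "ACDFHG"))
```

A template scoring well below 30 on this measure would fall into the range where, according
to the text, alignment quality and therefore model quality drop sharply.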
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot-
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2 Protein sequence-
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
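To make the Smith-Waterman algorithm mentioned above more concrete, the following is a
minimal Python sketch of the local-alignment scoring recurrence. It is not the SSEARCH
implementation: the function name, the simple match/mismatch scores and the linear gap
penalty are assumptions chosen for brevity, whereas production tools use substitution
matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (score only, no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # Toy sequences: the shared local region is "ACG".
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The raw score alone does not say whether a hit is meaningful; as the text notes, the package
places its emphasis on similarity statistics for that judgement.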
4 BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
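As an illustration of one of the parameters listed above, the aliphatic index can be computed
directly from amino acid composition. The sketch below uses the standard formula
AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each
residue. The function name is hypothetical and this is not ProtParam's own code; the
filtering of non-letter characters mirrors the statement that white space and numbers are
ignored.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    # Keep letters only, so white space and numbers in a pasted sequence are ignored.
    residues = [r for r in sequence.upper() if r.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    n = len(residues)

    def mole_percent(code: str) -> float:
        return 100.0 * residues.count(code) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy peptide, not the HBeAg-binding protein sequence.
    print(aliphatic_index("AVIL"))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains, which is
commonly read as a sign of greater thermostability.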
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure using a BLAST similarity search (PDB ID 2B8N)
Loaded the target sequence in pdb format in SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
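The agreement between matched Cα atoms quoted above is normally expressed as a
root-mean-square deviation (RMSD) in Å. The following Python sketch shows that calculation
under two stated assumptions: the model and template Cα coordinates are already superimposed,
and they are supplied in matching residue order. The function name is hypothetical, and no
optimal superposition (for example a Kabsch fit) is performed here.

```python
import math


def ca_rmsd(model_coords, template_coords) -> float:
    """RMSD in the input units (Å for PDB coordinates) between matched atom positions."""
    if len(model_coords) != len(template_coords) or not model_coords:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (mx, my, mz), (tx, ty, tz) in zip(model_coords, template_coords):
        squared += (mx - tx) ** 2 + (my - ty) ** 2 + (mz - tz) ** 2
    return math.sqrt(squared / len(model_coords))


if __name__ == "__main__":
    # Toy coordinates: every atom is displaced by 2 Å along z.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    template = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
    print(ca_rmsd(model, template))
```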
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aimed at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
the interactions are thus said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid
ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds,
causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with
dimensions corresponding to their van der Waals radii, and the angles which cause
spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
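Each point on a Ramachandran plot is a pair of backbone torsion angles, and each torsion
angle is defined by four consecutive atoms: C(i-1), N(i), Cα(i), C(i) for phi and N(i), Cα(i),
C(i), N(i+1) for psi. As a hedged illustration, the Python sketch below computes a signed
dihedral angle from four 3D points. The helper names are hypothetical and the coordinates in
the example are toy values, not taken from the modeled receptor.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3) -> float:
    """Signed torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Remove the components along the central bond, leaving the projections
    # of the two outer bonds onto the plane perpendicular to it.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy geometry: the outer bonds are at right angles when viewed down the central bond.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing phi and psi this way for every residue and plotting one against the other gives
the scatter that validation tools compare against the sterically allowed regions.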
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal or conversely how unusual the
geometry of the residues in a given protein structure is as compared with stereo chemical
parameters derived from well-refined high resolution structure The checks also make
use of lsquoidealrsquo bond lengths and bond angles as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD)
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file with the file extension
.new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue listing.
PROCHECK generates a number of output files in the default directory, which have the same
name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. It is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and therefore stops in a local minimum.
The potential energy calculated by summing the energies of the various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of that conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
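The idea of gradient optimization can be illustrated on the smallest possible system. The sketch below is an assumption-laden toy, not the force field or minimizer used in this project: it takes two atoms interacting through a Lennard-Jones potential (with hypothetical parameters epsilon = sigma = 1) and moves the interatomic distance downhill until the force is close to zero.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, step=0.01, tol=1e-8, max_steps=10000):
    """Steepest descent: move against the gradient until the force is tiny."""
    for _ in range(max_steps):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad
    return r

# Start from a stretched pair; the minimizer relaxes it to the nearby minimum.
r_min = minimize(1.5)
e_min = lj_energy(r_min)
```

For this potential the minimum lies at r = 2^(1/6) sigma with energy -epsilon, which the loop approaches from the starting distance. A real protein has many coupled coordinates and many minima, so the same procedure only finds the minimum nearest to the starting structure.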
Result and discussion
1 Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)
BLAST result
List of potentially matching sequences (including query sequence)
Db    AC       Description                                                       Score   E-value
pdb   1QGT-C   Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206     6e-54
pdb   2QIJ-C   Chain C Hepatitis B Capsid Protein With An N-Termina              197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                     192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27      6.0
pdb   1AW9-A   Chain A Structure Of Glutathione S-Transferase Iii I              27      6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue    Count   Percent
Ala (A)    74      14.1
Arg (R)    33      6.3
Asn (N)    11      2.1
Asp (D)    21      4.0
Cys (C)    5       1.0
Gln (Q)    32      6.1
Glu (E)    28      5.4
Gly (G)    51      9.8
His (H)    16      3.1
Ile (I)    15      2.9
Leu (L)    81      15.5
Lys (K)    10      1.9
Met (M)    12      2.3
Phe (F)    10      1.9
Pro (P)    28      5.4
Ser (S)    27      5.2
Thr (T)    22      4.2
Trp (W)    4       0.8
Tyr (Y)    5       1.0
Val (V)    38      7.3
Pyl (O)    0       0.0
Sec (U)    0       0.0
(B)        0       0.0
(Z)        0       0.0
(X)        0       0.0
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N   711 (41)
Oxygen    O   719
Sulfur    S   17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
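The two extinction coefficients reported above can be reproduced from the residue counts. The sketch below assumes the per-residue molar extinction values that ProtParam documents for 280 nm (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and reads the molecular weight as 55252.6 Da; the function name and variable names are illustrative only.

```python
# Molar extinction values at 280 nm (M^-1 cm^-1), assumed from ProtParam's documentation.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125

def extinction_coefficients(n_trp, n_tyr, n_cys):
    """Return (all Cys paired as cystines, no cystines) extinction coefficients."""
    reduced = n_trp * EXT_TRP + n_tyr * EXT_TYR
    # Each cystine needs two Cys residues; an odd Cys is left unpaired.
    with_cystines = reduced + (n_cys // 2) * EXT_CYSTINE
    return with_cystines, reduced

# Counts taken from the composition table: Trp 4, Tyr 5, Cys 5.
ext_cystines, ext_reduced = extinction_coefficients(4, 5, 5)

# Absorbance of a 1 g/l solution = extinction coefficient / molecular weight.
molecular_weight = 55252.6  # assumed reading of the reported value
abs_cystines = round(ext_cystines / molecular_weight, 3)
abs_reduced = round(ext_reduced / molecular_weight, 3)
```

With these inputs the sketch gives the same coefficients and absorbances as the ProtParam report, which is a useful consistency check on the extracted numbers.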
Secondary structure prediction
By SOPMA: result for UNK_158250 (42)
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00% (43)
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
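The summary percentages are simply the share of each state letter in the prediction string. A minimal sketch of that tally follows; the function name, the state-name mapping and the short example string are assumptions made for illustration and are not SOPMA's own code.

```python
from collections import Counter

# Four-state alphabet used in the prediction string above.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}

def state_summary(states):
    """Return {state name: (count, percent)} for a per-residue state string."""
    counts = Counter(states.lower())
    total = len(states)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATE_NAMES.items()
    }

# Hypothetical ten-residue prediction used only to show the calculation.
summary = state_summary("hhhhcceett")
```

Feeding the full 523-character prediction string into the same function would be expected to reproduce the counts listed in the SOPMA summary.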
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
(45)
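The pairwise scores in this table are percent identities. As a rough illustration of how such a figure is obtained from two aligned sequences, the sketch below counts identical residues over the columns where neither sequence has a gap. This is a simplified assumption: ClustalW computes its scores from its own pairwise alignments, so values from a multiple alignment may differ slightly. The function name is hypothetical, and the two fragments are the first aligned residues of P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5 from the alignment in this report.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

# Aligned N-terminal fragments of two HBeAg sequences (one V/F difference).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
identity = round(percent_identity(hbva4, hbva5), 2)
```

Here 28 of the 29 compared positions match. The very low scores between GLCTK_HUMAN and the HBeAg sequences in the table reflect that almost no positions match when the human protein is compared in the same way.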
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
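A guide tree file of this kind is written in Newick format, where each leaf appears as name:branch_length. The sketch below is a small illustrative helper and not a full Newick parser: it assumes leaf names contain no parentheses, commas, colons or semicolons and that branch lengths are plain decimals, and it is shown on a made-up three-leaf tree.

```python
import re

# A leaf is a run of name characters followed by ':' and a decimal branch length.
LEAF_PATTERN = re.compile(r"([^(),:;\s]+):([0-9.]+)")

def leaf_branch_lengths(newick):
    """Return (leaf name, branch length) pairs in the order they appear."""
    return [(name, float(length)) for name, length in LEAF_PATTERN.findall(newick)]

# Hypothetical tree used only to demonstrate the helper.
example_tree = "((A:0.1,B:0.2):0.3,C:0.4);"
leaves = leaf_branch_lengths(example_tree)
longest = max(leaves, key=lambda item: item[1])
```

Internal branch lengths (those that follow a closing parenthesis) are skipped because they have no name in front of the colon. On a guide tree such as the one in this report, the leaf with the longest branch is the sequence most distant from the rest.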
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request (a new browser window will open for loading the pdb file; give the email ID for receiving the modeled structure).
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Fig structure of template after modeling
Alignment

TARGET  83                                      LV GFGKAVLGMA AAAEELLGQH
2b8nA   4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET  hh sss sssss hhh
2b8nA   hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET  hhhhhhhhh ssssss sssss hhhhh
2b8nA   hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET  hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA   hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET  h hhhhhh hhh sss hhhhhh sssssss
2b8nA   h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET  h hhhhhhhhhh hh hhhh sss sssss
2b8nA   h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET  sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA   sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET  hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA   hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET  sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA   sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET  h sss sssss ssss
2b8nA   sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angle φ against ψ for the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, and displays the ψ and φ backbone conformational angles for each residue in a protein. Because these angles describe the rotation about the two backbone bonds at each alpha carbon atom, the plot indicates whether the backbone geometry of the protein of interest is sterically reasonable.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %-age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
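The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% guideline can be checked the same way. The sketch below is illustrative only: the dictionary keys and variable names are assumptions, and the counts are those reported in the plot statistics.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Only non-glycine, non-proline residues enter the percentages.
non_gly_pro = sum(region_counts.values())
region_percent = {
    region: round(100.0 * count / non_gly_pro, 1)
    for region, count in region_counts.items()
}

# Adding end residues, glycines and prolines gives the total residue count.
total_residues = non_gly_pro + 8 + 127 + 60

# PROCHECK's rule of thumb: a good quality model has over 90% most favoured.
meets_guideline = region_percent["most favoured"] > 90.0
```

With these counts the most favoured share is 85.6%, so the model falls short of the 90% guideline quoted in the PROCHECK note.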
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
69
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
70
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
71
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600
R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275
R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
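The search-space figures reported in the Hex log can be checked with a little arithmetic. The sketch below assumes that decimal points and separators were lost when the log was extracted, so that the distance range reads 0.00 to 19.50 in steps of 0.75 and the 6D space line reads Iterate[27,812,1] x FFT[64,24,48]. Both readings, and the meaning given to each factor in the comments, are assumptions, although they reproduce the totals printed in the log (21924 FFTs and 1616412672 orientations).

```python
# Assumed readings of the garbled log values (decimal points and commas restored).
R_START, R_STOP, R_STEP = 0.00, 19.50, 0.75
ITERATE = (27, 812, 1)   # assumed: distance steps, receptor rotations, ligand rotations
FFT_GRID = (64, 24, 48)  # assumed: dimensions of the FFT grid


def distance_steps(start, stop, step):
    """Return the intermolecular distances sampled from start to stop, inclusive."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 2) for i in range(count)]


def product(values):
    """Multiply all values together."""
    total = 1
    for value in values:
        total *= value
    return total


steps = distance_steps(R_START, R_STOP, R_STEP)
n_ffts = product(ITERATE)
n_orientations = n_ffts * product(FFT_GRID)

print(len(steps), steps[0], steps[-1])  # number of R values and their range
print(n_ffts)                           # compare with "Done 21924 3D FFTs"
print(n_orientations)                   # compare with the total 6D space
```

Under these assumptions the 27 distance steps match the 27 "R =" values listed in the log.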
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, each plays an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved
together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in
the near future to draw firm conclusions about the true picture of evolution and to identify the
genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.
To determine the unknown structure of the protein present in the different species, we performed
homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The
present work may be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as interpreting the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking will be useful in finding a cure for the infectious disease bird flu; they
will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery
process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope that
these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition.
McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro - An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
Defective virus which requires Hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with Hepatitis B.
Increased severity of liver disease in Hepatitis B carriers.
Virus particle 36 nm in diameter, encapsulated with HBsAg derived from HBV.
Delta antigen is associated with virus particles.
ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called Hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has
been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells and other cells in the body once it gains access to the blood stream.
One of the most interesting features of the hepatitis B virus is that the virus itself does not
damage the liver; the damage is caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation) do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly
identical have been identified in Eastern woodchucks, ground squirrels and Peking
ducks. The members of this virus family, termed the hepadnaviruses, have similar life
cycles to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g.
duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig: Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein; function unknown
Clinical Features
Incubation period: 2 - 5 months
Insidious onset of symptoms. Tends to cause a more severe disease than Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus
particles, as well as excess viral surface protein, are shed in large amounts into the blood.
Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic Persistent Hepatitis - the virus persists, but there is minimal liver damage.
Chronic Active Hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure. Patients who become persistently infected are at
risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV of the
chronic carrier
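The marker-by-marker reading described above can be written down as a small lookup. The following is a minimal sketch for illustration only, not a diagnostic tool: the function name and the marker labels are hypothetical, and each interpretation string simply restates the corresponding sentence in the text.

```python
# Each rule pairs a serum marker (illustrative label) with the meaning given in the text.
SEROLOGY_RULES = [
    ("HBsAg", "virus replication is occurring in the liver"),
    ("HBeAg", "a high level of viral replication is occurring in the liver"),
    ("anti-HBs", "immunity following infection"),
    ("anti-HBe", "low infectivity in a carrier"),
    ("core IgM", "recent infection"),
    ("core IgG", "exposure to HBV"),
]


def interpret_hbv_serology(positive_markers):
    """Return the textbook interpretations for the markers found positive.

    positive_markers is a set of marker labels; unknown labels are ignored.
    """
    return [meaning for marker, meaning in SEROLOGY_RULES if marker in positive_markers]


# Example: a hypothetical early acute infection profile.
for line in interpret_hbv_serology({"HBsAg", "HBeAg", "core IgM"}):
    print(line)
```

Keeping the rules in a list preserves the order in which the text introduces the markers, so the output reads in the same sequence as the description.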
Fig: Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers or
those with chronic hepatitis B are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
Fig: Hepatitis B virus genome
Genome: circular, 3.2 kb in size, double stranded. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5' end. The other strand, the plus strand, is variable in length but less than unit
length and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is achieved through the use of multiple in-frame start codons. The HBV genome also contains
elements that regulate transcription, determine the site of polyadenylation and mark a specific
transcript for encapsidation into the nucleocapsid.
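How several proteins can arise from one reading frame through multiple in-frame start codons can be illustrated with a simple ORF scan. The sketch below is an assumption-laden toy: the function name, the `min_codons` parameter and the 15-base test sequence are all hypothetical, it scans only the forward strand of a linear string, and it does not model the circularity of the HBV genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for every ATG...stop ORF on the forward strand.

    Each in-frame ATG is reported separately, so nested start codons that share
    one stop codon yield several ORFs of different length. min_codons counts the
    codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from this start until the first stop codon.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, start % 3))
                break
    return orfs


# Toy sequence with two in-frame start codons sharing one stop codon.
toy = "ATGAAAATGCCCTAA"
print(find_orfs(toy))
```

On the toy sequence both ATG codons lie in frame 0 and end at the same stop codon, giving a longer and a shorter product from one reading frame.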
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to be able to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al., 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, and these usually outnumber the
virions in the ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to as a group named surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the blood stream (Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of hepatitis B
surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering the immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185 amino acid protein, it is expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an Orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains 3 distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV of the
chronic carrier [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing improvements
in the quality of homology models [5,6].
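The 30% rule can be checked on any pairwise alignment of query and template. The sketch below is only illustrative: the function names are hypothetical, the example alignment is made up, and identity is computed over columns where neither sequence has a gap, which is one of several conventions in use.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if q == t:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def passes_30_percent_rule(aligned_query: str, aligned_template: str,
                           threshold: float = 30.0) -> bool:
    """True if the template is above the identity threshold (assumed 30%)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment, not taken from this report.
    print(percent_identity("ACDE-G", "ACDFHG"))
```

A template that falls below the threshold is not necessarily useless, but its alignment deserves closer inspection before a model is built on it.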
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domains structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2 Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts) and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
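To make the Smith-Waterman idea concrete, the following is a minimal sketch of the local alignment score computation. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (values chosen only for illustration), whereas production tools use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: this is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Toy sequences, not taken from this report.
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the table are kept in memory because the score alone is needed; recovering the aligned residues would require the full table and a traceback.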
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e., if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
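Two of these parameters follow from simple published formulas, sketched below. The function names are hypothetical and this is not ProtParam's code. The molar extinction values at 280 nm (Tyr 1490, Trp 5500, cystine 125) and the aliphatic index coefficients (2.9 for Val, 3.9 for Ile and Leu) are the commonly cited ones.

```python
def extinction_coefficient(n_tyr: int, n_trp: int, n_cys: int,
                           all_cys_paired: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1."""
    # Each cystine is formed from two Cys residues; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_tyr * 1490 + n_trp * 5500 + n_cystine * 125


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    n = len(sequence)

    def mole_percent(residue: str) -> float:
        return 100.0 * sequence.count(residue) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Residue counts reported for Q8IVS8 in this report: 5 Tyr, 4 Trp, 5 Cys.
    print(extinction_coefficient(5, 4, 5))
    print(extinction_coefficient(5, 4, 5, all_cys_paired=False))
```

With the residue counts reported for Q8IVS8 in the results of this report, the two calls reproduce the extinction coefficients listed there (29700 and 29450).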
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Identified the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence in PDB format in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
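The agreement between matched Cα atoms quoted above is usually expressed as a root-mean-square deviation (RMSD). The sketch below shows the calculation under the assumption that the model and the reference structure have already been superposed and that the two coordinate lists are in matching residue order; the function name and the example coordinates are illustrative only.

```python
import math


def ca_rmsd(coords_a, coords_b) -> float:
    """RMSD in the input units between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matched pair of C-alpha atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Made-up coordinates in angstroms, every atom displaced by 2 along z.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
    print(ca_rmsd(model, reference))
```

Without a prior optimal superposition the value overestimates the structural difference, so a fitting step would normally precede this calculation.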
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
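The E-value screen described above can be sketched as a simple filter over the hit list. The hits below are transcribed as printed in the BLAST listing in the results of this report; the `Hit` type, the function name and the cut-off of 1e-5 are assumptions made for illustration, not values prescribed by BLAST.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    pdb_chain: str
    score: float
    e_value: float


# Hits as printed in this report's BLAST listing.
HITS = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 60),
    Hit("1AW9-A", 27, 60),
]


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cut-off, most significant first."""
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)


if __name__ == "__main__":
    for hit in select_templates(HITS):
        print(hit.pdb_chain, hit.e_value)
```

The filter only narrows the candidates; the final choice among the surviving templates still depends on coverage, function and model plausibility.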
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key
mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule in
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.,
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle phi (φ) against psi (ψ) for the amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with
dimensions corresponding to their van der Waals radii, and the angles which cause
spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
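Each point on the plot is a pair of torsion angles, and each torsion angle is defined by four consecutive backbone atoms (C, N, Cα, C for φ; N, Cα, C, N for ψ). The sketch below computes a torsion angle from four coordinates using a standard vector formula; the helper names and the test coordinates are illustrative and are not taken from any structure in this report.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four made-up points forming a right-angle twist about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the function along the backbone of a model gives the (φ, ψ) pairs that a validation server then compares against the allowed regions.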
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
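Gradient optimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential. The sketch below uses plain steepest descent in reduced units (ε = σ = 1); the function names, step size and tolerance are assumptions chosen for the example. Real minimizers act on all atomic coordinates of a full force field.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative dE/dr; the force on the pair is its negative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0: float, step: float = 0.01, tol: float = 1e-10,
                      max_steps: int = 10000) -> float:
    """Steepest descent: move downhill until the gradient is nearly zero."""
    r = r0
    for _ in range(max_steps):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        r -= step * gradient
    return r


if __name__ == "__main__":
    r_min = minimize_distance(1.5)
    print(r_min, lj_energy(r_min))
```

Started from a stretched or a compressed separation, the procedure settles at r = 2^(1/6) σ with energy -ε, the single minimum of this potential. On a protein energy surface the same procedure stops at whichever local minimum lies nearest the starting conformation, as noted above.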
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences
Db   AC      Description                                                         Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp     27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
Secondary structure prediction
SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
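The state percentages in a summary of this kind can be recomputed from the per-residue prediction string. The sketch below assumes the four-state lower-case codes used in the output above (h, e, t, c); the function name and the short test string are illustrative only.

```python
from collections import Counter

# Four-state codes as they appear in the per-residue prediction line.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_composition(prediction: str) -> dict:
    """Map each state name to (count, percent rounded to two decimals)."""
    counts = Counter(prediction)
    total = len(prediction)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATE_NAMES.items()
    }


if __name__ == "__main__":
    # Short made-up prediction string, not the full 523-residue result.
    for name, (count, percent) in state_composition("hhhhcceett").items():
        print(f"{name}: {count} is {percent}%")
```

As a consistency check on the reported figures, 235 helical residues out of 523 is 44.93%, in agreement with the summary.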
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
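The guide tree is written in Newick notation: nested parentheses group sequences, and each number is a branch length. As a rough sketch only, the Python function below pulls the leaf names and their terminal branch lengths out of such a string. The tree string assumes the usual ClustalW punctuation (`:` before each length, commas between siblings, a decimal point after the leading zero, and a closing `;`); `leaf_lengths` is a hypothetical helper name, not part of any tool used in this report.

```python
import re

# Guide tree in Newick form. The ':' separators, commas, decimal points and
# final ';' are an assumption based on the usual ClustalW guide-tree layout.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_lengths(newick):
    """Return {leaf name: terminal branch length} for a Newick string."""
    # A leaf is a name that follows '(' or ',' and is followed by ':length'.
    # Internal branch lengths follow ')' and are therefore skipped.
    pattern = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


leaves = leaf_lengths(GUIDE_TREE)
longest = max(leaves, key=leaves.get)
print(len(leaves), "leaves; longest terminal branch:", longest)
```

Under these assumptions the human glycerate kinase sequence sits on by far the longest branch, consistent with its very low pairwise scores against the viral sequences.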
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out
Steps involved in SPDBV
Open the template structure from File (pdb file)
Choose icon: Swiss model - load the raw target sequence
Choose icon: Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon: File - Save - Layer (pdb)
Choose icon: File - Save - Project (pdb)
Choose icon: Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon: File - Save - Layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
click on model bars
Fig.: Structure of the template after modeling
Alignment

TARGET    83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA      4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET        hh sss sssss hhh
2b8nA         hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET        hhhhhhhhh ssssss sssss hhhhh
2b8nA         hhhhhhhhh ssssss sssss hhhhh

TARGET   155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET        h hhhhhh hhh sss hhhhhh sssssss
2b8nA         h hhhhhh hhh sss hhhhhh sssssss

TARGET   255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET        h hhhhhhhhhh hh hhhh sss sssss
2b8nA         h hhhhhhhhhh hhh hhhh ss

TARGET   305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET        hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA         hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET        sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA         sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394  ksgallitgp tgtnvndlii gliv-
TARGET        h sss sssss ssss
2b8nA         sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows which combinations of φ and ψ are possible for a polypeptide, and it displays the φ and ψ backbone conformational angles of each residue in the protein. The plot therefore indicates whether the backbone geometry around each alpha carbon of the modelled protein lies in a sterically allowed region.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
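The φ and ψ values shown in a Ramachandran plot are ordinary dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), CA(i), C(i), and ψ by N(i), CA(i), C(i), N(i+1). As an illustration only, the Python sketch below computes a signed dihedral angle from four 3D points. It is not the code used by SAVS or PROCHECK, the function name `dihedral` is hypothetical, and the coordinates in the usage lines are made up to show the cis (0°) and trans (180°) cases.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)  # first bond, pointing back towards p0
    b1 = sub(p2, p1)  # central bond (the rotation axis)
    b2 = sub(p3, p2)  # third bond
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    # The angle between the two projections is the dihedral angle.
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: p3 on the same side as p0 (cis) and the opposite side (trans).
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(cis, trans)
```

Applying such a function to every residue of a model gives one (φ, ψ) point per residue, which is what the plot displays.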
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                            SCORE   %-tage
Residues in most favoured regions       [A,B,L]               990    85.6%
Residues in additional allowed regions  [a,b,l,p]             104     9.0%
Residues in generously allowed regions  [~a,~b,~l,~p]          11     1.0%
Residues in disallowed regions                                 51     4.4%
                                                             ----   ------
Number of non-glycine and non-proline residues               1156   100.0%
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127
Number of proline residues                                     60
                                                             ----
Total number of residues                                     1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
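The percentages in the PROCHECK summary are each region's residue count divided by the number of non-glycine, non-proline residues. The short Python sketch below recomputes them from the counts reported above. It reads the count and percentage pairs as 990 (85.6%), 104 (9.0%), 11 (1.0%) and 51 (4.4%), which is an assumption about where the decimal points belong; the variable names are illustrative only.

```python
# Residue counts per Ramachandran region, taken from the PROCHECK summary.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Percentages are relative to the non-glycine, non-proline residues only.
non_gly_pro = sum(region_counts.values())
end_residues, glycines, prolines = 8, 127, 60
total_residues = non_gly_pro + end_residues + glycines + prolines

percentages = {
    region: round(100.0 * count / non_gly_pro, 1)
    for region, count in region_counts.items()
}

for region, pct in percentages.items():
    print(f"{region:20s} {pct:5.1f}%")

# Rule of thumb quoted by PROCHECK: over 90% in the most favoured regions.
meets_rule_of_thumb = percentages["most favoured"] > 90.0
print("non-Gly/non-Pro residues:", non_gly_pro, "| total:", total_residues)
print("over 90% most favoured:", meets_rule_of_thumb)
```

The recomputed values agree with the summary (the four counts add up to 1156 and the grand total to 1351), and they show that the model, at 85.6%, falls somewhat short of the 90% guideline.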
Docking result by Hex software
Fig.: Ligand & Receptor (2B8N)
Fig.: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
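The size of the search reported in the log can be checked by hand: the number of 3D FFTs is the product of the iteration counts, and the total number of orientations is that product times the FFT grid size. The Python sketch below does this arithmetic. It assumes the log entries read Iterate[27,812,1] and FFT[64,24,48], and that the distance range is 0.00 to 19.50 in steps of 0.75; the separators and decimal points are an assumed reading of the log, and the interpretation of the three iteration counts is inferred from the neighbouring log lines (27 distance steps, 812 receptor rotations, 1 ligand rotation).

```python
from math import prod

# Assumed reading of the Hex log line: Iterate[27,812,1] x FFT[64,24,48].
iterate = (27, 812, 1)   # distance steps, receptor rotations, ligand rotations
fft_grid = (64, 24, 48)  # samples covered by each 3D FFT

# Distance range 0.00 to 19.50 with steps of 0.75 (decimal points assumed).
r_start, r_stop, r_step = 0.0, 19.5, 0.75
distance_steps = round((r_stop - r_start) / r_step) + 1

n_ffts = prod(iterate)
n_orientations = n_ffts * prod(fft_grid)

print("distance steps:", distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
```

Both products match the figures in the log (21924 3D FFTs and 1616412672 orientations), which supports this reading of the bracketed values.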
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It is worthwhile to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in different species, we performed homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg-binding protein (glycerate kinase) is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs may be developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. The present work is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study may be useful in the search for a treatment for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] - C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
[6]- Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801. [PubMed]
Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395-408. [PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402. [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
Since the identification of the hepatitis B virus, several other nearly identical viruses have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in man and can serve as animal models, allowing further study of these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2 - 5 months
Insidious onset of symptoms. Tends to cause a more severe disease than Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus particles, as well as excess viral surface protein, are shed in large amounts into the blood. Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection -
Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and on-going liver damage occurs because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure. Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Post-natal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV of the chronic carrier.
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals following single-episode exposure to HBV-infected blood, for example needlestick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their on-going infection, although some have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, double stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions.
Fig. Hepatitis B virus genome
The minus strand is unit length and has a protein covalently attached to the 5' end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. The transcription and translation of these proteins is achieved through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.
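The idea of several proteins arising from multiple in-frame start codons within one reading frame can be illustrated with a small sketch. The Python below scans a linear DNA string in the three forward frames and, for each ORF, lists every in-frame ATG it contains. The function name `find_orfs`, the `min_codons` parameter and the toy sequence in the usage note are assumptions made for illustration; this is not HBV data, and the circular nature of the genome is ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find forward-strand ORFs and the in-frame start codons inside each.

    An ORF here runs from the first ATG seen in a frame to the next
    in-frame stop codon. Positions are 0-based; 'end' is exclusive and
    includes the stop codon. (Hypothetical helper, linear sequence only.)
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        starts = []  # in-frame ATG positions seen since the last stop codon
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon == "ATG":
                starts.append(pos)
            elif codon in STOP_CODONS:
                # Keep the ORF only if it has enough codons before the stop.
                if starts and (pos - starts[0]) // 3 >= min_codons:
                    orfs.append({
                        "frame": frame,
                        "start": starts[0],
                        "end": pos + 3,
                        "in_frame_starts": list(starts),
                    })
                starts = []
    return orfs
```

For the toy sequence `ATGAAAATGCCCTAA`, the sketch reports a single ORF in frame 0 with two in-frame starts (positions 0 and 6), so translation from either ATG would give two proteins sharing the same C-terminus.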
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to be able to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, and usually outnumber the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to collectively as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel through the blood stream undetected by antibodies (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface antigen (HBsAg) - There are three different types of hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope which may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be detected directly by blood test; this antigen can only be isolated by analyzing an infected hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigenic particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. The Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles, or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia and lifelong infections with an ongoing inflammation of the liver are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV of the chronic carrier [4].
Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.
The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods such as those described above should result in continuing improvements in the quality of homology models [5, 6].
With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over the last years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands, and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions, and virtual screening enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.
Swiss-Prot-
A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
2 Protein sequence-
Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number - Q8IVS8
EC 2.7.1.31
Subcellular location - cytoplasm
Catalytic activity - ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu
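To make the Smith-Waterman idea mentioned above concrete, the following Python sketch computes the best local alignment score between two sequences by dynamic programming. It is only an illustration: the function name and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for readability, whereas real tools such as SSEARCH use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Simplified Smith-Waterman with a linear gap penalty. Only two rows
    of the score matrix are kept in memory at any time.
    """
    cols = len(b) + 1
    prev = [0] * cols  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed scores, two identical four-letter sequences score 8, and two sequences with no residue in common score 0. Judging whether a given score is significant still requires the similarity statistics that the FASTA package computes.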
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
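Two of the parameters listed above can be derived from amino acid composition alone, which a short Python sketch can show. It assumes the commonly cited definitions: the aliphatic index as the mole percent of Ala plus 2.9 times that of Val plus 3.9 times that of Ile and Leu, and the extinction coefficient at 280 nm as a sum over Tyr (1490), Trp (5500) and cystine (125) counts. The function names and the `cystines` flag are hypothetical and are not part of ProtParam itself; results should be checked against the ProtParam server before use.

```python
def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue (assumed definition)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq, cystines=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    cystines=True assumes every pair of Cys residues forms a cystine;
    cystines=False assumes all Cys residues are reduced.
    """
    seq = seq.upper()
    coeff = 1490 * seq.count("Y") + 5500 * seq.count("W")
    if cystines:
        coeff += 125 * (seq.count("C") // 2)
    return coeff
```

For the toy peptide `YWCC` the sketch gives 7115 with cystines and 6990 with reduced cysteines; a real analysis would use the full sequence of the accession given earlier.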
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. In this paper we report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by Email to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
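The percentage quoted for a three-state prediction is a per-residue accuracy, often called Q3. As a minimal sketch, assuming the predicted and observed structures are given as equal-length strings over the letters H (helix), E (sheet) and C (coil), it can be computed as follows; the function name and the example strings are hypothetical and do not come from the SOPMA paper.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C)
    matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)
```

For example, a prediction `HHHEECCC` against an observed `HHHEECCH` agrees at 7 of 8 positions, giving 87.5.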
PROTOCOL FOLLOWED
Obtained the receptor (target protein) from literature references, journals available online and PubMed literature for the HBV strain
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure using a BLAST similarity search (PDB ID: 2B8N)
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
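Since model quality is discussed here in terms of sequence identity, a small sketch of how percent identity can be read off a pairwise alignment may help. The function name `percent_identity` is our own, and the denominator used (aligned positions with no gap in either sequence) is only one of several conventions in use, so values from alignment servers may differ.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over aligned, non-gap positions of two aligned sequences.

    Both strings must have the same length; '-' marks a gap.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matched = 0    # positions where both sequences have a residue
    identical = 0  # positions where those residues are the same
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue
        matched += 1
        if a == b:
            identical += 1
    return 100.0 * identical / matched if matched else 0.0


# Ten-residue fragment taken from the target/template alignment in the results:
# target ISGGGSALLP against template 2b8nA lsgggsslfe.
fragment_identity = percent_identity("ISGGGSALLP", "lsgggsslfe")
```

Identity computed on a short fragment such as this one says little about the whole alignment; it is shown only to make the calculation concrete.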
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step, and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
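The selection logic described above (discard hits with a poor E-value, then prefer hits that cover more of the query) can be sketched as a simple filter. Everything in this sketch is an assumption made for illustration: the `Hit` record, the `select_templates` function, the threshold of 1e-5 and the template identifiers are hypothetical and do not come from BLAST or from this project.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str   # hypothetical identifier of a candidate template
    e_value: float     # E-value reported by the database search
    coverage: float    # fraction of the query covered by the alignment (0 to 1)


def select_templates(hits, max_e_value=1e-5, min_coverage=0.0):
    """Keep hits with a sufficiently low E-value and enough coverage,
    ordered from most to least promising."""
    kept = [h for h in hits
            if h.e_value <= max_e_value and h.coverage >= min_coverage]
    # Lower E-value first; for equal E-values, higher coverage first.
    return sorted(kept, key=lambda h: (h.e_value, -h.coverage))


# Hypothetical candidates, for illustration only.
candidates = [
    Hit("tmplA", 2e-51, 0.80),
    Hit("tmplB", 6.0, 0.10),    # poor E-value: rejected even if it were the only hit
    Hit("tmplC", 6e-54, 0.82),
]
ranked = select_templates(candidates)
```

In practice the final choice would also weigh function, secondary structure agreement and the plausibility of the resulting model, which a numeric filter like this does not capture.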
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
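The six rigid-body degrees of freedom mentioned above (three rotational, three translational) can be made concrete with a small coordinate transform. This is a minimal sketch under our own conventions: the function name `rigid_transform`, the rotation order (about x, then y, then z) and the use of radians are assumptions chosen for illustration, not the scheme used by any particular docking program.

```python
import math


def rigid_transform(coords, angles, translation):
    """Apply a rigid-body move to a list of (x, y, z) points.

    angles: rotations in radians about the x, y and z axes, applied in that order.
    translation: (tx, ty, tz) shift applied after the rotations.
    """
    ax, ay, az = angles
    tx, ty, tz = translation
    moved = []
    for x, y, z in coords:
        # Rotation about the x axis.
        y, z = (y * math.cos(ax) - z * math.sin(ax),
                y * math.sin(ax) + z * math.cos(ax))
        # Rotation about the y axis.
        x, z = (x * math.cos(ay) + z * math.sin(ay),
                -x * math.sin(ay) + z * math.cos(ay))
        # Rotation about the z axis.
        x, y = (x * math.cos(az) - y * math.sin(az),
                x * math.sin(az) + y * math.cos(az))
        moved.append((x + tx, y + ty, z + tz))
    return moved
```

Because the move is rigid, all distances within the ligand are preserved; only its position and orientation relative to the receptor change.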
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu, and will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
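Each point on the plot is a pair of torsion angles, and a torsion angle is defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one standard way to compute such an angle from Cartesian coordinates. The function name `dihedral` and the plain-tuple representation of atoms are our own choices for illustration; validation servers use their own implementations.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) defined by four points,
    measured about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 coincide")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Computing φ and ψ for every residue of a model and plotting one against the other gives the scatter that is then compared with the allowed regions.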
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
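Gradient optimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones potential, with the interatomic distance as the only degree of freedom. This is a toy sketch with assumed reduced units (epsilon = sigma = 1) and a fixed step size; the function names are our own, and real force fields and minimizers are considerably more elaborate.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, tol=1e-8, max_iter=100000):
    """Steepest descent: move along the negative gradient until the
    net force (the gradient magnitude) is below tol."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


# Two atoms placed too far apart relax to the nearby energy minimum.
r_min = minimize(1.5)
```

The minimizer only walks downhill from its starting point, which is why, on a real energy surface with many minima, it stops in the nearest local minimum rather than crossing barriers.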
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db    AC       Description                                                          Score   E-value
pdb   1QGT-C   Chain C (HBcAg), Human Hepatitis B Viral Capsid >gi|5...             206     6e-54
pdb   2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina...             197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad...                   192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp...  27      6.0
pdb   1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I...             27      6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N    711
Oxygen    O    719
Sulfur    S     17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
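The two extinction coefficients above can be reproduced from the residue counts in the composition table (4 Trp, 5 Tyr, 5 Cys) using the standard molar values at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for each cystine. The sketch below is a cross-check only; the function names are our own and it is not the ProtParam code.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_as_cystine=True):
    """Molar extinction coefficient at 280 nm in water (M-1 cm-1)."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight


# Counts and molecular weight taken from the ProtParam output above.
ext_all = extinction_coefficient(4, 5, 5, all_cys_as_cystine=True)
ext_none = extinction_coefficient(4, 5, 5, all_cys_as_cystine=False)
abs_all = round(absorbance_1_g_per_l(ext_all, 55252.6), 3)
abs_none = round(absorbance_1_g_per_l(ext_none, 55252.6), 3)
```

With these inputs the values agree with the reported 29700 and 29450 M-1 cm-1 and with the absorbances 0.538 and 0.533.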
Secondary structure prediction
By SOPMA
Result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix     (Hh):  235 is 44.93%
3-10 helix      (Gg):    0 is  0.00%
Pi helix        (Ii):    0 is  0.00%
Beta bridge     (Bb):    0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn       (Tt):   36 is  6.88%
Bend region     (Ss):    0 is  0.00%
Random coil     (Cc):  172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
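The summary percentages follow directly from the per-residue state string: each state letter is counted and divided by the sequence length. A minimal sketch of that tally is given here; the function name `state_summary` is our own and the SOPMA server does not expose such a function.

```python
from collections import Counter


def state_summary(prediction: str) -> dict:
    """Map each state letter (h, e, t, c, ...) to (count, percent of residues)."""
    states = [ch for ch in prediction.lower() if ch.isalpha()]
    total = len(states)
    if total == 0:
        raise ValueError("empty prediction string")
    counts = Counter(states)
    return {state: (n, round(100.0 * n / total, 2)) for state, n in counts.items()}


# Short made-up state string, for illustration only.
summary = state_summary("hhhhcceett")
```

Applied to the full 523-residue prediction, the same tally should give the counts reported above, for example 235 helix residues out of 523, which is 44.93%.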
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                  Len(aa)  SeqB  Name                  Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN    523      2     Q64896|HBEAG_ASHV     217      3
1     Q8IVS8|GLCTK_HUMAN    523      3     P03154|HBEAG_DHBV1    305      3
1     Q8IVS8|GLCTK_HUMAN    523      4     P0C6J9|HBEAG_DHBV3    305      3
1     Q8IVS8|GLCTK_HUMAN    523      5     P03153|HBEAG_GSHV     217      3
1     Q8IVS8|GLCTK_HUMAN    523      6     P0C692|HBEAG_HBVA2    214      3
1     Q8IVS8|GLCTK_HUMAN    523      7     P0C625|HBEAG_HBVA3    214      3
1     Q8IVS8|GLCTK_HUMAN    523      8     P17099|HBEAG_HBVA4    214      3
1     Q8IVS8|GLCTK_HUMAN    523      9     Q81105|HBEAG_HBVA5    214      3
1     Q8IVS8|GLCTK_HUMAN    523      10    Q91C37|HBEAG_HBVA6    214      2
2     Q64896|HBEAG_ASHV     217      3     P03154|HBEAG_DHBV1    305      21
2     Q64896|HBEAG_ASHV     217      4     P0C6J9|HBEAG_DHBV3    305      21
2     Q64896|HBEAG_ASHV     217      5     P03153|HBEAG_GSHV     217      91
2     Q64896|HBEAG_ASHV     217      6     P0C692|HBEAG_HBVA2    214      66
2     Q64896|HBEAG_ASHV     217      7     P0C625|HBEAG_HBVA3    214      65
2     Q64896|HBEAG_ASHV     217      8     P17099|HBEAG_HBVA4    214      65
2     Q64896|HBEAG_ASHV     217      9     Q81105|HBEAG_HBVA5    214      65
2     Q64896|HBEAG_ASHV     217      10    Q91C37|HBEAG_HBVA6    214      65
3     P03154|HBEAG_DHBV1    305      4     P0C6J9|HBEAG_DHBV3    305      97
3     P03154|HBEAG_DHBV1    305      5     P03153|HBEAG_GSHV     217      24
3     P03154|HBEAG_DHBV1    305      6     P0C692|HBEAG_HBVA2    214      26
3     P03154|HBEAG_DHBV1    305      7     P0C625|HBEAG_HBVA3    214      27
3     P03154|HBEAG_DHBV1    305      8     P17099|HBEAG_HBVA4    214      25
3     P03154|HBEAG_DHBV1    305      9     Q81105|HBEAG_HBVA5    214      26
3     P03154|HBEAG_DHBV1    305      10    Q91C37|HBEAG_HBVA6    214      26
4     P0C6J9|HBEAG_DHBV3    305      5     P03153|HBEAG_GSHV     217      25
4     P0C6J9|HBEAG_DHBV3    305      6     P0C692|HBEAG_HBVA2    214      26
4     P0C6J9|HBEAG_DHBV3    305      7     P0C625|HBEAG_HBVA3    214      27
4     P0C6J9|HBEAG_DHBV3    305      8     P17099|HBEAG_HBVA4    214      25
4     P0C6J9|HBEAG_DHBV3    305      9     Q81105|HBEAG_HBVA5    214      26
4     P0C6J9|HBEAG_DHBV3    305      10    Q91C37|HBEAG_HBVA6    214      26
5     P03153|HBEAG_GSHV     217      6     P0C692|HBEAG_HBVA2    214      70
5     P03153|HBEAG_GSHV     217      7     P0C625|HBEAG_HBVA3    214      69
5     P03153|HBEAG_GSHV     217      8     P17099|HBEAG_HBVA4    214      69
5     P03153|HBEAG_GSHV     217      9     Q81105|HBEAG_HBVA5    214      69
5     P03153|HBEAG_GSHV     217      10    Q91C37|HBEAG_HBVA6    214      69
6     P0C692|HBEAG_HBVA2    214      7     P0C625|HBEAG_HBVA3    214      98
6     P0C692|HBEAG_HBVA2    214      8     P17099|HBEAG_HBVA4    214      98
6     P0C692|HBEAG_HBVA2    214      9     Q81105|HBEAG_HBVA5    214      98
6     P0C692|HBEAG_HBVA2    214      10    Q91C37|HBEAG_HBVA6    214      98
7     P0C625|HBEAG_HBVA3    214      8     P17099|HBEAG_HBVA4    214      98
7     P0C625|HBEAG_HBVA3    214      9     Q81105|HBEAG_HBVA5    214      97
7     P0C625|HBEAG_HBVA3    214      10    Q91C37|HBEAG_HBVA6    214      98
8     P17099|HBEAG_HBVA4    214      9     Q81105|HBEAG_HBVA5    214      97
8     P17099|HBEAG_HBVA4    214      10    Q91C37|HBEAG_HBVA6    214      98
9     Q81105|HBEAG_HBVA5    214      10    Q91C37|HBEAG_HBVA6    214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file)
Choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - Save - Layer (pdb)
Choose File - Save - Project (pdb)
Choose Swiss model - submit modeling request (a new browser window will open; load the pdb file and give the Email ID for receiving the modeled structure)
Open the new structure (received by Email) and remove the template by selecting the target
Choose File - Save - Layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode: Request submission form
Please fill these fields
Your Email address: Gunjan300@gmail.com (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Alignment
TARGET 83   LV GFGKAVLGMA AAAEELLGQH
2b8nA  4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
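The sequence identity reported for a model can be checked by hand against any block of the alignment above. The following is a minimal Python sketch, not the method SWISS-MODEL itself uses: the function name percent_identity is hypothetical, and it assumes identity is counted only over aligned columns in which neither sequence has a gap. The two strings are the TARGET 155 and 2b8nA 101 rows of the alignment with the spaces removed.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of gap-free aligned columns that hold the same residue.

    Assumption: columns where either sequence has a gap ('-') are skipped,
    which is one common convention but not the only one.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# One 50-column block of the alignment (target residues 155 to 204).
target = "ALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAA"
template = "trrvlelvdqlnendtvlfllsgggsslfelplegvsleeiqkltsallk"

print(f"identity over this block: {percent_identity(target, template):.1f}%")
```

A single block will generally not reproduce the overall figure, since identity varies along the sequence; the whole alignment would have to be concatenated to compare with the reported value.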
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of protein model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. Because the bond lengths and bond angles of the backbone are nearly fixed, these two angles largely determine the backbone conformation, including the relative positions of successive alpha carbon atoms.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
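Each point in a Ramachandran plot is a pair of backbone dihedral angles: φ is the dihedral over the atoms C(i-1), N(i), CA(i), C(i), and ψ is the dihedral over N(i), CA(i), C(i), N(i+1). As an illustration only, the Python sketch below computes a signed dihedral angle from four atom positions; the function name dihedral_degrees and the toy coordinates are assumptions made for this example and are not taken from the modelled structure or from PROCHECK.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> Point:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical): four atoms forming a right-angle twist.
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(dihedral_degrees(*atoms))
```

Applying the same function to the C, N, CA, C and N, CA, C, N atom quadruples of every residue in a PDB file would give the (φ, ψ) pairs that the plot displays.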
Result
Plot statistics                                        SCORE   %age
Residues in most favoured regions [A,B,L]                990   85.6%
Residues in additional allowed regions [a,b,l,p]         104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11    1.0%
Residues in disallowed regions                            51    4.4%
                                                        ----  ------
Number of non-glycine and non-proline residues          1156  100.0%
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127
Number of proline residues                                60
                                                        ----
Total number of residues                                1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
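The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% criterion can be checked the same way. The short Python sketch below assumes the counts read as 990, 104, 11 and 51 out of 1156 non-glycine, non-proline residues; the variable names are illustrative only.

```python
# Residue counts per Ramachandran region, as read from the summary above.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from these statistics.
total = sum(region_counts.values())

percentages = {
    region: round(100.0 * count / total, 1)
    for region, count in region_counts.items()
}

# PROCHECK's rule of thumb: over 90% in the most favoured regions.
GOOD_MODEL_THRESHOLD = 90.0
meets_threshold = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

for region, pct in percentages.items():
    print(f"{region:>20}: {region_counts[region]:4d} residues, {pct:5.1f}%")
print("over 90% in most favoured regions:", meets_threshold)
```

With these counts the most favoured fraction falls below the 90% mark, so by this criterion the model would not yet count as good quality and may need refinement.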
Docking result (by Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50   R = 5.25   R = 6.00
R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75   R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 0.00
R = 0.40
R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
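Two figures in the Hex log above can be checked with simple arithmetic: the number of radial steps in the distance scan and the net formal charge. The Python sketch below assumes the garbled log values read as a distance range of 0.00 to 19.50 in steps of 0.75, and 104 positively and 114 negatively charged residues; the helper name radial_steps is hypothetical and is not part of Hex.

```python
from typing import List


def radial_steps(start: float, stop: float, step: float) -> List[float]:
    """Intermolecular distances from start to stop (inclusive) in equal steps."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


# Assumed reading of "distance range = 000 to 1950 with steps of 075".
steps = radial_steps(0.0, 19.5, 0.75)
print(len(steps), "radial steps, from", steps[0], "to", steps[-1])

# Assumed reading of "Counted 104 +ve and 114 -ve formal charged residues".
positive_residues = 104
negative_residues = 114
net_formal_charge = positive_residues - negative_residues
print("net formal charge:", net_formal_charge)
```

The step count matches the number of "R =" lines printed during the 3D FFT scan, and the charge difference matches the net formal charge of -10 reported in the log.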
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we performed homology modelling and present a theoretical model built using available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries. The present work may be a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2 - 5 months
Insidious onset of symptoms. Tends to cause a more severe disease than Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus particles, as well as excess viral surface protein, are shed in large amounts into the blood. Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.
Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV.
Chronic carrier
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following a single-episode exposure to HBV-infected blood, for example a needlestick
injury.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but they frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers, or
those with chronic hepatitis B, are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
The genome is circular, 3.2 kb in size and partially double-stranded. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to its 5′ end. The other strand, the plus strand, is variable in length but shorter than unit
length, and has an RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. These proteins are produced through the use of
multiple in-frame start codons. The HBV genome also contains elements that regulate
transcription, determine the site of polyadenylation, and mark a specific transcript for
encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA polymerase II of a longer-than-genome-length RNA called the pregenome and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by
ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to
become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 5′ end of its own transcript, where viral DNA synthesis eventually
begins. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
Early after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of
cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the only infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually make up a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, and these usually outnumber the
virions in a ratio of 100:1. The two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to collectively as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the blood stream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface Antigen (HBsAg): There are three different types of hepatitis B
surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering the immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg): The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg): The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus,
this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified in the genus Orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct
antigenic particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle; and tubular or filamentous particles that vary in
length. The Dane particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or
any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed, with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life both in chronic carriers
and in those who clear the infection. Its presence indicates exposure to HBV, including
in the chronic carrier [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5, 6].
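The 30% rule can be checked on a pairwise alignment with a few lines of Python. The sketch below is illustrative only: the function names and the example alignment are hypothetical, and it assumes identity is counted over aligned columns in which neither sequence has a gap (other definitions of the denominator exist and give slightly different values).

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment, use '-'
    for gaps, and therefore have equal length.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are not counted
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True when the template exceeds the identity threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the report.
    print(percent_identity("ACDE-G", "ACDFHG"))
    print(passes_identity_rule("ACDE-G", "ACDFHG"))
```

The threshold is kept as a parameter because the 30% figure is a rule of thumb rather than a hard limit.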
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2 Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu
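To make the Smith-Waterman idea concrete, a minimal local-alignment sketch in Python follows. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the values 2, -1 and -2 are arbitrary illustrative choices), whereas production tools typically use substitution matrices and affine gap penalties and add the statistical evaluation described above.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Simplified scoring (assumption): fixed match/mismatch scores and a
    linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere: this is what makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Hypothetical toy sequences.
    print(smith_waterman("XXACGTYY", "ACGT"))
```

The dynamic-programming table guarantees the optimal local alignment under the chosen scoring scheme, which is why this approach is slower than the heuristic FASTA and BLAST searches.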
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
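As an illustration of how one of these parameters is derived from the raw sequence, the sketch below computes the aliphatic index using the commonly cited formula X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is hypothetical, the formula is assumed here rather than quoted from the report, and this is not ProtParam's own code.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index of a protein sequence given in one-letter code.

    Assumed formula: X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu)),
    with a = 2.9, b = 3.9 and X the mole percent of each residue.
    White space and numbers in the input are ignored.
    """
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    n = len(residues)

    def mole_percent(residue: str) -> float:
        return 100.0 * residues.count(residue) / n

    a, b = 2.9, 3.9
    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Hypothetical toy peptide, not the Q8IVS8 sequence.
    print(aliphatic_index("MAVILK"))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains, which is usually read as an indicator of thermostability.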
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by Email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
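The percentages quoted above are three-state per-residue accuracies. A minimal sketch of how such a figure can be computed is given below; the one-letter state codes (H for helix, E for sheet, C for coil), the function name and the example strings are assumptions made for illustration, not SOPMA output.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state.

    Assumption: both strings use H (helix), E (sheet) and C (coil),
    one character per residue, and have the same length.
    """
    allowed = {"H", "E", "C"}
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    if not set(predicted) <= allowed or not set(observed) <= allowed:
        raise ValueError("only the states H, E and C are allowed")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Hypothetical prediction compared with a hypothetical observed assignment.
    print(three_state_accuracy("HHHEECCC", "HHHEECCH"))
```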
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search.
Loaded the target sequence in PDB format in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).
Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
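The agreement between matched Cα atoms quoted above is usually expressed as a root-mean-square deviation (RMSD). The sketch below shows the calculation under two assumptions that are not stated in the text: the two coordinate lists are already paired residue by residue, and the structures have already been superposed (no fitting is performed here). The function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_model, coords_reference) -> float:
    """RMSD in the input units (e.g. angstroms) between matched C-alpha atoms.

    Assumptions: coords_model[i] and coords_reference[i] are (x, y, z) tuples
    for the same residue, and the structures are already superposed.
    """
    if len(coords_model) != len(coords_reference) or not coords_model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_model, coords_reference):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(coords_model))


if __name__ == "__main__":
    # Hypothetical coordinates: the model is shifted by 2 units along z.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    print(ca_rmsd(model, reference))
```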
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
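The selection criteria described above (a sufficiently low E-value, then coverage and similarity) can be sketched as a simple filter-and-rank step. Everything in the example is hypothetical: the `Hit` record, the template names and the cut-offs of 1e-5 for the E-value and 0.5 for coverage are illustrative assumptions, not values taken from BLAST or from this work.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One candidate template from a database search (hypothetical record)."""
    template_id: str
    e_value: float
    identity: float    # percent identity of the aligned region
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive


def coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query sequence covered by the aligned region."""
    return (hit.query_end - hit.query_start + 1) / query_length


def rank_templates(hits, query_length, max_e_value=1e-5, min_coverage=0.5):
    """Drop hits with a poor E-value or low coverage, then rank the rest.

    Ranking order (an assumption): lowest E-value first, then highest
    coverage, then highest identity.
    """
    kept = [h for h in hits
            if h.e_value <= max_e_value and coverage(h, query_length) >= min_coverage]
    return sorted(kept, key=lambda h: (h.e_value, -coverage(h, query_length), -h.identity))


if __name__ == "__main__":
    hits = [
        Hit("tmplA", 1e-40, 45.0, 1, 200),
        Hit("tmplB", 1e-60, 50.0, 10, 90),   # strong hit but covers little of the query
        Hit("tmplC", 0.5, 28.0, 1, 250),     # poor E-value
        Hit("tmplD", 1e-40, 40.0, 1, 240),
    ]
    for hit in rank_templates(hits, query_length=250):
        print(hit.template_id, hit.e_value, round(coverage(hit, 250), 2))
```

In practice the final choice would also weigh function and predicted secondary structure, which a numeric filter of this kind does not capture.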
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
the interfaces are thus said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid
ligand and a flexible receptor, or a flexible ligand and a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
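The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be written as a rotation followed by a translation of the ligand coordinates. The sketch below is a generic illustration in plain Python, not the procedure used by HEX or any other docking program; the function names, the Euler-angle convention (rotation about x, then y, then z) and the coordinates are all assumptions.

```python
import math


def rotation_matrix(rx: float, ry: float, rz: float):
    """3x3 rotation matrix R = Rz * Ry * Rx for angles given in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    Rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    Ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    Rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(Rz, matmul(Ry, Rx))


def rigid_transform(coords, rotation_angles, translation):
    """Apply three rotations and three translations to a list of (x, y, z) points."""
    R = rotation_matrix(*rotation_angles)
    tx, ty, tz = translation
    moved = []
    for x, y, z in coords:
        moved.append((
            R[0][0] * x + R[0][1] * y + R[0][2] * z + tx,
            R[1][0] * x + R[1][1] * y + R[1][2] * z + ty,
            R[2][0] * x + R[2][1] * y + R[2][2] * z + tz,
        ))
    return moved


def distance(p, q) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    # Hypothetical three-atom ligand; a rigid move must preserve its internal distances.
    ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    pose = rigid_transform(ligand, (0.3, -1.1, 2.0), (5.0, -3.0, 1.5))
    print(distance(ligand[0], ligand[1]), distance(pose[0], pose[1]))
```

A rigid docking search samples these six parameters; flexible docking additionally samples internal torsion angles of the ligand or receptor.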
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule
within or on a cell surface to which a substance (such as a hormone or a drug) selectively
binds, causing a change in the activity of the cell.
Ligand
A molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds
through many weak noncovalent bonds formed with the binding site of a protein, tight
binding of a ligand depends on a precise fit to the surface-exposed amino acid residues
on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making
and breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers
the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the
torsion angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles that cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
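The torsion angles plotted on a Ramachandran map can be computed from four consecutive backbone atom positions (for phi: C of the previous residue, N, Cα, C; for psi: N, Cα, C, N of the next residue). The following is a minimal sketch using only the Python standard library; the function name `dihedral` and the toy coordinates are illustrative assumptions and are not taken from any program used in this report.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Return the torsion angle in degrees defined by four 3D points.

    Sign follows the usual convention: 0 is cis, 180 is trans.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)          # first bond vector
    b1 = sub(p2, p1)          # central bond (rotation axis)
    b2 = sub(p3, p2)          # last bond vector
    n1 = cross(b0, b1)        # normal of the plane p0-p1-p2
    n2 = cross(b1, b2)        # normal of the plane p1-p2-p3
    b1_len = math.sqrt(dot(b1, b1))
    x = dot(n1, n2)
    y = b1_len * dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical, not from a real structure)
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Applying such a function to every residue of a structure gives the (phi, psi) pairs that are then placed on the plot.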
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the submitted
protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as
the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles, and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. It is good at releasing local constraints for a residue, but it will
not pass through high energy barriers and may stop in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure because it can be dominated
by a few bad interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad interaction,
such as two atoms too near each other in space and having a huge van der Waals repulsion
energy. It is often preferable to carry out energy minimization on a conformation to find
the best nearby conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The minimized
structure has small forces on each atom and therefore serves as an excellent starting
point for molecular dynamics simulations.
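The idea of gradient optimization can be illustrated on the simplest possible case: two atoms interacting through a Lennard-Jones (van der Waals) potential, started too close together. The sketch below is a toy steepest-descent loop in reduced units; the parameter values, the step size, and the cap on each move are assumptions chosen for illustration and do not correspond to any force field or program used in this report.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r0, step=0.01, tol=1e-8, max_iter=100000, max_move=0.1):
    """Steepest descent on the inter-atomic distance.

    Each move is opposite to the gradient; it is capped at max_move so the
    huge repulsive force at short range cannot throw the atoms far apart.
    """
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:       # forces are small: a (local) minimum is reached
            break
        move = max(-max_move, min(max_move, -step * g))
        r += move
    return r

r_start = 0.8                  # atoms too close: large repulsion energy
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

Starting from the clashing geometry, the energy falls from a large positive value to the bottom of the nearest well, which is the behaviour described above: the procedure relaxes local strain but only finds the nearby minimum.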
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db    AC        Description                                                      Score   E-value
pdb   1QGT-C    Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206     6e-54
pdb   2QIJ-C    Chain C Hepatitis B Capsid Protein With An N-Termina             197     2e-51
pdb   2G33-C    CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                    192     6e-50
pdb   1TA3-B    XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp  27      6.0
pdb   1AW9-A    Chain A Structure Of Glutathione S-Transferase Iii I             27      6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue    Count   Percent
Ala (A)    74      14.1%
Arg (R)    33      6.3%
Asn (N)    11      2.1%
Asp (D)    21      4.0%
Cys (C)    5       1.0%
Gln (Q)    32      6.1%
Glu (E)    28      5.4%
Gly (G)    51      9.8%
His (H)    16      3.1%
Ile (I)    15      2.9%
Leu (L)    81      15.5%
Lys (K)    10      1.9%
Met (M)    12      2.3%
Phe (F)    10      1.9%
Pro (P)    28      5.4%
Ser (S)    27      5.2%
Thr (T)    22      4.2%
Trp (W)    4       0.8%
Tyr (Y)    5       1.0%
Val (V)    38      7.3%
Pyl (O)    0       0.0%
Sec (U)    0       0.0%
(B)        0       0.0%
(Z)        0       0.0%
(X)        0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon C 2435
Hydrogen H 3967
Nitrogen N 711 (41)
Oxygen O 719
Sulfur S 17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
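The two extinction coefficients reported above can be reproduced from the amino acid composition. The sketch below assumes the commonly documented ProtParam relation (5500 per Trp, 1490 per Tyr, and 125 per cystine, in M-1 cm-1 at 280 nm); the function names are illustrative, and the molecular weight of 55252.6 is the value read from the ProtParam output with its decimal point restored.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cystine=True):
    """Molar extinction coefficient at 280 nm from residue counts.

    Assumed relation: 1490 per Tyr, 5500 per Trp, 125 per cystine
    (a disulfide-bonded pair of Cys residues).
    """
    n_cystine = n_cys // 2 if all_cystine else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine

def absorbance_1g_per_l(ext_coeff, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coeff / molecular_weight

# Counts taken from the composition table: Tyr 5, Trp 4, Cys 5
mw = 55252.6
for all_cystine in (True, False):
    eps = extinction_coefficient(5, 4, 5, all_cystine)
    print(eps, round(absorbance_1g_per_l(eps, mw), 3))

# Percent composition of one residue type, e.g. Ala (74 of 523 residues)
print(round(100.0 * 74 / 523, 1))
```

With five cysteines only two cystines can form, which is why the two coefficients differ by 250.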
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee (42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states: 0 is 0.00%
Other states: 0 is 0.00% (43)
Parameters: Window width 17; Similarity threshold 8; Number of states 4
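The percentages in the SOPMA summary are simply the share of each state letter in the prediction string (h = alpha helix, e = extended strand, t = beta turn, c = random coil). A minimal sketch of that tally is shown below; the function name and the short example string are illustrative assumptions, and the full 523-character prediction would be passed in the same way.

```python
from collections import Counter

def state_summary(prediction):
    """Count each secondary-structure state and its percentage of the total."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: (n, round(100.0 * n / total, 2))
            for state, n in counts.items()}

# Short made-up prediction string used only to show the output format
example = "hhhhcceett"
for state, (n, pct) in sorted(state_summary(example).items()):
    print(state, n, pct)
```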
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
(45)
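The pairwise scores in the table are percent identities. As a rough illustration of how such a score can be derived from two aligned sequences, the sketch below counts identical residues over the positions where neither sequence has a gap. This is an assumed, simplified definition written for illustration; it is not ClustalW2's own code, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b)
                if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)

# Toy aligned pair: 4 gap-free columns, 3 of them identical
print(percent_identity("AC-GT", "ACTGA"))
```

Scores of 2 to 3 between GLCTK_HUMAN and the HBeAg sequences, against 97 to 98 among the HBV genotype A sequences, reflect this kind of count.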
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
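A guide tree in this parenthesised (Newick-style) notation pairs each sequence name with a branch length. As a small illustration, the sketch below pulls out the leaf names and their branch lengths with a regular expression; the function name and the three-leaf example tree are hypothetical and are used only to show the idea, assuming names are followed by a colon and a decimal length.

```python
import re

def leaf_branch_lengths(newick):
    """Return {leaf name: branch length} for a Newick string.

    Internal branch lengths follow a closing parenthesis and carry no
    name, so the pattern (name, colon, number) skips them.
    """
    pattern = r"([^(),:;\s]+):([0-9.eE+-]+)"
    return {name: float(length) for name, length in re.findall(pattern, newick)}

# Hypothetical three-leaf tree, not the tree from this report
tree = "((A:0.1,B:0.2):0.3,C:0.4);"
lengths = leaf_branch_lengths(tree)
print(lengths)
print(max(lengths, key=lengths.get))   # leaf on the longest branch
```

A leaf on a much longer branch than the others, as with the human glycerate kinase here, is the most divergent sequence in the set.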
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (PDB file).
2. Choose Swiss model > load the raw target sequence.
3. Choose Fit > fit raw sequence, then magic fit, then iterative fit.
4. Choose File > save > layer (PDB).
5. Choose File > save > project (PDB).
6. Choose Swiss model > submit modeling request. (A new browser window opens for loading the PDB file and giving the e-mail ID for receiving the modeled structure.)
7. Open the new structure (received by e-mail) and remove the template by selecting the target.
8. Choose File > save > layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: structure of template after modeling
Alignment
TARGET  83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA   4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET  155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET  255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET  305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394 ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It displays the psi and phi backbone conformational angles for each residue in a protein and shows the possible conformations of these angles for a polypeptide. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Number   Percentage
Residues in most favoured regions [A,B,L]                990      85.6%
Residues in additional allowed regions [a,b,l,p]         104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]     11       1.0%
Residues in disallowed regions                           51       4.4%
                                                         ----     ------
Number of non-glycine and non-proline residues           1156     100.0%
Number of end-residues (excl. Gly and Pro)               8
Number of glycine residues (shown as triangles)          127      (59)
Number of proline residues                               60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
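The percentages in the PROCHECK summary follow directly from the residue counts, with the non-glycine, non-proline residues as the denominator. The short sketch below recomputes them and compares the most favoured share with the 90% guideline quoted above; the function and variable names are illustrative assumptions, while the counts are those reported in the summary.

```python
def region_percentages(most_favoured, additional, generous, disallowed):
    """Percentage of residues in each Ramachandran region (one decimal)."""
    total = most_favoured + additional + generous + disallowed
    counts = {
        "most favoured": most_favoured,
        "additional allowed": additional,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    return total, {k: round(100.0 * v / total, 1) for k, v in counts.items()}

total, pct = region_percentages(990, 104, 11, 51)
print(total)                        # non-glycine, non-proline residues
print(pct)
print(pct["most favoured"] > 90.0)  # does the model meet the 90% guideline?
```

For this model the most favoured share is below the 90% guideline, which is worth keeping in mind when interpreting the docking results that follow.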
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: after docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
64
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
65
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
66
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
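The size of the search reported in the log can be cross-checked with simple arithmetic. The Python sketch below is an illustration and not part of Hex. It assumes that the punctuation lost from the log originally read Iterate[27,812,1] and FFT[64,24,48], and that the distance range was 0.00 to 19.50 in steps of 0.75. Under those assumptions, the number of distance steps multiplied by the receptor and ligand rotations should reproduce the 21924 FFTs reported, and multiplying by the FFT grid size should reproduce the 1616412672 orientations.

```python
# Cross-check of the search-space size reported in the docking log.
# Assumptions (not confirmed by the log itself): "Iterate[278121]" stood for
# Iterate[27,812,1], "FFT[642448]" for FFT[64,24,48], and the distance range
# was 0.00 to 19.50 in steps of 0.75.

def distance_steps(start: float, stop: float, step: float) -> int:
    """Number of sampled distances from start to stop, both ends included."""
    return int(round((stop - start) / step)) + 1

n_distances = distance_steps(0.0, 19.5, 0.75)
receptor_rotations = 812   # "Receptor 812" in the log
ligand_rotations = 1       # "Ligand 1" in the log
fft_grid = (64, 24, 48)    # assumed reading of FFT[642448]

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = n_distances * receptor_rotations * ligand_rotations

# Each FFT scores every cell of the grid in one pass.
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

print(n_distances, n_ffts, total_orientations)  # 27 21924 1616412672
```

Both products match the figures printed in the log, which supports the assumed reading of the garbled brackets.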
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00      00      00     -1  -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that, though they are all
closely related, they play an important role in survival in different species. It is worth taking a
closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structures of proteins present in the
different species, we performed homology modelling and present a theoretical model
built using available online modelling tools.
Our study indicates that the HBeAg (Glycerate kinase) protein is
one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report might just be a stepping stone for any such discoveries. The
present work might be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists to develop
drugs more efficiently and more effectively in future by analysing the modelled structure
of a protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.
Finally, a similar process can be applied to other pathogens, and hence possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801.
Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J Mol Biol 311: 395-408.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res 25: 3389-3402.
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher
P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure. Patients who become persistently infected are at
risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV, whether or not the
person is a chronic carrier.
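The marker interpretations listed above can be summarised as a simple lookup. The following Python sketch encodes only the statements made in this section; the function name and its arguments are hypothetical, and it is meant as an illustration of the logic, not as a diagnostic tool.

```python
# Illustrative summary of the serological markers described above.
# The function and argument names are hypothetical; the wording of each note
# follows the text of this section and nothing more.

def interpret_hbv_serology(*, hbsag=False, hbeag=False, anti_hbs=False,
                           anti_hbe=False, core_igm=False, core_igg=False):
    """Return a list of notes, one for each marker that is present."""
    notes = []
    if hbsag:
        notes.append("HBsAg: virus replication is occurring in the liver")
    if hbeag:
        notes.append("HBeAg: high level of viral replication")
    if anti_hbs:
        notes.append("anti-HBs: immunity following infection")
    if anti_hbe:
        notes.append("anti-HBe: viral replication falling, low infectivity")
    if core_igm:
        notes.append("core IgM: recent infection")
    if core_igg:
        notes.append("core IgG: exposure to HBV")
    return notes


# Example: a pattern with surface antibody and core IgG only.
for note in interpret_hbv_serology(anti_hbs=True, core_igg=True):
    print(note)
```

Keyword-only arguments are used so that a call always names the markers it sets, which avoids mixing up the six boolean values.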
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers, or
those with chronic hepatitis B, are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
The genome is circular, 3.2 kb in size and double stranded.
Fig. Hepatitis B virus genome
It has a compact organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5' end. The other strand, the plus strand, is variable in length but shorter than unit
length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is achieved through the use of multiple in-frame start codons. The HBV genome also contains
elements that regulate transcription and determine the site of polyadenylation, and a specific
transcript for encapsidation into the nucleocapsid.
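To illustrate how reading frames can be located on a circular genome, the following Python sketch scans the forward strand for stretches that begin with ATG and end at the first in-frame stop codon, allowing a frame to run past the arbitrary origin of the sequence. The function name, the toy sequences and the minimum-length cut-off are assumptions made for illustration; this is not the method used to annotate HBV.

```python
# Sketch: forward-strand ORF scan on a circular DNA sequence.
# Hypothetical helper for illustration only.

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs_circular(genome: str, min_codons: int = 3):
    """Return (start, length) pairs for ORFs on the forward strand.

    start is the 0-based position of the ATG; length is in nucleotides and
    includes the stop codon. Every ATG is reported, so in-frame start codons
    that share one stop codon appear as separate, overlapping entries.
    """
    n = len(genome)
    doubled = genome.upper() * 2  # lets a frame continue past the origin
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, never further than one full turn of the circle.
        for pos in range(start, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start
                if length // 3 >= min_codons:
                    orfs.append((start, length))
                break
    return orfs


# Two in-frame start codons sharing one stop codon.
print(find_orfs_circular("ATGAAAATGCCCTAAGG"))
# An ORF that wraps around the origin of the circular sequence.
print(find_orfs_circular("CCCTAAGGATGAAA"))
```

Reporting every ATG separately mirrors the point made in the text that several proteins can arise from one reading frame through different in-frame start codons.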
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to be able to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after the infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present and usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to collectively as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the blood stream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface antigen (HBsAg) - There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering the immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears largely oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185 amino acid protein, it is expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus,
this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains 3 distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed, with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV, whether or not the
person is a chronic carrier [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5,6].
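The 30% rule can be checked on a pairwise alignment with a few lines of code. The sketch below is only illustrative: the function names are hypothetical, and it assumes identity is counted over columns where neither sequence has a gap (other definitions of the denominator are also in use and give slightly different values).

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_30_percent_rule(aligned_query: str, aligned_template: str,
                           threshold: float = 30.0) -> bool:
    """True when the template exceeds the identity threshold quoted in the text."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment, not taken from the report.
    print(percent_identity("ACDE-FG", "ACDEYFA"))
```

A template that fails this check is not necessarily useless, but the alignment (and therefore the model) should then be treated with more caution.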
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the orientation of the ligand during sampling.
Direct approaches can be further divided into protein-based, mapping-based, and
ligand-based approaches to reflect the source used to derive the features capturing the
chemical information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current scoring
functions, it was generally found that making optimal use of chemical information
represents an efficient knowledge-based strategy for improving binding affinity
estimations, ligand binding-mode predictions, and virtual screening enrichments obtained
from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Materials and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of annotation
(such as the description of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level of redundancy, and a
high level of integration with other databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4). Primary accession number: Q8IVS8. EC 2.7.1.31.
From human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate =
ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
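To make the idea of an optimal local alignment concrete, the following is a minimal sketch of the Smith-Waterman recurrence. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (all three values are illustrative choices), whereas real searches typically use a substitution matrix and affine gap penalties, and it returns only the best score rather than the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (simplified scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            # Flooring at zero is what makes the alignment local.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    # Toy sequences, not taken from the report.
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The raw score alone does not say whether two sequences are homologous; that judgment relies on the similarity statistics described above.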
4. BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e., if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
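As an illustration of how one of these parameters is derived from sequence alone, the sketch below computes the aliphatic index using the formula AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is hypothetical and the code is a simplified stand-in, not ProtParam's own implementation.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = "".join(sequence.split()).upper()  # ignore white space, as ProtParam does
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    # Coefficients 2.9 and 3.9 weight the relative side-chain volumes
    # of Val and of Ile/Leu against Ala.
    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy peptide, not taken from the report.
    print(aliphatic_index("AVIL"))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains.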
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search.
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the docked structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and
side-chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the
interfaces of these interactions do not have the ability to alter their conformation in order
to improve binding and ease movement. Conformational changes are limited by steric
constraint and thus are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid
ligand and a flexible receptor, or a flexible ligand and a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger than usual ligand.
Normally, when there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets; it will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.,
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. This plot is drawn between the torsion
angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
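The torsion angles plotted in a Ramachandran diagram can be computed from four consecutive backbone atom positions. The following is a minimal sketch, assuming Cartesian coordinates given as (x, y, z) tuples; the function name is hypothetical. For a residue i, φ would be obtained from the atoms C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates, not taken from a real structure.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Collecting the (φ, ψ) pair for every residue of a model and plotting one against the other gives the scatter that validation tools compare with the allowed regions.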
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles, and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
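Gradient optimization can be illustrated on the simplest possible case: two atoms interacting through a Lennard-Jones potential, starting too close together so that the repulsion dominates. The sketch below is a toy steepest-descent loop in reduced units; the potential parameters, step size, and tolerance are assumptions chosen for illustration and do not correspond to any particular force field or minimization program.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative dE/dr; the force on the pair is the negative of this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0: float, step: float = 0.01, tol: float = 1e-10,
                      max_iter: int = 100000) -> float:
    """Steepest descent: move along the negative gradient until the force is small."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


if __name__ == "__main__":
    start = 1.0  # atoms too close: large repulsive force
    end = minimize_distance(start)
    print(lj_energy(start), "->", lj_energy(end), "at r =", end)
```

As in the text, the procedure only slides downhill to the nearest minimum; it cannot cross an energy barrier to reach a different, possibly lower, minimum.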
Results and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (query sequence included)
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27     6.0
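Selecting candidate templates from a hit list like the one above amounts to keeping the hits below an E-value cutoff and ranking them. The sketch below uses the accession codes, scores, and E-values listed above; the E-value of the last two hits is read as 6.0 on the assumption that a decimal point was lost, and the cutoff of 1e-5 is a hypothetical choice for illustration, not a value taken from this work.

```python
# (accession, score, E-value) as listed in the hit table above.
# Assumption: the two weakest hits have E-value 6.0.
HITS = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


if __name__ == "__main__":
    for accession, score, evalue in select_templates(HITS):
        print(accession, score, evalue)
```

With this cutoff only the three capsid structures remain as candidates, while the two hits with E-values near 6 are discarded as likely chance matches.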
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74   14.1%
Arg (R)  33    6.3%
Asn (N)  11    2.1%
Asp (D)  21    4.0%
Cys (C)   5    1.0%
Gln (Q)  32    6.1%
Glu (E)  28    5.4%
Gly (G)  51    9.8%
His (H)  16    3.1%
Ile (I)  15    2.9%
Leu (L)  81   15.5%
Lys (K)  10    1.9%
Met (M)  12    2.3%
Phe (F)  10    1.9%
Pro (P)  28    5.4%
Ser (S)  27    5.2%
Thr (T)  22    4.2%
Trp (W)   4    0.8%
Tyr (Y)   5    1.0%
Val (V)  38    7.3%
Pyl (O)   0    0.0%
Sec (U)   0    0.0%
    (B)   0    0.0%
    (Z)   0    0.0%
    (X)   0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M^-1 cm^-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
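The two extinction coefficients above can be reproduced from the residue counts and molecular weight reported for this protein (5 Tyr, 4 Trp, 5 Cys, 55252.6). The sketch below assumes the commonly used molar extinction values at 280 nm of 1490 for Tyr, 5500 for Trp, and 125 for each cystine (a pair of Cys residues); the function names are hypothetical.

```python
E_TYR = 1490      # M^-1 cm^-1 per tyrosine at 280 nm
E_TRP = 5500      # M^-1 cm^-1 per tryptophan at 280 nm
E_CYSTINE = 125   # M^-1 cm^-1 per cystine (disulfide-bonded Cys pair)


def extinction_coefficient(n_tyr: int, n_trp: int, n_cys: int,
                           all_cys_as_cystine: bool) -> int:
    """Molar extinction coefficient at 280 nm from residue counts."""
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return n_tyr * E_TYR + n_trp * E_TRP + n_cystine * E_CYSTINE


def absorbance_0_1_percent(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    mw = 55252.6
    for paired in (True, False):
        ext = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5,
                                     all_cys_as_cystine=paired)
        print(ext, round(absorbance_0_1_percent(ext, mw), 3))
```

With five cysteines, at most two cystines can form, which accounts for the difference of 250 between the two reported values.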
Secondary structure prediction
SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
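The state percentages in a summary of this kind are simply counts of each one-letter state in the prediction string divided by the sequence length. A minimal sketch follows; the function name is hypothetical, and the example string is a toy input rather than the prediction shown above.

```python
from collections import Counter


def state_composition(prediction: str, states: str = "hetc"):
    """Return {state: (count, percent)} for a per-residue prediction string."""
    pred = prediction.lower()
    if not pred:
        raise ValueError("empty prediction")
    counts = Counter(pred)
    return {s: (counts.get(s, 0), round(100.0 * counts.get(s, 0) / len(pred), 2))
            for s in states}


if __name__ == "__main__":
    # h = alpha helix, e = extended strand, t = beta turn, c = random coil
    print(state_composition("hhhheecctt"))
    # Consistency check on the reported helix fraction: 235 residues out of 523.
    print(round(100.0 * 235 / 523, 2))
```

Applied to the full 523-residue prediction, the same counting would be expected to reproduce the helix, strand, turn, and coil totals listed in the summary.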
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
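The pairwise scores in the table read as percent identities between each pair of aligned sequences. As a rough illustration of how such a score can be computed, the sketch below counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` and the choice of denominator (gap-free columns only) are assumptions made for illustration; the scoring used by the alignment program may differ in detail.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Short aligned fragments of HBVA4 and HBVA5 taken from the alignment in this report.
hbva4_fragment = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5_fragment = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"

# 28 of the 29 gap-free columns match (V versus F is the only difference).
print(f"{percent_identity(hbva4_fragment, hbva5_fragment):.1f}")
```

Because only a 30-column fragment is used here, the printed value is not expected to match the full-length score of 97 reported for this pair.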
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
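The guide tree is a nested, parenthesised tree description in which each sequence name is followed by its branch length. As a small sketch of how the leaf branch lengths could be pulled out for inspection, the code below uses a regular expression rather than a full tree parser. It assumes the tree is written in Newick style with branch lengths read as 0.59519, 0.01176 and so on; the constant `GUIDE_TREE` and the helper `leaf_branch_lengths` are names introduced here for illustration only.

```python
import re

# Guide tree as a single Newick-style string (branch lengths assumed to be 0.xxxxx).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a run of name characters directly followed by ":<number>".
# Internal branch lengths follow a ")" and therefore never match.
LEAF_PATTERN = re.compile(r"([^(),:;\s]+):(\d+(?:\.\d+)?)")


def leaf_branch_lengths(newick: str) -> dict:
    """Map each leaf name to its terminal branch length."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
for name, length in sorted(lengths.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20s} {length:.5f}")
```

Sorting by branch length puts the human glycerate kinase sequence first, which fits its very low pairwise scores against the viral sequences.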
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (PDB).
Choose File - save - project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig. Structure of the template after modeling
Alignment

TARGET    83                                   LV GFGKAVLGMA AAAEELLGQH
2b8nA     4      peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET           hh sss sssss hhh
2b8nA            hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET    105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53     xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET           hhhhhhhhh ssssss sssss hhhhh
2b8nA            hhhhhhhhh ssssss sssss hhhhh

TARGET    155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA     101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET           hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA            hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET    205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA     151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET           h hhhhhh hhh sss hhhhhh sssssss
2b8nA            h hhhhhh hhh sss hhhhhh sssssss

TARGET    255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA     201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET           h hhhhhhhhhh hh hhhh sss sssss
2b8nA            h hhhhhhhhhh hhh hhhh ss

TARGET    305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA     244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET           sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA            sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET    355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA     294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET           hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA            hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET    405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA     344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET           sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA            sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET    455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA     394    ksgallitgp tgtnvndlii gliv-
TARGET           h sss sssss ssss
2b8nA            sss sssss ssss
Model Validation
INTRODUCTION
The Structural Analysis and Verification Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure by plotting φ against ψ, and it shows which combinations of the two angles are sterically possible for a polypeptide. Because the distance between successive alpha carbon atoms is nearly fixed, these two angles per residue largely determine the path of the backbone chain, so residues that fall outside the allowed regions point to local geometry that needs checking.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
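Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral angle from four 3-D coordinates. The function name `dihedral` and the test coordinates are illustrative assumptions, not part of SAVS or PROCHECK.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four 3-D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: the central bond runs along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter_turn, trans)
```

Applying the same function to the backbone atoms of every residue in a model gives the (φ, ψ) pairs that a validation tool then classifies into favoured, allowed and disallowed regions.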
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                         Score    %age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
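The percentages in the PROCHECK summary can be recomputed from the residue counts, which also makes the comparison with the 90% benchmark explicit. The sketch below is a minimal illustration; the counts are read from the summary in this report, and the names `COUNTS`, `region_percentages` and `meets_benchmark` are introduced here for illustration only.

```python
# Residue counts as read from the PROCHECK summary in this report.
COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Benchmark quoted in the PROCHECK note: over 90% in the most favoured regions.
BENCHMARK = 90.0


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_benchmark(percentages, threshold=BENCHMARK):
    """True if the most favoured fraction exceeds the quoted threshold."""
    return percentages["most favoured"] > threshold


percentages = region_percentages(COUNTS)
for region, value in percentages.items():
    print(f"{region:<20s} {value:5.1f}%")
print("meets 90% benchmark:", meets_benchmark(percentages))
```

With these counts the most favoured fraction comes out at about 85.6%, so the check reports that the model falls short of the benchmark.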
Docking result (Hex software)
Fig. Ligand and receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
74
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  ----
Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  ----
   1     1  000000      00      00      00    00      00      00   -1  -100
---------------------------------------------------------------------------
   1     1  000000      00      00      00    00      00      00   -1  -100
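The size of the search reported in the "Total 6D space" line of the log can be checked by hand. The short Python sketch below assumes the bracketed figures group as 27 distance steps x 812 receptor rotations x 1 ligand rotation, and a 64 x 24 x 48 FFT grid; the separators were lost in extraction, so this grouping is an assumption, although it agrees with the 27 R values, the "Receptor 812" and "Ligand 1" increments, and the "Done 21924 3D FFTs" line of the same log.

```python
# Figures read from the Hex log. The grouping of the digits is an assumption,
# because the separators inside the brackets were lost in extraction.
iterate_steps = (27, 812, 1)   # assumed: distance steps, receptor rotations, ligand rotations
fft_grid = (64, 24, 48)        # assumed: dimensions of the 3D FFT grid


def product(values):
    """Multiply all numbers in a sequence."""
    result = 1
    for value in values:
        result *= value
    return result


n_ffts = product(iterate_steps)              # number of 3D FFTs to run
orientations = n_ffts * product(fft_grid)    # total orientations sampled

print(n_ffts)        # 21924
print(orientations)  # 1616412672
```

Both values match the numbers printed by the program, which supports the assumed grouping.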
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be worthwhile
to look more closely at the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To find the unknown structure of the protein present in the
different species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg-binding protein (glycerate kinase)
is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report may be just a stepping stone toward such discoveries. The
present work is a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and more effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and to other infectious diseases, so that
possible therapeutic sites can be identified in them, and
we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M Prescott, John P Harley and Donald A Klein. Microbiology, 6th edition.
McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F V Chisari, C Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C Seeger, W S Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801. [PubMed]
Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J Mol Biol 311: 395-408. [PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res 25: 3389-3402. [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher
P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro - An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom
Abbreviation
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
3) Horizontal transmission - in children, within families, through close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Postnatal - breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV.
Chronic carrier
Fig: Hepatitis B virus in serum
Prevention
1) Active immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure,
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers, or
those with chronic hepatitis B, are not aware of their ongoing infection, although some
have persistent fatigue.
Molecular virology
The genome is circular, 3.2 kb in size and partially double-stranded. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to its 5' end. The other strand, the plus strand, is variable in length but shorter than unit
length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. These proteins are produced
through the use of multiple in-frame start codons. The HBV genome also contains
elements that regulate transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig: Hepatitis B virus genome
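To illustrate how multiple in-frame start codons let one reading frame yield several proteins, the following minimal Python sketch lists the ATG codons that lie in the frame of an ORF. The function name and the toy sequence are invented for illustration and are not taken from the HBV genome; each in-frame start would give a protein that shares its C-terminal end with the longer forms.

```python
def in_frame_starts(orf, start_codon="ATG"):
    """Return 0-based nucleotide offsets of start codons lying in the ORF's own frame."""
    orf = orf.upper()
    # Step through the sequence codon by codon, so out-of-frame matches are skipped.
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] == start_codon]


# Toy ORF (hypothetical): codons are ATG GCC ATG AAA TGC ATG TAA.
# An out-of-frame "ATG" also occurs at offset 11 and is correctly ignored.
toy_orf = "ATGGCCATGAAATGCATGTAA"
print(in_frame_starts(toy_orf))  # [0, 6, 15]
```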
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome, and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al., 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, and usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to collectively as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the blood stream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome.
Hepatitis B Surface antigen (HBsAg) - There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering the immune
response. Regardless of the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed, with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements
in the quality of homology models [5,6].
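The 30% rule can be made concrete with a short calculation. The Python sketch below computes percent identity from a pairwise alignment given as two gapped strings of equal length. The function name and the toy alignment are hypothetical, and the choice of denominator (aligned, gap-free columns only) is an assumption: several definitions of sequence identity are in use, and they give different values for the same alignment.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contain a residue.
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != gap and t != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


# Toy alignment (invented): 9 gap-free columns, 7 of them identical.
identity = percent_identity("MKT-AYIAKQR", "MKSLAYI-RQR")
print(round(identity, 1))   # 77.8
print(identity > 30.0)      # True: above the 30% threshold discussed in the text
```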
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2 Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J Lipman and William R Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA and
protein:translated DNA (with frameshifts) searches, and for ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
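The Smith-Waterman algorithm mentioned above can be sketched in a few lines of Python. This is a minimal illustration and not the SSEARCH implementation: it returns only the best local alignment score, and the scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for readability, whereas real searches normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (Smith-Waterman, linear gaps)."""
    cols = len(b) + 1
    previous = [0] * cols          # row i-1 of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * cols       # row i; column 0 stays 0
        for j in range(1, cols):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # 8: the shared ACGT segment
print(smith_waterman_score("AAA", "TTT"))            # 0: nothing in common
```

Only two rows of the matrix are kept because the score alone is needed; recovering the alignment itself would require the full matrix and a traceback step.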
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
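Two of the listed parameters follow directly from amino acid composition and can be reproduced by hand. The Python sketch below uses the published formulas as we understand them: aliphatic index = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent, and molar extinction coefficient at 280 nm = 1490 n(Tyr) + 5500 n(Trp) + 125 n(cystine). The function names and test sequences are hypothetical, and the treatment of cysteines (every pair assumed to form a cystine, or none) is a simplifying assumption.

```python
def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq, cystines_formed=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from Tyr, Trp and cystine counts."""
    seq = seq.upper()
    # Assumption: either all Cys residues pair up into cystines, or none do.
    n_cystine = seq.count("C") // 2 if cystines_formed else 0
    return 1490 * seq.count("Y") + 5500 * seq.count("W") + 125 * n_cystine


print(aliphatic_index("AVIL"))                                 # 292.5
print(extinction_coefficient("WYCC"))                          # 7115
print(extinction_coefficient("WYCC", cystines_formed=False))   # 6990
```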
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by Email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
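The accuracy figures quoted above are per-residue, three-state scores (often called Q3). The following Python sketch shows how such a score is computed, assuming that the predicted and observed structures are given as equal-length strings over the states H (helix), E (sheet) and C (coil). The function name and the example strings are invented for illustration and are not SOPMA output.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("no residues to compare")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


# Invented example: one of ten residues is predicted in the wrong state.
print(q3_accuracy("HHHHEEEECC", "HHHCEEEECC"))  # 90.0
```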
PROTOCOL FOLLOWED
Identified the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Ran a BLAST similarity search and retrieved the PDB ID of the template structure: 2B8N
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
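The sequence retrieval step of the protocol yields a FASTA file, which is simple to read programmatically. The Python sketch below is a minimal parser using only the standard language features; the function name, the record identifier and the sequence in the example are placeholders, not the actual Swiss-Prot entry used in this work.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record identifier to sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""   # identifier = first word of the header
            chunks = []
        elif header is not None:
            chunks.append(line)                   # sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Placeholder record: identifier and residues are invented for illustration.
example = """>example_protein hypothetical description
MKTAYIAKQR
QISFVKSHFS
"""
sequences = parse_fasta(example)
print(sequences["example_protein"])       # MKTAYIAKQRQISFVKSHFS
print(len(sequences["example_protein"]))  # 20
```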
Homology modeling
In protein structure prediction homology modeling also known as comparative
modeling is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target) Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence The sequence alignment and template
structure are then used to produce a structural model of the target Because protein
structures are more conserved than protein sequences detectable levels of sequence
similarity usually imply significant structural similarity
The quality of the homology model depends on the quality of the sequence alignment
and of the template structure. The approach can be complicated by alignment gaps
(commonly called indels), which indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side-chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction prediction; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
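The Cα agreement figures quoted in this section are root-mean-square deviations (RMSD). As a minimal sketch, assuming the two structures have already been superimposed and their Cα atoms matched one-to-one, the RMSD could be computed as follows. The function name and the coordinates are hypothetical and serve only as an illustration.

```python
import math

def ca_rmsd(model, reference):
    """Root-mean-square deviation between matched C-alpha coordinates (in Angstrom).

    Assumes both lists hold (x, y, z) tuples in the same residue order and
    that the structures have already been superimposed.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))

# Hypothetical coordinates, for illustration only: every atom is shifted by 2 A in z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
print(round(ca_rmsd(model, reference), 2))
```

In practice the superposition step matters as much as the formula; this sketch deliberately leaves it out.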
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and
side-chain dihedral angles are transferred from the templates to the target; a number of
spatial restraints on its structure are thus obtained. Third, the 3D model is obtained by
satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
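The E-value screening described above can be sketched in a few lines. This is only an illustration: the hit identifiers are made up, and the cutoff of 1e-5 is an assumed value, not a threshold stated in this report.

```python
def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below max_evalue, best (lowest E-value) first.

    `hits` is a list of (identifier, e_value) pairs; the default cutoff is
    an assumption chosen for illustration.
    """
    kept = [hit for hit in hits if hit[1] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[1])

# Hypothetical search hits: two strong candidates and one chance match.
hits = [("tmplA", 6e-54), ("tmplB", 6.0), ("tmplC", 2e-51)]
for name, evalue in select_templates(hits):
    print(name, evalue)
```

A hit that fails the cutoff is simply dropped here; as the text notes, such a sequence is better sent to a fold-recognition server than modelled on a poor template.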
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
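Two of the criteria above, sequence identity and coverage, can be read directly off a pairwise alignment. The sketch below assumes one common convention (identity counted over aligned residue pairs, coverage as the fraction of query residues paired with a template residue); other tools define these quantities slightly differently, and the toy alignment is invented.

```python
def identity_and_coverage(query_aln, template_aln):
    """Return (percent identity over aligned pairs, percent query coverage).

    Both arguments are rows of one pairwise alignment, with '-' marking gaps.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned rows must have equal length")
    pairs = [(q, t) for q, t in zip(query_aln, template_aln)
             if q != "-" and t != "-"]
    query_len = sum(1 for q in query_aln if q != "-")
    if not pairs or query_len == 0:
        return 0.0, 0.0
    identical = sum(1 for q, t in pairs if q.upper() == t.upper())
    return 100.0 * identical / len(pairs), 100.0 * len(pairs) / query_len

# Toy alignment, for illustration only.
identity, coverage = identity_and_coverage("ACDE-FG", "ACDQHF-")
print(round(identity, 1), round(coverage, 1))
```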
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the
interfaces of these interactions do not have the ability to alter their conformation in
order to improve binding and ease movement. Conformational changes are limited by steric
constraints, and the interfaces are therefore said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described as a
lock-and-key mechanism. There is both high specificity and induced fit within these
interfaces, with specificity increasing with rigidity. A protein receptor-ligand pair can
have either a rigid ligand and a flexible receptor or a flexible ligand and a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the
interface area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger than usual ligand.
Normally, when there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid, though;
there is intrinsic movement, which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell or on its surface to which a substance (such as a hormone or a drug) selectively
binds, causing a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds
through the interaction of many weak noncovalent bonds formed with the binding site of a
protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed
amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The table below does not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
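To make the preference table easier to use, the sketch below stores it as a dictionary and ranks the residues. The values are those listed above, read on the assumption that each carries a decimal point after the first digit (for example 0.360 for His); the function name is hypothetical.

```python
# Preference values as listed in the table, assuming a decimal point after
# the leading digit (e.g. "0360" read as 0.360).
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}

def rank_by_preference(table):
    """Return residue names sorted from most to least preferred."""
    return sorted(table, key=table.get, reverse=True)

ranked = rank_by_preference(PREFERENCE)
print("most preferred:", ranked[:3])
print("least preferred:", ranked[-3:])
```

Such a ranking is only a rough prior; as the text stresses, the residues that matter depend strongly on the type of function.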
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle φ against ψ for the amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the
corresponding torsion angles φ and ψ. Ramachandran used computer models of small
polypeptides to systematically vary φ and ψ with the objective of finding stable
conformations. For each conformation, the structure was examined for close contacts
between atoms. Atoms were treated as hard spheres with dimensions corresponding to their
van der Waals radii, and the angles that cause spheres to collide correspond to sterically
disallowed conformations of the polypeptide backbone.
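Each point on a Ramachandran plot is a pair of torsion angles, and each torsion angle is defined by four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for φ and N(i), Cα(i), C(i), N(i+1) for ψ. A minimal sketch of the torsion calculation, using the standard atan2 formulation and made-up coordinates, is given below; the helper names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up coordinates: the first and last atoms eclipse each other (cis),
# so the torsion angle is 0 degrees.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)), 1))
```

Applying this function to every residue of a coordinate file yields the (φ, ψ) pairs that the plot displays.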
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run; the run time also depends on how many residues there are in the
submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file, which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as
the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and so stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
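The gradient optimization described above can be illustrated on the simplest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential. This is a sketch under stated assumptions, not the procedure used in this project; the reduced units (ε = σ = 1), the step size and the starting distance are all illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, step=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent: move against the gradient until the force is small."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

# Start with the atoms slightly too close together (strong repulsion).
r_min = minimize(1.0)
print(round(r_min, 4), round(lj_energy(r_min), 4))
```

The search settles at the nearest minimum (r = 2^(1/6) σ for this potential). On a real molecule with many minima, the same downhill procedure cannot cross energy barriers, which is the limitation noted in the text.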
Results and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)
BLAST result
List of potentially matching sequences
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5            206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina            197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                   192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp 27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I            27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition (residue, count, percentage):
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
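The two extinction coefficients can be reproduced from the amino acid composition. The sketch below uses the molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) together with the residue counts reported above; the molecular weight is read as 55252.6 on the assumption that the printed figure lost its decimal point, and the function names are hypothetical.

```python
# Molar extinction values at 280 nm (M^-1 cm^-1) as documented for ProtParam.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125

def extinction_coefficient(n_trp, n_tyr, n_cys, all_cystines=True):
    """Estimated extinction coefficient at 280 nm from residue counts.

    With all_cystines=True every pair of Cys residues is counted as one cystine.
    """
    n_cystine = n_cys // 2 if all_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm cell."""
    return ext_coefficient / molecular_weight

# Counts from the composition above: Trp 4, Tyr 5, Cys 5.
MW = 55252.6  # assumed reading of the reported molecular weight
for paired in (True, False):
    ext = extinction_coefficient(4, 5, 5, all_cystines=paired)
    print(ext, round(absorbance_1_g_per_l(ext, MW), 3))
```

Both pairs of values agree with the ProtParam figures listed above, which is a useful consistency check on the reported composition and molecular weight.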
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3_10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
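The state percentages in the SOPMA summary are simple counts over the one-letter prediction string. As a hedged illustration, the sketch below tallies the states of a stand-in string that has the same composition as the reported result (235 h, 80 e, 36 t, 172 c); it is not the actual prediction, and the function name is hypothetical.

```python
from collections import Counter

def state_percentages(prediction):
    """Percentage of each one-letter state in a secondary-structure string."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}

# Stand-in string with the reported composition (order is not meaningful here).
prediction = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
print(len(prediction), state_percentages(prediction))
```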
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
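The scores table lists one row for every unordered pair of input sequences, so ten sequences give 10 x 9 / 2 = 45 rows. The short sketch below enumerates those pairs and also looks at the scores in which sequence 1 (the human protein) takes part, copied from the table; the variable names are hypothetical.

```python
from itertools import combinations

n_sequences = 10
pairs = list(combinations(range(1, n_sequences + 1), 2))
print(len(pairs))  # number of pairwise comparisons in the table

# Scores of sequence 1 (Q8IVS8|GLCTK_HUMAN) against sequences 2..10,
# copied from the table above.
human_scores = [3, 3, 3, 3, 3, 3, 3, 3, 2]
print(max(human_scores))
```

The highest score involving the human sequence is far below the scores among the HBeAg sequences themselves, which is worth bearing in mind when reading the alignment that follows.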
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
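The guide tree is written in Newick format, where each leaf appears as name:branch_length and parentheses group the clades. The sketch below pulls the leaves out with a regular expression and checks that the parentheses balance. The tree string is the one above, on the assumption that the separators lost in extraction were the usual Newick colons, commas and decimal points; a full Newick parser would be needed for anything beyond this quick inspection.

```python
import re

# Guide tree from the ClustalW2 output, with Newick punctuation assumed.
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def leaf_branch_lengths(newick):
    """Map each leaf name to the length of the branch leading to it."""
    # A leaf is a run of characters that are not Newick punctuation,
    # followed by ':' and a number; internal branches have no name.
    return {name: float(length)
            for name, length in re.findall(r"([^(),:;]+):([0-9.]+)", newick)}

leaves = leaf_branch_lengths(NEWICK)
print(len(leaves), NEWICK.count("(") == NEWICK.count(")"))
print(max(leaves, key=leaves.get))  # leaf on the longest terminal branch
```

The human sequence sits on by far the longest terminal branch, consistent with its low pairwise scores.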
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the
target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from File (pdb file).
2) Choose Swiss model - load the raw target sequence.
3) Choose Fit - fit raw sequence, then magic fit, then iterative fit.
4) Choose File - save - layer (pdb).
5) Choose File - save - project (pdb).
6) Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure.)
7) Open the new structure (received by e-mail) and remove the template by selecting the target.
8) Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET   83                             LV GFGKAVLGMA AAAEELLGQH
2b8nA     4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET         hh sss sssss hhh
2b8nA          hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET         hhhhhhhhh ssssss sssss hhhhh
2b8nA          hhhhhhhhh ssssss sssss hhhhh

TARGET  155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET         h hhhhhh hhh sss hhhhhh sssssss
2b8nA          h hhhhhh hhh sss hhhhhh sssssss

TARGET  255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET         h hhhhhhhhhh hh hhhh sss sssss
2b8nA          h hhhhhhhhhh hhh hhhh ss

TARGET  305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET         hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA          hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET         sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA          sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394    ksgallitgp tgtnvndlii gliv-
TARGET         h sss sssss ssss
2b8nA          sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angle φ against ψ for the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. Because these two angles fix the relative orientation of successive residues along the backbone chain, the plot can be used to judge whether the backbone conformation of the protein of interest is sterically reasonable.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjanpdb
Procheck summary
RAMCHANDRAN POLT
Result
Plot statistics                                          Score    %-age
------------------------------------------------------------------------
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
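The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. The sketch below recomputes them from the four counts reported by PROCHECK; the variable and function names are our own, and the 90% threshold is the guideline quoted above.

```python
# Residue counts per Ramachandran region, as reported in the plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Express each region count as a percentage of all non-Gly, non-Pro residues."""
    total = sum(region_counts.values())
    return {region: 100.0 * n / total for region, n in region_counts.items()}


percentages = region_percentages(counts)
for region, pct in percentages.items():
    print(f"{region:20s} {pct:5.1f}%")

GUIDELINE = 90.0  # expected share in the most favoured regions for a good model
meets_guideline = percentages["most favoured"] >= GUIDELINE
print("meets the 90% guideline:", meets_guideline)
```

With these counts the most favoured regions hold about 85.6% of the residues, so the model falls short of the 90% guideline.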
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds [71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds [71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair       RMS    Bumps
----  --------  --------  --------  --------  --------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
------------------------------------------------------------------------------
Clst  Soln  Models   Etotal  Eshape  Eforce    Eair  Vshape  Vclash  Bmp    RMS
------------------------------------------------------------------------------
   1     1  000/000     0.0     0.0     0.0     0.0     0.0     0.0   -1  -1.00
------------------------------------------------------------------------------
   1     1  000/000     0.0     0.0     0.0     0.0     0.0     0.0   -1  -1.00
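The size of the search reported in the Hex log can be cross-checked with simple arithmetic. The sketch below assumes that the log line for the 6D space reads Iterate[27,812,1] x FFT[64,24,48] and that the distance range runs from 0.00 to 19.50 in steps of 0.75; the separators and decimal points are our reading of the log, and the variable names are our own.

```python
from math import prod

# Values read from the Hex log (separators and decimal points restored by assumption).
r_start, r_stop, r_step = 0.00, 19.50, 0.75
iterate = (27, 812, 1)   # distance steps, receptor rotations, ligand rotations
fft = (64, 24, 48)       # grid sizes of each 3D FFT

# Number of intermolecular distances sampled, both ends included.
n_distance_steps = round((r_stop - r_start) / r_step) + 1

# One 3D FFT is run per combination in `iterate`, each covering prod(fft) orientations.
n_ffts = prod(iterate)
n_orientations = n_ffts * prod(fft)

print("distance steps:", n_distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
```

Under these assumptions the numbers agree with the log: 27 distance steps, 21924 3D FFTs and 1616412672 orientations.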
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It is worth taking a
closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved
together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will become possible
in the near future to draw firm conclusions about the true picture of its evolution, and to
identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.
To find the unknown structure of a protein present in the different species, we carried out
homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone toward such discoveries. The
present work is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. It includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
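As a small illustration of the data analysis step, many distance-based phylogenetic methods start from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the p-distance (the proportion of differing sites). It is not the method used in this report; the function names and the three toy sequences are invented for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: dict) -> dict:
    """Symmetric dictionary of p-distances keyed by (name, name)."""
    dist = {(name, name): 0.0 for name in seqs}
    for a, b in combinations(seqs, 2):
        dist[(a, b)] = dist[(b, a)] = p_distance(seqs[a], seqs[b])
    return dist


# Toy aligned sequences, invented purely for illustration (not HBV data).
toy = {
    "strain_A": "ATGGACATC",
    "strain_B": "ATGGATATC",
    "strain_C": "ATGCATATT",
}
matrix = distance_matrix(toy)
for (a, b), d in sorted(matrix.items()):
    print(f"{a} {b} {d:.3f}")
```

A tree-building method such as neighbour joining would take a matrix like this as its input.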
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking will be useful in finding a cure for the infectious disease bird flu, and
will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery
process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work still needs to be done
to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that
these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B, Sheinerman FB, and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801.
Aloy P, Querol E, Aviles FX, and Sternberg MJ. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J Mol Biol 311: 395-408.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res 25: 3389-3402.
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher
P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut
Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la
Barceloneta 37-49 08003 Barcelona (Catalonia) Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
well as those who clear the infection Its presence indicates exposure to HBVof the
chronic carrier
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers.
Recombinant HBsAg - made by genetic engineering in yeasts.
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses,
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following a single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis such as
nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure, may not be aware of the
infection. These individuals may also overcome the infection completely and develop immunity,
but they frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status of the person's
immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B,
are not aware of their ongoing infection, although some have persistent fatigue.
Molecular virology
Fig. Hepatitis B virus genome
The genome is circular, about 3.2 kb in size and partially double stranded. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to its 5' end. The other strand, the plus strand, is variable in length but shorter than unit
length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. These proteins are produced through the use of
multiple in-frame start codons. The HBV genome also contains elements that regulate
transcription, determine the site of polyadenylation, and mark a specific transcript for
encapsidation into the nucleocapsid.
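The idea of several proteins arising from multiple in-frame start codons can be made concrete with a short sketch. Within one reading frame, every ATG that precedes the same stop codon gives a product of different length with a shared C-terminal end. The code below lists those start and stop positions; the function name and the toy sequence are invented for illustration and are not taken from the HBV genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_products(dna: str, frame: int = 0):
    """Return (start, end) index pairs for every in-frame ATG that shares a stop codon.

    `end` is the index just past the stop codon, so dna[start:end] is the coding region.
    """
    dna = dna.upper()
    products = []
    pending_starts = []  # ATG positions seen since the last stop codon
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon == "ATG":
            pending_starts.append(i)
        elif codon in STOP_CODONS:
            products.extend((start, i + 3) for start in pending_starts)
            pending_starts = []
    return products


# Toy sequence (not HBV): three in-frame ATG codons followed by one stop codon.
toy_orf = "ATGAAAATGCCCATGGGGTAA"
for start, end in in_frame_products(toy_orf):
    print(start, end, toy_orf[start:end])
```

In this toy example one reading frame yields three nested coding regions of different lengths that all end at the same stop codon.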
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to support replication
to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known that the
virion initially attaches to a susceptible hepatocyte through recognition of a cell surface
receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it
forms a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a
longer-than-genome-length RNA called the pregenome, and of shorter subgenomic transcripts, all of
which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's
endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a
specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs.
At the same time as capsid formation, the RNA-P protein complex is packaged and reverse
transcription begins.
Early after infection, the DNA is recirculated to the nucleus, where the process is repeated,
resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA
concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually make up a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, and usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to as a group named surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the blood stream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface Antigen (HBsAg)- There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering an immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185-amino-acid protein expressed in the cytoplasm of infected cells, it
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles, or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing improvements
in the quality of homology models [5, 6].
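The 30% rule above can be checked on any pairwise alignment. The following is a minimal sketch, assuming identity is counted over aligned (non-gap) columns only; other definitions divide by the alignment length or by the shorter sequence. The function names and the example strings are illustrative and not taken from any tool used in this report.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the non-gap columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are not counted (assumed convention)
        aligned += 1
        if q == t:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def passes_30_percent_rule(aligned_query: str, aligned_template: str) -> bool:
    """True when the identity is greater than the 30% rule of thumb."""
    return percent_identity(aligned_query, aligned_template) > 30.0


# Hypothetical aligned fragments: 5 aligned columns, 4 of them identical.
print(percent_identity("MKV-LA", "MRVQLA"))
```

A template that fails this check is not necessarily unusable, but its alignment deserves closer inspection before model building.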
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot-
A curated protein sequence database that strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy, and a high level of
integration with other databases.
2 Protein sequence-
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
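To make the Smith-Waterman idea concrete, the sketch below computes only the optimal local alignment score by dynamic programming. It is not the SSEARCH implementation: the flat match/mismatch scores and the linear gap penalty are simplifying assumptions chosen for illustration, whereas real searches typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap
            left = h[i][j - 1] + gap
            # The floor at zero is what makes the alignment local.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


# Illustrative sequences only: the shared "ACG" core scores 3 * 2 = 6.
print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The score alone does not say whether a match is significant; that is the role of the similarity statistics mentioned above.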
4 BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e., if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
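As an illustration of how one of these parameters is derived from the sequence alone, here is a minimal sketch of the aliphatic index. It assumes the standard formula X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is illustrative, and the short test peptide is hypothetical rather than taken from this study.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    # Mirror the input handling described above: ignore spaces and digits.
    seq = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not seq:
        raise ValueError("sequence contains no residues")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical four-residue peptide, one each of Ala, Val, Ile, Leu.
print(aliphatic_index("AVIL"))
```

Higher values indicate a larger relative volume occupied by aliphatic side chains, which is usually read as a sign of greater thermostability.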
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
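The accuracy figures quoted above are per-residue three-state accuracies. A minimal sketch of that measure follows, assuming state strings that use the letters h, e, t and c, with turns (t) folded into coil for the three-state comparison. That folding, the function name, and the example strings are assumptions made here for illustration.

```python
# Assumed reduction of four output states to helix / sheet / coil.
THREE_STATE = {"h": "h", "e": "e", "t": "c", "c": "c"}


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state class is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    pred3 = [THREE_STATE[s] for s in predicted.lower()]
    obs3 = [THREE_STATE[s] for s in observed.lower()]
    correct = sum(p == o for p, o in zip(pred3, obs3))
    return 100.0 * correct / len(pred3)


# Hypothetical ten-residue example: nine of ten states agree.
print(q3_accuracy("hhhheeeecc", "hhhhceeect"))
```

The observed states would normally come from an experimentally solved structure; none is available for the target studied here, so this measure applies to benchmarking rather than to the prediction reported below.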
PROTOCOL FOLLOWED
Obtained the receptor (target protein) from literature references, journals available online and PubMed literature for the HBV strain
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence in pdb format into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters available in SAVS, such as the Ramachandran plot
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
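The agreement between matched Cα atoms quoted above is commonly expressed as a root-mean-square deviation (RMSD). The sketch below assumes that reading, and also assumes that the two coordinate sets are already superimposed and listed in matching order; it performs no structural fitting. The function name and the coordinates are made up for illustration.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in angstroms between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 2 angstroms.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
print(ca_rmsd(model, reference))
```

In practice the model would first be optimally superimposed on the reference structure, since RMSD computed on unaligned coordinates mostly reflects their arbitrary placement in space.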
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and
side-chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important is the coverage of the aligned regions: the fraction of the query
sequence structure that can be predicted from the template, and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid
ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a
larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking study will be useful in finding a cure for the
infectious disease hepatitis B; they will also open new avenues for finding other possible drug
targets in HBV. The docking results can be used to design new lead
compounds and hence can aid in the new drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds,
causing a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g., a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.,
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. This plot is drawn between the torsion
angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
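Each point on the plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The sketch below computes one such torsion angle from four points using the usual atan2 formulation; the helper names and the test coordinates are illustrative and not taken from any validation server.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the fourth point is rotated a quarter turn
# about the central bond relative to the first point.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this to every residue of a model and plotting ψ against φ yields the Ramachandran plot that the validation step inspects.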
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good for releasing local constraints on a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
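The idea of moving atoms down the energy gradient can be shown on the smallest possible case: two atoms placed too close together. The sketch below is a toy illustration, not the procedure used by any program in this study. It assumes a Lennard-Jones 12-6 potential in reduced units (epsilon = sigma = 1) and plain fixed-step steepest descent; the step size and tolerance are arbitrary choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the pair energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r_start, step=0.001, tolerance=1e-8, max_steps=10000):
    """Steepest descent on the separation until the force is nearly zero."""
    r = r_start
    for _ in range(max_steps):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        r -= step * gradient  # move against the gradient (along the force)
    return r


start = 0.95  # overlapping atoms: strongly repulsive, positive energy
relaxed = minimize_distance(start)
print(lj_energy(start), lj_energy(relaxed), relaxed)
```

The search stops at the nearest minimum (here a separation of 2^(1/6) sigma), which mirrors the limitation noted above: gradient methods relieve local strain but do not cross energy barriers.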
Result and discussion
1 Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)
BLAST result-
List of potentially matching sequences (including the query sequence)
Db   AC      Description                                                          Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5...               206   6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina...               197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad...                      192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp...     27   6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I...                27   6.0
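The hit list can be screened with an E-value cutoff before choosing a template. The sketch below is only an illustration of that filtering step: the 1e-5 cutoff is an assumed, commonly used rule of thumb and not a value stated in this report, and the last two E-values are read as 6.0 on the assumption that a decimal point was lost in the listing.

```python
# (accession, score, E-value) transcribed from the hit list above.
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),  # E-value assumed to be 6.0
    ("1AW9-A", 27, 6.0),  # E-value assumed to be 6.0
]


def candidate_templates(hit_list, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, most significant first."""
    kept = [hit for hit in hit_list if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


for accession, score, evalue in candidate_templates(hits):
    print(accession, score, evalue)
```

With this cutoff only the three capsid-protein hits remain; the two low-scoring hits would be discarded as likely chance matches.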
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue   Count   Percent
Ala (A)   74      14.1%
Arg (R)   33       6.3%
Asn (N)   11       2.1%
Asp (D)   21       4.0%
Cys (C)    5       1.0%
Gln (Q)   32       6.1%
Glu (E)   28       5.4%
Gly (G)   51       9.8%
His (H)   16       3.1%
Ile (I)   15       2.9%
Leu (L)   81      15.5%
Lys (K)   10       1.9%
Met (M)   12       2.3%
Phe (F)   10       1.9%
Pro (P)   28       5.4%
Ser (S)   27       5.2%
Thr (T)   22       4.2%
Trp (W)    4       0.8%
Tyr (Y)    5       1.0%
Val (V)   38       7.3%
Pyl (O)    0       0.0%
Sec (U)    0       0.0%
(B)        0       0.0%
(Z)        0       0.0%
(X)        0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711 (41)
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
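The composition and extinction figures reported by ProtParam can be cross-checked from the residue counts alone. The sketch below is an illustration and not ProtParam's own code: it takes the counts and molecular weight from the output above and assumes the molar extinction values ProtParam documents for 280 nm (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1). The function and variable names are hypothetical.

```python
# Residue counts and molecular weight taken from the ProtParam output above.
COUNTS = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}
MOLECULAR_WEIGHT = 55252.6


def percent_composition(counts):
    """Percentage of each residue type, rounded to one decimal."""
    total = sum(counts.values())
    return {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}


def extinction_coefficient(counts, cystines_formed):
    """Molar extinction coefficient at 280 nm (assumed per-residue values)."""
    n_cystine = counts["Cys"] // 2 if cystines_formed else 0
    return counts["Trp"] * 5500 + counts["Tyr"] * 1490 + n_cystine * 125


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return round(ext_coefficient / molecular_weight, 3)


if __name__ == "__main__":
    print(sum(COUNTS.values()), "residues")
    print("negative:", COUNTS["Asp"] + COUNTS["Glu"])
    print("positive:", COUNTS["Arg"] + COUNTS["Lys"])
    for formed in (True, False):
        ext = extinction_coefficient(COUNTS, formed)
        print(formed, ext, absorbance_1_g_per_l(ext, MOLECULAR_WEIGHT))
```

The counts sum to the stated sequence length, and both extinction coefficients and absorbance values follow from the four Trp, five Tyr and five Cys residues.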
Secondary structure prediction
By SOPMA: result for UNK_158250 (42)
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00% (43)
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
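A SOPMA prediction is a string with one state letter per residue (h = alpha helix, e = extended strand, t = beta turn, c = random coil). A small sketch of how such a string can be collapsed into residue ranges is given below. The helper name and the short example string are hypothetical and are not taken from the SOPMA server.

```python
from itertools import groupby

# State letters as used in the four-state SOPMA output above.
STATE_NAMES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def segments(prediction):
    """Collapse a per-residue state string into (state, start, end) runs.

    Positions are 1-based and inclusive, matching residue numbering.
    """
    runs = []
    position = 1
    for state, group in groupby(prediction):
        length = len(list(group))
        runs.append((state, position, position + length - 1))
        position += length
    return runs


if __name__ == "__main__":
    example = "hhhhcceeet"  # hypothetical 10-residue prediction
    for state, start, end in segments(example):
        print(f"{start:>3}-{end:<3} {STATE_NAMES[state]}")
```

Applied to the full prediction, this gives the boundaries of each predicted helix and strand, which is easier to compare with the template than the raw string.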
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
(45)
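The scores in the table are percent identities between pairs of sequences. As a rough illustration of how such a score is obtained, the sketch below counts identical residues over the aligned, non-gap positions of two rows. This is an assumption about the general definition and not a reproduction of ClustalW2, which scores its own pairwise alignments rather than rows of the final multiple alignment. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over positions where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gap positions are excluded from the comparison
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # One 32-residue stretch of two HBeAg rows from the alignment in this report.
    hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
    hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
    print(percent_identity(hbva4, hbva6))
```

On this short stretch the two HBV sequences differ at two positions, in line with the very high scores between the HBV rows and the very low scores between any of them and GLCTK_HUMAN.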
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
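The guide tree is written in Newick notation, where each leaf appears as name:branch_length. The sketch below pulls the leaf branch lengths out of such a string with a regular expression. It assumes the punctuation of the tree is as reconstructed here (colons, commas and five-decimal branch lengths, which were lost in the extracted text), and the function name is hypothetical.

```python
import re

# Guide tree with punctuation reconstructed (assumption: five-decimal lengths).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a run of characters that are not Newick punctuation, then ':length'.
LEAF_PATTERN = re.compile(r"([^(),:;\s]+):(\d+\.\d+)")


def leaf_branch_lengths(newick):
    """Return {leaf name: terminal branch length}; internal branches are skipped."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


if __name__ == "__main__":
    lengths = leaf_branch_lengths(GUIDE_TREE)
    for name, length in sorted(lengths.items(), key=lambda item: -item[1]):
        print(f"{name:<20} {length:.5f}")
```

Sorting by branch length puts GLCTK_HUMAN first by a wide margin, which reflects how distant the human protein is from every HBeAg sequence in the set.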
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - save - layer (PDB)
Choose File - save - project (PDB)
Choose Swiss model - submit modeling request (a new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose File - save - layer (PDB)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible combinations of φ and ψ angles for a polypeptide and displays the φ and ψ backbone conformational angles of each residue in the protein. The angles of rotation about the two backbone bonds of each alpha carbon atom (N-Cα and Cα-C) in the protein of interest can therefore be read from this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
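Each point on a Ramachandran plot is a pair of dihedral angles, and each dihedral is defined by four consecutive backbone atoms (C-N-Cα-C for φ, N-Cα-C-N for ψ). A minimal sketch of the dihedral calculation from four coordinates follows. It uses the standard atan2 formulation with plain tuples; the function names and the test coordinates are illustrative assumptions and are not taken from PROCHECK or SAVS.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates: a cis, a perpendicular and a trans arrangement.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

For φ the four points would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1), read from the coordinate file of the model.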
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                         Residues   Percentage
Residues in most favoured regions [A,B,L]                    990        85.6%
Residues in additional allowed regions [a,b,l,p]             104         9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11         1.0%
Residues in disallowed regions                                51         4.4%
                                                            ----       ------
Number of non-glycine and non-proline residues              1156       100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
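The PROCHECK guideline can be applied directly to the plot statistics. The short sketch below recomputes the region percentages from the residue counts reported above and checks them against the 90% guideline. The dictionary keys and function names are hypothetical, and the check is only an illustration of the guideline, not part of PROCHECK.

```python
# Residue counts per Ramachandran region, from the plot statistics above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each region (one decimal)."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_guideline(counts, threshold=90.0):
    """True if more than `threshold` percent of residues are most favoured."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


if __name__ == "__main__":
    for region, pct in region_percentages(REGION_COUNTS).items():
        print(f"{region:<20} {pct:5.1f}%")
    print("over 90% most favoured:", meets_guideline(REGION_COUNTS))
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the guideline, so the residues in disallowed regions deserve inspection before the structure is used further.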
Docking result ---- by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It is worth taking a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will soon be possible to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of the protein present in the different species, we performed homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries; it is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases to identify possible therapeutic sites, so that we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example a needlestick injury.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.
Molecular virology
The genome is circular, 3.2 kb in size and double stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to the 5′ end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are transcribed and translated through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.
Fig: Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3′ end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to collectively as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel undetected by antibodies through the bloodstream (Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface Antigen (HBsAg)- There are three different types of hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be detected directly by a blood test; this antigen can only be isolated by analyzing an infected hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by a blood test. If found, it usually indicates complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigenic particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. The Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia and lifelong infections with an ongoing inflammation of the liver are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed, with an emphasis on past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV, including in the chronic carrier [4].
Homology, or comparative, modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops) and, ultimately, the refinement of the structure in an attempt to account for inherent differences between the template and query structures. As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.
The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models [5, 6].
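The 30% rule can be made concrete with a short Python sketch that computes percent identity from an existing pairwise alignment and compares it with the threshold. The function names, the choice to count identity over aligned (gap-free) columns only, and the toy sequences are assumptions made for illustration; other tools normalise identity differently.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both strings must come from the same alignment, so they have equal
    length and use '-' for gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # columns with a gap are not counted
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True when identity is greater than the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment, not taken from the report.
    print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
    print(passes_identity_rule("AAAA", "ACCC"))
```

A template that fails this check is not necessarily useless, but the alignment on which the model depends becomes much less reliable.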
With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.
Swiss-Prot-
A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
2 Protein sequence-
Glycerate kinase (HBeAg-binding protein 4)
Primary accession number- Q8IVS8
EC number- 2.7.1.31
Organism- human
Subcellular location- cytoplasm
Catalytic activity- ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet; it is an extension of the "FAST-P" (protein) and "FAST-N" (nucleotide) alignment tools.
The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
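To illustrate what an optimal local alignment means, the following Python sketch computes a Smith-Waterman score by dynamic programming. It is a teaching sketch only, not SSEARCH: the function name, the simple match/mismatch scores and the linear gap penalty are assumptions, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, since only the score is returned.
    """
    cols = len(b) + 1
    prev = [0] * cols  # row i-1 of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols  # row i; column 0 stays 0
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + step, # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Toy sequences: the shared core "ACDE" gives the best local score.
    print(smith_waterman_score("XXACDEXX", "YYACDEYY"))
```

Because scores are floored at zero, unrelated flanking regions do not penalise a well-matching core, which is the property that distinguishes local from global alignment.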
4 BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence will be analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
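Two of these parameters can be computed directly from the amino acid composition, as the Python sketch below shows. It assumes the commonly cited formulas: aliphatic index = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent, and a 280 nm extinction coefficient of 1490 per Tyr, 5500 per Trp and 125 per cystine (in M⁻¹ cm⁻¹). The function names are hypothetical and the result is not guaranteed to match the ProtParam server digit for digit. Half-life and instability index need lookup tables and are left out.

```python
def clean_sequence(raw: str) -> str:
    """Drop white space and digits, as ProtParam does for raw input."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def aliphatic_index(raw: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = clean_sequence(raw)
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(raw: str, cystines: int = 0) -> int:
    """Estimated extinction coefficient at 280 nm.

    `cystines` is the number of disulfide-bonded Cys pairs, which the
    caller must supply (0 assumes all cysteines are reduced).
    """
    seq = clean_sequence(raw)
    return 1490 * seq.count("Y") + 5500 * seq.count("W") + 125 * cystines


if __name__ == "__main__":
    demo = "1 AVIL WYYC 8"  # toy input with numbers and spaces
    print(aliphatic_index(demo), extinction_coefficient(demo))
```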
Using SOPMA for secondary structure analysis
Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. In this paper, we report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
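The accuracy figures quoted for a three-state description are per-residue scores, often called Q3. A minimal Python sketch of that measure follows; the one-letter state codes (H for helix, E for sheet, C for coil), the function name and the toy strings are assumptions, not SOPMA output.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one.

    Both strings hold one state letter per residue: H, E or C.
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not observed:
        raise ValueError("empty state strings")
    allowed = set("HEC")
    if set(predicted) - allowed or set(observed) - allowed:
        raise ValueError("states must be H, E or C")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy example: 7 of 8 residues agree.
    print(q3_accuracy("HHHEECCC", "HHHEECCH"))
```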
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure using a BLAST similarity search (PDB ID 2B8N).
Loaded the target sequence, in PDB format, into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
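The E-value screen described above can be sketched in a few lines of Python. This is only an illustration: the hit tuples, the function name select_templates and the cutoff of 1e-5 are assumptions made for the example, not values taken from this report or from BLAST itself.

```python
from typing import List, Tuple

# (template ID, bit score, E-value); the layout is an assumption for this sketch
Hit = Tuple[str, float, float]

def select_templates(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, lowest E-value first."""
    kept = [h for h in hits if h[2] <= max_evalue]
    return sorted(kept, key=lambda h: h[2])

# Made-up hits, for illustration only
hits = [("hitA", 206.0, 6e-54), ("hitB", 27.0, 6.0), ("hitC", 197.0, 2e-51)]
candidates = select_templates(hits)
print([h[0] for h in candidates])
```

Hits that survive the cutoff are only candidates; coverage of the aligned region and the plausibility of the resulting model still have to be judged separately.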
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interactions are said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A receptor is a residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule in a cell
or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
A ligand is a molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
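Each point on such a plot is a pair of torsion angles, and a torsion angle can be computed from the coordinates of four consecutive backbone atoms (C-N-Cα-C for φ, N-Cα-C-N for ψ). The following is a minimal sketch of that calculation in plain Python. The function name dihedral and the test coordinates are made up for illustration; a real analysis would read atom coordinates from a PDB file.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    u1, u2, u3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(u1, u2)  # normal of the plane through p0, p1, p2
    n2 = _cross(u2, u3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up coordinates: a quarter turn, then a trans (fully extended) arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Applying this to every residue of a structure gives the (φ, ψ) pairs that the plot displays.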
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue listing.
PROCHECK generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
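The idea of moving atoms downhill until the net force vanishes can be illustrated with a toy case: two atoms interacting through a Lennard-Jones potential, started too close together. This is a sketch under stated assumptions only. The reduced units (epsilon = sigma = 1), the step size and the function names are chosen for the example; real packages minimize over all atomic coordinates with a full force field.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr; the force along the separation is the negative of this."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, step=0.001, tol=1e-8, max_iter=10000):
    """Steepest descent: step along -dE/dr until the force is nearly zero."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

r_start = 0.95  # atoms too close: strong van der Waals repulsion
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

The search ends at the nearest minimum (here r = 2^(1/6) in units of sigma). In a real molecule with many minima the same procedure stops at whichever local minimum lies downhill from the starting conformation.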
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (HBcAg) Human Hepatitis B Viral Capsid >gi|5               206  6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                      192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27  6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                27  6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
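The two extinction coefficients follow directly from the Trp, Tyr and Cys counts in the composition above. The sketch below reproduces the arithmetic, assuming the per-residue molar values documented for ProtParam (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1 at 280 nm). The function name is hypothetical.

```python
def extinction_coefficients(n_trp, n_tyr, n_cys):
    """Return (all Cys paired as cystines, no cystines) in M-1 cm-1 at 280 nm."""
    E_TRP, E_TYR, E_CYSTINE = 5500, 1490, 125  # assumed per-residue values
    base = n_trp * E_TRP + n_tyr * E_TYR
    # Two Cys residues form one cystine; an odd Cys is left unpaired
    return base + (n_cys // 2) * E_CYSTINE, base

# Counts and molecular weight taken from the ProtParam output above
with_cystines, without_cystines = extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5)
mol_weight = 55252.6
print(with_cystines, round(with_cystines / mol_weight, 3))
print(without_cystines, round(without_cystines / mol_weight, 3))
```

Dividing each coefficient by the molecular weight gives the absorbance of a 1 g/l solution, which matches the Abs 0.1% values reported.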
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
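The per-state summary is simply a count of each letter in the prediction string divided by the sequence length. A minimal sketch of that tally follows; the function name and the short example string are made up for illustration.

```python
from collections import Counter

def state_summary(states: str) -> dict:
    """Map each state letter to (count, percentage of the sequence)."""
    counts = Counter(states.lower())
    total = len(states)
    return {s: (n, round(100.0 * n / total, 2)) for s, n in counts.items()}

# Toy prediction string: h = helix, e = strand, c = coil, t = turn
summary = state_summary("hhhecct")
for state, (count, percent) in sorted(summary.items()):
    print(state, count, percent)
```

Applied to the full 523-residue prediction, the same calculation gives the figures in the summary; for example, 235 helix residues out of 523 is 44.93%.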
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
49
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
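ClustalW writes its guide tree in the Newick format, in which each leaf appears as name:branch_length inside nested parentheses. As a rough sketch, the leaf branch lengths can be pulled out with a regular expression. The function name and the toy tree are made up for illustration, and a dedicated phylogenetics library would be the safer choice for real work.

```python
import re

def leaf_branch_lengths(newick: str) -> dict:
    """Map each leaf name to the length of the branch leading to it."""
    # A leaf is a run of characters other than Newick punctuation,
    # followed by ':' and a number. Internal nodes have no name here.
    pattern = re.compile(r"([^(),:;\s]+):([0-9.eE+-]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}

toy_tree = "((A:0.1,B:0.2):0.05,C:0.3);"
print(leaf_branch_lengths(toy_tree))
```

Short leaf branches indicate near-identical sequences, while a long branch marks a sequence that is distant from the rest of the set.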
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file); choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window will open; load the PDB file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Alignment
TARGET  83                                   LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss
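The sequence identity quoted for a target/template pair can be recomputed from the aligned rows. The following is a minimal sketch, not the SWISS-MODEL calculation itself: it assumes identity is counted over the columns in which neither sequence has a gap (other tools use other denominators), and the function name percent_identity is introduced here for illustration only. The example strings are one 50-column block copied from the alignment above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. This is one
    common convention, not necessarily the one used by SWISS-MODEL.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    # Compare case-insensitively: the target is printed in upper case and the
    # template in lower case in the alignment output.
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# One block of the alignment (target residues 155-204, template 101-150),
# with the column-group spaces removed.
target_block = "ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA".replace(" ", "")
template_block = "trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk".replace(" ", "")

block_identity = percent_identity(target_block, template_block)
print(f"Identity over this block: {block_identity:.1f}%")
```

Applying the same function to the full concatenated alignment rows gives an overall figure that can be compared with the identity reported in the model information.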
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure and shows which combinations of φ and ψ are sterically possible for a polypeptide. The share of residues that fall in the favoured, allowed and disallowed regions of the plot is used as a measure of the stereochemical quality of the model.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modeling with DeepView and Modeller was given as input to SAVS.
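The φ and ψ values plotted in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for φ and N(i), CA(i), C(i), N(i+1) for ψ. A minimal sketch of the dihedral calculation is given below. It is a generic geometric formula, not the PROCHECK implementation, and the coordinates used are hypothetical test points rather than atoms from the model.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (cis = 0, trans = +/-180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the central bond runs along the z axis.
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(dihedral(p0, p1, p2, (1.0, 0.0, 1.0)))   # cis arrangement
print(dihedral(p0, p1, p2, (0.0, 1.0, 1.0)))   # quarter turn
```

For a real structure, the four points would be the backbone atom coordinates read from the ATOM records of the PDB file for each residue in turn.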
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %-age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
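The percentage column of the plot statistics can be checked directly from the residue counts. The sketch below is illustrative arithmetic only; it assumes the percentages are taken over the 1156 non-glycine, non-proline residues and that the flattened percentage figures read 85.6, 9.0, 1.0 and 4.4. The 90% threshold is the PROCHECK guideline quoted with the results.

```python
# Residue counts copied from the Procheck plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
end_residues, glycines, prolines = 8, 127, 60

# Percentages are taken over non-glycine, non-proline residues (assumption).
non_gly_pro = sum(counts.values())
percentages = {
    region: round(100.0 * n / non_gly_pro, 1) for region, n in counts.items()
}
total_residues = non_gly_pro + end_residues + glycines + prolines

# PROCHECK guideline: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("total residues:", total_residues)
print("meets 90% guideline:", meets_guideline)
```

The counts are internally consistent (they add up to the reported totals), and the most favoured fraction falls short of the 90% guideline.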
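Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions. With 85.6% of residues in the most favoured regions, the present model falls below this benchmark.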
Docking result (Hex software)
Fig: Ligand & receptor (2B8N)
Fig: After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP CG Radius = 1.40 Charge = 0.62
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52
THR CA Radius = 1.50 Charge = 0.27
THR C Radius = 1.40 Charge = 0.53
THR O Radius = 1.50 Charge = -0.50
THR CB Radius = 1.50 Charge = 0.21
THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 kJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 kJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
------------------------------------------------------------------------------
Clst Soln Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
------------------------------------------------------------------------------
   1    1 000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
------------------------------------------------------------------------------
   1    1 000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
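The size of the search reported in the docking log can be cross-checked with a few lines of arithmetic. The sketch below is an illustration, not part of Hex. It assumes that the flattened log entries "Iterate[278121]" and "FFT[642448]" stand for the triples (27, 812, 1) and (64, 24, 48), and that the phase timings read 53.66, 73.01, 277.87 and 15.45 seconds; both readings are inferences from the totals printed in the same log.

```python
from math import prod

# Assumed reading of the log: 27 distance steps x 812 receptor rotations
# x 1 ligand rotation, each evaluated on a 64 x 24 x 48 FFT grid.
iterate = (27, 812, 1)
fft_grid = (64, 24, 48)

n_ffts = prod(iterate)
n_orientations = n_ffts * prod(fft_grid)

# Distance range 0.00 to 19.50 in steps of 0.75 (both ends included).
distance_steps = round(19.50 / 0.75) + 1

# Assumed reading of the per-phase timings, in seconds.
phase_seconds = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
total_seconds = sum(phase_seconds.values())

print("3D FFTs:", n_ffts)
print("orientations searched:", n_orientations)
print("distance steps:", distance_steps)
print(f"total search time: {total_seconds:.2f} s")
```

Under these readings the products reproduce the 21924 FFTs and 1616412672 orientations printed in the log, and the phase timings add up to the reported 7 min 0 sec.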
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It is interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible to draw firm conclusions about the true picture of evolution in the near future, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we performed homology modelling and went a step further to present a theoretical model built with the available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is a secondary cause of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Fig: Hepatitis B virus genome
organization with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to the 5' end. The other strand, the plus strand, is variable in length but less than unit length and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are transcribed and translated through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.
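The way one reading frame yields several proteins from multiple in-frame start codons can be illustrated with a short sketch. The sequence below is a hypothetical toy ORF, not HBV sequence, and the function name in_frame_starts is introduced only for this example. Each in-frame ATG gives a product that ends at the same stop codon, so the products differ only at the N-terminus, as with the large, middle and small surface proteins.

```python
from typing import List


def in_frame_starts(orf: str, start_codon: str = "ATG") -> List[int]:
    """Return 0-based nucleotide offsets of start codons in frame with position 0."""
    orf = orf.upper()
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] == start_codon]


# Hypothetical toy ORF: ATG GGA CAT GCC ATG AAA TTT ATG CCC TAA
# The ATG spanning positions 7-9 is out of frame and must be ignored.
toy_orf = "ATGGGACATGCCATGAAATTTATGCCCTAA"
stop_offset = len(toy_orf) - 3  # the final codon (TAA) is the shared stop

starts = in_frame_starts(toy_orf)
# Number of coding codons (excluding the stop) in each nested product.
product_lengths = [(stop_offset - s) // 3 for s in starts]

print("in-frame start offsets:", starts)
print("codons per product:", product_lengths)
```

Scanning in steps of three from the first codon is what restricts the search to a single reading frame; a full ORF finder would repeat this for the other frames of both strands.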
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present and usually outnumber the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to collectively as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel undetected by antibodies through the blood stream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface antigen (HBsAg)- There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering the immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. It is a 185-amino-acid protein expressed in the cytoplasm of infected cells and is
closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements
in the quality of homology models [5, 6].
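The 30% rule can be checked directly on a pairwise alignment. The following is a minimal sketch, not the method used by any particular modeling server: it assumes the alignment is given as two equal-length strings with "-" for gaps, and it counts identity over aligned (gap-free) columns only, which is one of several conventions in use. The function names are illustrative.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip gap columns
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def passes_30_percent_rule(aligned_query: str, aligned_template: str,
                           threshold: float = 30.0) -> bool:
    """True when identity exceeds the (assumed) 30% threshold."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    print(percent_identity("ACDE-G", "ACDFHG"))
    print(passes_30_percent_rule("AAAAAAAAAA", "ATTTTTTTTT"))
```

Other definitions divide by the length of the shorter sequence or of the whole alignment, so identities quoted by different tools are not always directly comparable.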
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domains structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4), primary accession number Q8IVS8, EC 2.7.1.31, from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
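To make the Smith-Waterman step concrete, here is a minimal sketch of the local-alignment score computation. It is a simplification and not SSEARCH itself: it assumes a flat match/mismatch score instead of a substitution matrix such as BLOSUM, and a linear rather than affine gap penalty. The function name and default scores are illustrative.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TGTTACGG", "GGTTGACTA",
                               match=3, mismatch=-3, gap=-2))
```

The zero floor in the recurrence is what makes the alignment local: a poorly matching prefix cannot drag down a well-matching region that follows it.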
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e., if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
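As an illustration of one of the listed parameters, the aliphatic index can be computed from mole percentages of Ala, Val, Ile and Leu using Ikai's formula, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)). The sketch below assumes the input is a plain one-letter amino acid sequence; the function name is illustrative and this is not ProtParam's own code.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percent of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    # Mole percent of each aliphatic residue in the sequence.
    x = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    return x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"])


if __name__ == "__main__":
    print(aliphatic_index("AVIL"))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains, which is usually read as a sign of greater thermostability for globular proteins.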
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was
described to improve the success rate in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side-chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and
side-chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
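The E-value screen described above can be sketched as a simple filter over a hit list. The sketch below makes several assumptions: the cutoff of 1e-5 is an illustrative choice, not a value stated in this report; the `Hit` record and `select_templates` function are hypothetical names; and the hit values are taken from the BLAST result reported later in this report, with the two weak hits read as E-value 6.0 on the assumption that a decimal point was lost in extraction.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    accession: str   # PDB ID and chain, e.g. "1QGT-C"
    score: float     # BLAST bit score
    e_value: float   # expected number of chance hits with this score


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest) first."""
    good = [h for h in hits if h.e_value <= max_e_value]
    return sorted(good, key=lambda h: h.e_value)


# Hit list as reported in the results section (weak-hit E-values assumed).
hits = [
    Hit("1TA3-B", 27, 6.0),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("1QGT-C", 206, 6e-54),
    Hit("1AW9-A", 27, 6.0),
    Hit("2G33-C", 192, 6e-50),
]
candidates = select_templates(hits)

if __name__ == "__main__":
    for h in candidates:
        print(h.accession, h.score, h.e_value)
```

A filter like this only narrows the list; coverage, resolution and functional similarity of the remaining candidates still have to be weighed before a template is chosen.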
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions, that is, the fraction of the query
sequence structure that can be predicted from the template, and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a
lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand
and a flexible receptor or a flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand-flexible receptor complex often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking study will be useful in the search for a cure for
hepatitis B, and may open new avenues for finding other possible drug
targets in HBV. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A molecule on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within or on
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.,
tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
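Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of that torsion calculation in plain Python; the function and helper names are illustrative, and the coordinates in the usage lines are made-up test points, not atoms from the modeled receptor.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up points: the last atom is rotated a quarter turn about the z axis.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Validation tools such as PROCHECK compute these angles for every residue and then report what fraction falls in the favoured, allowed and disallowed regions of the plot.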
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good for releasing local constraints on a
residue, but it will not pass over high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
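The idea of moving atoms down the energy gradient can be shown on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential. This is a toy sketch under stated assumptions, not the minimizer used by any modeling package: it works in reduced units (ε = σ = 1), uses plain steepest descent with a fixed step size chosen for this example, and all function names are illustrative.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative dE/dr; the force on the pair is its negative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r_start: float, step: float = 0.01,
                      tol: float = 1e-10, max_iter: int = 10000) -> float:
    """Steepest descent on the interatomic distance."""
    r = r_start
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:   # net force is essentially zero: converged
            break
        r -= step * g      # move downhill along the gradient
    return r


# Start from a stretched pair and relax to the nearby minimum.
r_min = minimize_distance(1.5)

if __name__ == "__main__":
    print(r_min, lj_energy(r_min))
```

The pair settles at r = 2^(1/6) σ, the bottom of the potential well. In a real protein the same procedure runs over thousands of coordinates at once, which is why it finds only the nearest local minimum rather than the global one.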
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry Information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2, Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp  27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I             27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
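The composition figures above can be cross-checked from the residue counts alone. The following is a small sketch, not ProtParam's own code; the dictionary simply restates the counts reported for GLCTK_HUMAN, and the variable names are illustrative.

```python
# Residue counts as reported by ProtParam for GLCTK_HUMAN (Q8IVS8).
counts = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}

total = sum(counts.values())

# Percentage of each residue type, rounded to one decimal place.
percent = {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}

# Charged-residue totals as defined in the report.
negative = counts["Asp"] + counts["Glu"]
positive = counts["Arg"] + counts["Lys"]

print(total, percent["Ala"], percent["Leu"], negative, positive)
```

The counts add up to the full sequence length, and the negative residues outnumber the positive ones, which is consistent with a theoretical pI slightly below 7.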
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
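The two extinction coefficients follow from the Trp, Tyr and Cys counts. The sketch below assumes the per-residue molar values documented for ProtParam (5500 for Trp, 1490 for Tyr and 125 per cystine, at 280 nm); the function name and arguments are illustrative and not part of any tool.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, all_cystines):
    """Molar extinction coefficient at 280 nm in M-1 cm-1.

    Assumed per-residue contributions: Trp 5500, Tyr 1490, cystine 125.
    """
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cystines else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


MOLECULAR_WEIGHT = 55252.6  # from the ProtParam result above

ext_oxidised = extinction_coefficient(4, 5, 5, all_cystines=True)
ext_reduced = extinction_coefficient(4, 5, 5, all_cystines=False)

# Absorbance of a 1 g/l (0.1%) solution.
abs_oxidised = round(ext_oxidised / MOLECULAR_WEIGHT, 3)
abs_reduced = round(ext_reduced / MOLECULAR_WEIGHT, 3)

print(ext_oxidised, abs_oxidised, ext_reduced, abs_reduced)
```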
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):       235 is 44.93%
3_10 helix (Gg):          0 is  0.00%
Pi helix (Ii):            0 is  0.00%
Beta bridge (Bb):         0 is  0.00%
Extended strand (Ee):    80 is 15.30%
Beta turn (Tt):          36 is  6.88%
Bend region (Ss):         0 is  0.00%
Random coil (Cc):       172 is 32.89%
Ambiguous states (?):     0 is  0.00%
Other states:             0 is  0.00%
Parameters: Window width 17; Similarity threshold 8; Number of states 4
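The percentages in the SOPMA summary are simply the share of each state letter in the prediction string. A minimal sketch of that bookkeeping is given below; the function name is illustrative, and the example string is rebuilt from the reported state counts rather than copied from the prediction itself.

```python
from collections import Counter


def state_percentages(states):
    """Return {state letter: percentage of positions}, rounded to 2 places."""
    counts = Counter(states)
    n = len(states)
    return {state: round(100.0 * c / n, 2) for state, c in counts.items()}


# Stand-in string with the reported counts: 235 h, 80 e, 36 t, 172 c.
reported = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
summary = state_percentages(reported)
print(len(reported), summary)
```

Applied to the actual per-residue prediction string, the same function should give the same summary, provided the string is intact.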
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
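The guide tree is written in Newick notation (nested parentheses with name:branch-length pairs). The sketch below is a minimal hand-written reader for that notation, not ClustalW code; it assumes the branch lengths were printed with five decimals (so that, for example, 059519 in the raw output stands for 0.59519) and sums the branch lengths from the root down to each sequence.

```python
GUIDE_TREE = """
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
"""


def leaf_depths(newick):
    """Return {leaf name: summed branch length from the root}."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        leaves = {}
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                leaves.update(parse_node())
                closing = text[pos] == ")"
                pos += 1  # consume "," or ")"
                if closing:
                    break
        else:
            start = pos
            while pos < len(text) and text[pos] not in ":,)":
                pos += 1
            leaves[text[start:pos]] = 0.0
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",)":
                pos += 1
            length = float(text[start:pos])
        # Every leaf below this node inherits this node's branch length.
        return {name: depth + length for name, depth in leaves.items()}

    return parse_node()


depths = leaf_depths(GUIDE_TREE)
for name, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{name:20s} {depth:.5f}")
```

In this reading the human glycerate kinase sits on by far the longest path, in line with the very low pairwise scores it obtains against every HBeAg sequence.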
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file), then choose SwissModel and load the raw target sequence.
Choose Fit: fit raw sequence, then magic fit, then iterative fit.
Choose File, Save, Layer (PDB).
Choose File, Save, Project (PDB).
Choose SwissModel, then submit modeling request. (A new browser window opens to load the PDB file; give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File, Save, Layer (PDB).
Open SwissModel and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under SwissModel to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET   83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA     4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET         hh sss sssss hhh
2b8nA          hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET         hhhhhhhhh ssssss sssss hhhhh
2b8nA          hhhhhhhhh ssssss sssss hhhhh

TARGET  155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET         h hhhhhh hhh sss hhhhhh sssssss
2b8nA          h hhhhhh hhh sss hhhhhh sssssss

TARGET  255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET         h hhhhhhhhhh hh hhhh sss sssss
2b8nA          h hhhhhhhhhh hhh hhhh ss

TARGET  305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET         hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA          hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET         sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA          sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394    ksgallitgp tgtnvndlii gliv-
TARGET         h sss sssss ssss
2b8nA          sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure; it displays the φ and ψ conformational angles of each residue and shows which combinations are possible for a polypeptide. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the protein of interest can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
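Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral angle from four coordinates; it is an illustration only, not the code used by PROCHECK or SAVS, and the coordinates in the example are made-up test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (k0 * b1[0], k0 * b1[1], k0 * b1[2]))
    w = _sub(b2, (k2 * b1[0], k2 * b1[1], k2 * b1[2]))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: a quarter turn, a trans and a cis arrangement.
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(quarter, trans, cis)
```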
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                              Score   %age
Residues in most favoured regions [A,B,L]                      990   85.6%
Residues in additional allowed regions [a,b,l,p]               104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]            11    1.0%
Residues in disallowed regions                                  51    4.4%
                                                              ----  ------
Number of non-glycine and non-proline residues                1156  100.0%
Number of end-residues (excl. Gly and Pro)                       8
Number of glycine residues (shown as triangles)                127
Number of proline residues                                      60
                                                              ----
Total number of residues                                      1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
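The percentages in the plot statistics, and the comparison against the 90% guideline, can be reproduced from the residue counts. The following is a small sketch with illustrative names; it only restates the arithmetic of the PROCHECK summary and is not part of PROCHECK.

```python
def ramachandran_summary(region_counts, threshold=90.0):
    """Return (total, percentages, passes) for Ramachandran region counts.

    Percentages are relative to the non-Gly, non-Pro residues, and
    `passes` applies the rule of thumb quoted in the PROCHECK summary.
    """
    total = sum(region_counts.values())
    percent = {region: round(100.0 * n / total, 1)
               for region, n in region_counts.items()}
    return total, percent, percent["most favoured"] > threshold


regions = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
total, percent, passes = ramachandran_summary(regions)
print(total, percent, passes)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline quoted above.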
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
68
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
69
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS               Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
Clst  Soln  Models   Etotal   Eshape   Eforce   Eair     Vshape   Vclash  Bmp  RMS
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
1     1     000000   00       00       00       00       00       00      -1   -100
---------------------------------------------------------------------------
1     1     000000   00       00       00       00       00       00      -1   -100
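The size of the search reported in the log above can be checked with a few lines of arithmetic. The sketch below is only a reading of the log, under stated assumptions: the decimals lost in extraction are taken as a distance range of 0.00 to 19.50 in steps of 0.75, "Iterate[278121]" is read as 27 distance steps x 812 receptor rotations x 1 ligand rotation, and "FFT[642448]" as a 64 x 24 x 48 grid. The function names are hypothetical and are not part of HEX.

```python
import math


def radial_steps(r_min, r_max, step):
    """Number of intermolecular distances sampled, counting both end points."""
    return int(round((r_max - r_min) / step)) + 1


def search_size(n_dist, n_receptor_rot, n_ligand_rot, fft_grid):
    """Return (number of 3D FFTs, total orientations scored)."""
    n_ffts = n_dist * n_receptor_rot * n_ligand_rot
    return n_ffts, n_ffts * math.prod(fft_grid)


# Assumed reading of the log values (decimals were stripped in extraction).
n_dist = radial_steps(0.0, 19.5, 0.75)
n_ffts, n_orientations = search_size(n_dist, 812, 1, (64, 24, 48))
print(n_dist, n_ffts, n_orientations)
```

Under these assumptions the three numbers agree with the 27 "R =" lines, the "Done 21924 3D FFTs" line and the "1616412672 orientations" figure in the log.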
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although the strains
are all closely related, their proteins play an important role in survival in different species. It
would be interesting to take a closer look at the matter at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be
possible to draw firm conclusions about the true picture of evolution in the near future, and
that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of a protein present in the
different species, we used homology modelling and present a theoretical model built
with available online modelling tools.
Our study indicates that the HBeAg-binding protein (glycerate kinase) encoded by the
gene is a secondary cause of the pathogenicity of HBV. We therefore tried to dock this protein
with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report might be just a stepping stone towards any such discoveries. The
present work might be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs
more efficiently and more effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro - An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
to the 5' end. The other strand, the plus strand, is variable in length but has less than unit
length and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed,
and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is through the use of multiple in-frame start codons. The HBV genome also contains
parts that regulate transcription, determine the site of polyadenylation and mark a specific
transcript for encapsidation into the nucleocapsid.
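The idea of several proteins arising from multiple in-frame start codons can be illustrated with a minimal sketch. The function below scans the three forward reading frames of a DNA string and groups every ATG with the first stop codon that follows it in the same frame. The function name and the toy sequence are hypothetical and purely illustrative; they are not HBV sequence data and not the method used in this report.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_starts(seq):
    """Map each stop-codon position to the ATG positions that read through to it.

    Several ATGs sharing one stop give nested products of different length
    with a common C-terminal end.
    """
    seq = seq.upper()
    result = {}
    for frame in range(3):
        starts = []
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG":
                starts.append(i)
            elif codon in STOP_CODONS:
                if starts:
                    result[i] = starts
                starts = []
    return result


# Toy sequence (hypothetical): three in-frame ATGs followed by one stop codon.
toy = "ATGAAAATGCCCATGGGGTAA"
print(in_frame_starts(toy))
```

In the toy sequence all three ATGs lie in frame 0 and share the stop codon at position 18, so one reading frame yields three products of different length.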
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to be able to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA polymerase II of a longer-than-genome-length RNA called the pregenome and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by
ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to
become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present and usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to as a group named surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the bloodstream (Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering the immune
response. Regardless of the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected
directly by a blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by a blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
Antigen
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV or the
chronic carrier state [4].
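The marker list above works like a lookup table, which the following minimal sketch makes explicit. It only restates the statements given in the text; the marker labels and the function name are assumptions chosen for illustration, and the sketch is not a diagnostic tool.

```python
# Interpretations restated from the antigen and antibody list in the text.
MARKER_NOTES = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection; not found in chronic carriers",
    "anti-HBe": "viral replication is falling; low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "past or ongoing exposure to HBV; remains present for life",
}


def interpret(positive_markers):
    """Return one note per positive serum marker, in the order of the table."""
    unknown = set(positive_markers) - set(MARKER_NOTES)
    if unknown:
        raise ValueError(f"unknown marker(s): {sorted(unknown)}")
    return [f"{m}: {MARKER_NOTES[m]}" for m in MARKER_NOTES if m in positive_markers]


for line in interpret({"HBsAg", "HBeAg", "core IgM"}):
    print(line)
```

HBcAg is left out of the table on purpose, because the text states that the core antigen is not found in blood.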
Homology, or comparative, modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements
in the quality of homology models [5, 6].
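The 30% rule can be checked for any candidate template once the query and template are aligned. The sketch below is one possible way to compute percent identity; it assumes gaps are written as "-" and that identity is counted over aligned (non-gap) positions, which is only one of several conventions in use. The function names and the example strings are hypothetical.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != gap and t != gap]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def passes_30_percent_rule(aligned_query, aligned_template):
    """True if the template is above the 30% identity threshold from the text."""
    return percent_identity(aligned_query, aligned_template) > 30.0


# Hypothetical toy alignment: 5 identical residues out of 6 aligned positions.
identity = percent_identity("ACDE-FG", "ACDQWFG")
print(round(identity, 1), passes_30_percent_rule("ACDE-FG", "ACDQWFG"))
```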
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC 2.7.1.31
From human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced
"FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
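The Smith-Waterman algorithm mentioned above can be sketched in a few lines. This is a simplified illustration, not the SSEARCH implementation: it returns only the best local alignment score and assumes a fixed match/mismatch score with a linear gap penalty, whereas real searches typically use substitution matrices and affine gap penalties. The scoring values are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # shared local region "ACG"
```

Keeping only two rows of the matrix is enough for the score; recovering the alignment itself would require storing the full matrix and tracing back from the best cell.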
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
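Of the parameters listed above, the aliphatic index is simple enough to compute by hand. The sketch below assumes the commonly cited definition, AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name and the example sequence are hypothetical, and the code is an illustration rather than a reimplementation of ProtParam.

```python
def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = "".join(sequence.split()).upper()  # ignore white space, as ProtParam does
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical four-residue example: each residue is 25 mole percent.
print(aliphatic_index("AVIL"))
```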
Using SOPMA for secondary structure analysis
Recently, a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
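The two figures quoted above are of different kinds: a three-state accuracy over all residues, and an accuracy restricted to the residues on which two methods agree. A minimal sketch of both measures follows, assuming per-residue state strings over the letters H (helix), E (sheet) and C (coil). The function names and the toy strings are hypothetical and do not reproduce SOPMA or PHD output.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def joint_prediction(pred_a, pred_b, observed):
    """Return (coverage, accuracy) for residues where both methods agree."""
    if not (len(pred_a) == len(pred_b) == len(observed)) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    agreed = [(a, o) for a, b, o in zip(pred_a, pred_b, observed) if a == b]
    coverage = 100.0 * len(agreed) / len(observed)
    accuracy = 100.0 * sum(a == o for a, o in agreed) / len(agreed) if agreed else 0.0
    return coverage, accuracy


observed = "HHHHEEEECC"   # toy reference states
method_a = "HHHCEEEECC"   # toy prediction from a first method
method_b = "HHHHEEECCC"   # toy prediction from a second method
print(q3(method_a, observed), joint_prediction(method_a, method_b, observed))
```

In the toy data each method alone scores 90%, while the residues on which they agree (80% of the chain) are all correct, which mirrors the trade-off between coverage and accuracy described in the text.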
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure by a BLAST similarity search (PDB ID 2B8N).
Loaded the target sequence (in pdb format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).
Verified our model through the different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction homology modeling also known as comparative
modeling is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target) Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence The sequence alignment and template
structure are then used to produce a structural model of the target Because protein
structures are more conserved than protein sequences detectable levels of sequence
similarity usually imply significant structural similarity
The quality of the homology model depends on the quality of the sequence alignment
and of the template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
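Because model quality is tied to sequence identity, it can help to see how that figure is obtained from an alignment. The following is a minimal sketch, not the method used by any particular server: the function name `percent_identity` and the toy sequences are hypothetical, and the convention of counting only columns where neither sequence has a gap is an assumption (other tools divide by the alignment length or by the shorter sequence length).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned, gap-free columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has an indel
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment (hypothetical): 8 gap-free columns, 7 of them identical.
example = percent_identity("ACDEFG-HIK", "ACDQFGWH-K")
print(example)
```

With this convention, a target/template pair would be scored only over the residues that the template can actually cover.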
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
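The E-value screening described above can be sketched as a simple filter. This is an illustration only: the `Hit` type, the `select_templates` function and the cutoff of 1e-5 are assumptions made for the example, not values prescribed by BLAST or used in this report. The hit list reuses the template identifiers, scores and E-values from the BLAST result reported later in this report (the two weakest hits are taken as E = 6.0).

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template: str   # PDB ID and chain
    score: float    # BLAST bit score
    e_value: float  # expected number of chance hits with this score


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


hits = [
    Hit("1TA3-B", 27, 6.0),
    Hit("1QGT-C", 206, 6e-54),
    Hit("2G33-C", 192, 6e-50),
    Hit("1AW9-A", 27, 6.0),
    Hit("2QIJ-C", 197, 2e-51),
]

candidates = select_templates(hits)
for hit in candidates:
    print(hit.template, hit.e_value)
```

Hits that survive the cutoff are only candidates; coverage and functional similarity still have to be weighed before one template is chosen.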
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures;
perhaps most importantly, the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints and
thus the interfaces are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor, or a flexible ligand and a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when the ligand overlaps the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed to the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles φ against ψ of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. This plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles which cause spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
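A torsion angle such as φ or ψ is defined by four consecutive backbone atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The sketch below computes a signed dihedral angle from four 3D points using only the standard library. It is an illustrative example, not code from any validation server; the function name `dihedral_degrees` and the test coordinates are hypothetical, and the sign convention is simply the one this formula produces.

```python
import math


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond

    # Components of b0 and b2 perpendicular to the central bond.
    v = tuple(x - dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - dot(b2, b1) * y for x, y in zip(b2, b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a cis arrangement (0 degrees) and a trans one (180 degrees).
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```

Applying such a function to every residue of a structure yields the (φ, ψ) pairs that are plotted as points on the Ramachandran plot.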
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and so stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
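The idea of gradient optimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential. This is a sketch under stated assumptions, not the force field or minimizer of any modeling package: the reduced units (epsilon = sigma = 1), the fixed step size, the starting distance and the function names are all chosen for the example. The atoms are moved against the gradient until the force is nearly zero.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*[(sigma/r)^12 - (sigma/r)^6] at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy (the negative of the force)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent: move downhill along -dE/dr until the gradient is ~0."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


r_min = minimize(1.5)  # start with the atoms slightly too far apart
print(r_min, lj_energy(r_min))
```

The minimizer settles at the nearest minimum to the starting point, which is why, for a real protein with many minima, the result depends on the starting conformation.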
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (HBcAg) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
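The extinction coefficients and absorbances reported by ProtParam can be checked by hand from the amino acid composition. The sketch below reproduces the arithmetic, using the per-residue molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function name and variable names are hypothetical, and the residue counts and molecular weight are the ones listed for GLCTK_HUMAN in this report.

```python
# Molar extinction contributions at 280 nm in water (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded pair of Cys residues


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Extinction coefficient at 280 nm estimated from residue counts."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


# Values for GLCTK_HUMAN as listed in the composition table of this report.
n_trp, n_tyr, n_cys = 4, 5, 5
molecular_weight = 55252.6

ext_paired = extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True)
ext_reduced = extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=False)

# Absorbance of a 1 g/l (0.1%) solution is the coefficient divided by the molecular weight.
abs_paired = round(ext_paired / molecular_weight, 3)
abs_reduced = round(ext_reduced / molecular_weight, 3)

# Percentage composition of one residue type, e.g. alanine (74 of 523 residues).
ala_percent = round(100 * 74 / 523, 1)

print(ext_paired, abs_paired, ext_reduced, abs_reduced, ala_percent)
```

The same pattern (count divided by sequence length) gives every percentage in the composition table.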
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
310 helix (Gg):          0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
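ClustalW writes its guide tree in Newick format, where parentheses group sequences and each name or group is followed by a colon and a branch length. The sketch below is a minimal hand-written parser, not the parser of ClustalW or of any library; the function name `leaf_depths` is hypothetical, and the Newick string is a reconstruction of the guide tree in this report that assumes where the stripped colons, commas and decimal points belong. It sums the branch lengths from the root to every sequence, which makes the isolation of the human protein from the HBeAg sequences easy to see.

```python
def leaf_depths(newick: str) -> dict:
    """Return {leaf name: summed branch length from the root} for a Newick string."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def parse_length() -> float:
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return float(text[start:pos])
        return 0.0  # the root has no branch length

    def parse_node() -> list:
        """Parse one subtree and return (leaf, distance to the subtree's parent) pairs."""
        nonlocal pos
        leaves = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                leaves.extend(parse_node())
                if text[pos] == ",":
                    pos += 1  # another child follows
                    continue
                pos += 1  # consume ")"
                break
        else:
            start = pos
            while pos < len(text) and text[pos] not in ":,()":
                pos += 1
            leaves.append((text[start:pos], 0.0))
        length = parse_length()
        return [(name, dist + length) for name, dist in leaves]

    return dict(parse_node())


# Reconstructed guide tree (delimiter placement is an assumption).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)

depths = leaf_depths(GUIDE_TREE)
for name, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{name:20s} {depth:.5f}")
```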
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out
Steps involved in SPDBV
Open the template structure from file (pdb file)
Choose icon - Swiss model - load the raw target sequence
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - File - save - layer (pdb)
Choose icon - File - save - project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window is opened; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon - File - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
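The sequence identity reported for the model (34) is the share of aligned positions at which target and template carry the same residue. The following is a minimal Python sketch of that calculation, not the SWISS-MODEL implementation: the function name percent_identity is illustrative, and counting only columns where neither sequence has a gap is an assumption about the convention. The example strings are taken from the alignment block starting at TARGET 205; because column spacing was lost in the printed alignment, the value it prints should be read as an illustration only.

```python
def percent_identity(target: str, template: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(target.upper(), template.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# One 50-column block copied from the alignment (spaces removed).
target_block = "RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS".replace(" ", "")
template_block = "sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias".replace(" ", "")
print(f"identity in this block: {percent_identity(target_block, template_block):.1f}%")
```

Applying the same function to the full, correctly spaced alignment should give a figure comparable to the one reported by the server, provided the same gap convention is used.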
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modelling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ backbone conformational angles of each residue and shows which combinations of these angles are possible for a polypeptide. Residues that fall outside the allowed regions of the plot point to strained or incorrectly modelled backbone geometry.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modelling using Deep View and Modeller is given as input to SAVS.
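Each point in a Ramachandran plot is a pair of backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The sketch below shows how one such angle can be computed from four atomic coordinates. It is a hedged illustration in plain Python, not the code used by SAVS or PROCHECK; the function name dihedral and the coordinates in the usage line are made up for the example and are not taken from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    b0 = _sub(p0, p1)          # bond pointing back from the central bond
    b1 = _sub(p2, p1)          # central bond, the rotation axis
    b2 = _sub(p3, p2)          # bond pointing forward from the central bond
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the axis so both vectors lie in the
    # plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates, chosen only to exercise the function.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

For a real structure the four points would be read from the ATOM records of the PDB file for each residue, and the resulting (φ, ψ) pairs plotted against each other.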
SAVES results for proj_gunjanpdb
Procheck summary
RAMACHANDRAN PLOT
Result

Plot statistics                                        Score    %age
Residues in most favoured regions [A,B,L]                990    85.6
Residues in additional allowed regions [a,b,l,p]         104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]      11     1.0
Residues in disallowed regions                            51     4.4
                                                        ----   -----
Number of non-glycine and non-proline residues          1156   100.0
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
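The percentages in the PROCHECK summary follow directly from the residue counts, and the model can be compared with the 90 per cent guideline in a few lines. The sketch below is a minimal Python illustration using the four counts reported above; the function name ramachandran_summary and the dictionary keys are illustrative choices, not PROCHECK output fields.

```python
def ramachandran_summary(counts):
    """Return the total residue count and the percentage in each region."""
    total = sum(counts.values())
    percentages = {region: 100.0 * n / total for region, n in counts.items()}
    return total, percentages


# Counts of non-glycine, non-proline residues from the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total, pct = ramachandran_summary(counts)
for region, value in pct.items():
    print(f"{region:20s} {counts[region]:5d} {value:6.1f}%")

# Guideline quoted by PROCHECK: over 90% in the most favoured regions.
meets_guideline = pct["most favoured"] > 90.0
print("total residues counted:", total)
print("meets the 90% guideline:", meets_guideline)
```

With these counts the most favoured fraction is about 85.6 per cent, so the model falls short of the guideline and the 51 residues in disallowed regions are the natural place to start refinement.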
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
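The size of the search reported in the Hex log can be checked by hand. The sketch below is a hedged reading of the log rather than anything taken from the Hex documentation: it assumes that decimal points and commas were lost when the log was printed, so that the distance range is 0.00 to 19.50 in steps of 0.75, that Iterate[278121] stands for 27 distance steps, 812 receptor rotations and 1 ligand rotation, and that FFT[642448] stands for a 64 x 24 x 48 grid. The variable and function names are illustrative.

```python
def radial_steps(r_min, r_max, step):
    """Number of distance samples from r_min to r_max inclusive."""
    return int(round((r_max - r_min) / step)) + 1


# Assumed reading of "distance range = 000 to 1950 with steps of 075".
n_radial = radial_steps(0.0, 19.5, 0.75)

# Assumed reading of "Receptor 812 ... Ligand 1" rotational increments.
receptor_rotations = 812
ligand_rotations = 1

# Assumed reading of "FFT[642448]" as three grid dimensions.
fft_grid = (64, 24, 48)

n_ffts = n_radial * receptor_rotations * ligand_rotations
points_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * points_per_fft

print("distance steps:", n_radial)
print("3D FFTs:", n_ffts)
print("orientations:", total_orientations)
```

Under these assumptions the products reproduce the two figures printed in the log, 21924 3D FFTs and 1616412672 orientations, which supports this reading of the garbled numbers.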
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution, and the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of proteins present in the different species, we performed homology modelling. We took a step forward by presenting a theoretical model built with available online modelling tools.
We studied the HBeAG (Glycerate kinase) protein, which is coded by the gene and is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs can be developed.
Future prospects
The work presented in this report might just be a stepping stone for any such discoveries. The present work might be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists to develop drugs more efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
At early times after the infection, the DNA is recirculated to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of CCC DNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, and these usually outnumber the virions in the ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel undetected by antibodies through the blood stream (Garces HBVP
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface Antigen (HBsAg): There are three different types of hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by blood test; it can only be isolated by analyzing an infected hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5, 6].
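The 30% rule can be checked directly on a pairwise alignment. The following is a minimal sketch, assuming percent identity is counted over columns where neither sequence has a gap (other denominators are also in use); the function name and the toy alignment are hypothetical and not taken from this report's data.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are excluded from the denominator
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Hypothetical toy alignment (illustration only).
query = "MKV-LAAGIT"
template = "MRVQLA-GLT"
identity = percent_identity(query, template)
print(f"{identity:.1f}% identity; above the 30% threshold: {identity > 30.0}")
```

A template whose identity falls below roughly 30% would, under the rule described above, call for extra care in checking the alignment before a model is built.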
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed in recent years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the orientation of the ligand during sampling.
Direct approaches can be further divided into protein-based, mapping-based and
ligand-based approaches to reflect the source used to derive the features capturing the
chemical information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current scoring
functions, it was generally found that making optimal use of chemical information
represents an efficient knowledge-based strategy for improving binding affinity
estimations, ligand binding-mode predictions, and virtual screening enrichments obtained
from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of annotation
(such as the description of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level of redundancy and a
high level of integration with other databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
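To illustrate what an optimal local alignment computes, here is a minimal sketch of the Smith-Waterman recurrence. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the parameter values are arbitrary choices for illustration), whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences: the shared core "ACGT" drives the score.
print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Only two rows of the matrix are kept because the sketch reports the score alone; recovering the alignment itself would require storing the full matrix and tracing back from the best cell.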
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e., if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
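As an illustration of one of these parameters, the aliphatic index can be computed from the sequence alone. This is a minimal sketch using the commonly cited formula of Ikai (1980), AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue; the function name and the toy sequence are hypothetical.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy sequence (illustration only, not the protein analysed in this report).
print(round(aliphatic_index("MAAALQVLPRLARAPL"), 2))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains, which is usually read as a rough indicator of thermostability.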
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. Improvements have been reported from predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure by a BLAST similarity search (PDB ID: 2B8N).
Loaded the target sequence in PDB format in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model using different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and
side-chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs.
This family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon
individual fold-recognition servers by identifying similarities (consensus) among
independent predictions.
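The E-value screening step described above can be sketched as a simple filter. This is an illustration only: the `Hit` record, the hit identifiers and values, and the cut-off of 1e-5 are all assumptions chosen for the example, not values prescribed by BLAST or by this report.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str  # hypothetical template identifier
    score: float      # alignment score reported by the search tool
    e_value: float    # expected number of chance hits with this score


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cut-off, best (lowest) E-value first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Hypothetical search results (illustration only).
hits = [
    Hit("tmplA", 27.0, 6.0),
    Hit("tmplB", 197.0, 2e-51),
    Hit("tmplC", 206.0, 6e-54),
]
for hit in select_templates(hits):
    print(hit.template_id, hit.e_value)
```

Hits that survive the filter are only candidates; coverage, function and structure quality still need to be weighed before a template is chosen.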
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the
interfaces of these interactions do not have the ability to alter their conformation in order
to improve binding and ease movement. Conformational changes are limited by steric
constraints, and thus the interactions are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a
lock-and-key mechanism. There is both high specificity and induced fit within these
interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can
have either a rigid ligand and a flexible receptor or a flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand-flexible receptor complex often maximizes the
interface area between the molecules. They move with respect to one another in a
direction perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the
system will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease hepatitis B, and will also open new avenues for finding other possible
drug targets in the hepatitis B virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
or on the cell surface to which a substance (such as a hormone or a drug) selectively binds,
causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG‡ of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The table below does not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.,
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between
the torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
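Each point on such a plot comes from two torsion angles computed from backbone coordinates: φ from the atoms C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of the standard four-point dihedral calculation using the atan2 form; the helper names and the test coordinates are hypothetical, not taken from any structure in this report.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (-180..180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a planar "cis" arrangement gives 0 degrees.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
```

Applying this function residue by residue to the backbone atoms of a model yields the (φ, ψ) pairs that validation tools plot against the allowed regions.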
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good for releasing local constraints for a
residue, but it will not cross high energy barriers and may stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
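The idea of moving atoms down the energy gradient can be shown on the smallest possible system. The sketch below is an illustration, not the procedure used by any particular package: it assumes two atoms interacting through a Lennard-Jones potential E(r) = 4ε[(σ/r)^12 - (σ/r)^6] in reduced units (ε = σ = 1) and applies plain steepest descent to their separation r; the step size and tolerance are arbitrary choices.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dE/dr; the force on the pair is the negative of this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_separation(r0: float, step: float = 0.01,
                        tol: float = 1e-10, max_iter: int = 10000) -> float:
    """Steepest descent: move against the gradient until the force is tiny."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


r_min = minimize_separation(1.5)
print(f"r_min = {r_min:.4f}, E = {lj_energy(r_min):.4f}")
```

In this one-dimensional case there is a single minimum at r = 2^(1/6) σ; in a real protein the same downhill procedure stops at whichever local minimum lies nearest the starting conformation, which is the limitation noted above.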
Result and discussion
1. Swiss-Prot entry - Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db    AC       Description                                                      Score   E-value
pdb   1QGT-C   Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5            206     6e-54
pdb   2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina            197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                  192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp 27      6.0
pdb   1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I            27      6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue    Count   Percent
Ala (A)    74      14.1%
Arg (R)    33      6.3%
Asn (N)    11      2.1%
Asp (D)    21      4.0%
Cys (C)    5       1.0%
Gln (Q)    32      6.1%
Glu (E)    28      5.4%
Gly (G)    51      9.8%
His (H)    16      3.1%
Ile (I)    15      2.9%
Leu (L)    81      15.5%
Lys (K)    10      1.9%
Met (M)    12      2.3%
Phe (F)    10      1.9%
Pro (P)    28      5.4%
Ser (S)    27      5.2%
Thr (T)    22      4.2%
Trp (W)    4       0.8%
Tyr (Y)    5       1.0%
Val (V)    38      7.3%
Pyl (O)    0       0.0%
Sec (U)    0       0.0%
(B)        0       0.0%
(Z)        0       0.0%
(X)        0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S   17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
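The two extinction coefficients above can be reproduced from the residue counts. This sketch assumes the widely used per-residue molar values at 280 nm in water (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1) and takes the counts from the composition reported here (5 Tyr, 4 Trp, 5 Cys); the molecular weight is read as 55252.6 Da, which is an interpretation of the printed figure.

```python
def extinction_coefficient(n_tyr: int, n_trp: int, n_cys: int,
                           cys_as_cystine: bool) -> int:
    """Molar extinction coefficient at 280 nm (M-1 cm-1) from residue counts."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if cys_as_cystine else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


n_tyr, n_trp, n_cys = 5, 4, 5
molecular_weight = 55252.6  # assumed reading of the reported value, in Da

ext_cystine = extinction_coefficient(n_tyr, n_trp, n_cys, cys_as_cystine=True)
ext_reduced = extinction_coefficient(n_tyr, n_trp, n_cys, cys_as_cystine=False)

# Absorbance of a 1 g/l (0.1%) solution: molar coefficient / molecular weight.
abs_cystine = ext_cystine / molecular_weight
abs_reduced = ext_reduced / molecular_weight
print(ext_cystine, round(abs_cystine, 3))
print(ext_reduced, round(abs_reduced, 3))
```

The fact that both reported absorbance values follow from the same molecular weight is a useful consistency check on the figures in this section.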
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
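The percentages in the SOPMA summary are simply each state count divided by the sequence length. A minimal sketch, using the four non-zero counts reported above (the state letters h, e, t and c follow the SOPMA output; the function name is hypothetical):

```python
from typing import Dict


def state_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of residues in each predicted state, to two decimals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues counted")
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Counts reported by SOPMA for the 523-residue sequence.
sopma_counts = {"h": 235, "e": 80, "t": 36, "c": 172}
percentages = state_percentages(sopma_counts)
print(sum(sopma_counts.values()), percentages)
```

The same function could be fed counts obtained from the per-residue state string (for example with `collections.Counter`) to cross-check a prediction against its printed summary.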
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences 10
2 Alignment score 28565
3 Sequence format Pearson
4 Sequence type aa
5 Output file clustalw2-20080510-09552541.output
6 Alignment file clustalw2-20080510-09552541.aln
7 Guide tree file clustalw2-20080510-09552541.dnd
8 Your input file clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
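The pairwise scores make it easy to see which sequences group together. As an illustration, the sketch below takes a handful of score rows copied from the table (not the full set) and lists the pairs at or above a chosen threshold; the threshold of 90 and the function name are assumptions made for the example, not values used by ClustalW2.

```python
# A few (SeqA, SeqB) -> score entries copied from the scores table.
scores = {
    ("GLCTK_HUMAN", "HBEAG_ASHV"): 3,
    ("HBEAG_ASHV", "HBEAG_GSHV"): 91,
    ("HBEAG_ASHV", "HBEAG_HBVA2"): 66,
    ("HBEAG_DHBV1", "HBEAG_DHBV3"): 97,
    ("HBEAG_DHBV1", "HBEAG_GSHV"): 24,
    ("HBEAG_HBVA2", "HBEAG_HBVA3"): 98,
}

def pairs_at_least(table, threshold):
    """Return the pairs scoring >= threshold, highest score first."""
    hits = [(score, pair) for pair, score in table.items() if score >= threshold]
    return [pair for score, pair in sorted(hits, reverse=True)]

close = pairs_at_least(scores, 90)
print(close)
```

In this subset the human glycerate kinase scores only 3 against a viral sequence, while pairs within the same virus group score above 90.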
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose icon: Swiss model - load the raw target sequence.
Choose icon: fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon: file - save - layer (pdb).
Choose icon: file - save - project (pdb).
Choose icon: Swiss model - submit modeling request. (A new browser window is opened; load the pdb file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose icon: file - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit P000044 TitleQ8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
Evalue: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET    83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA      4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET          hh sss sssss hhh
2b8nA           hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET          hhhhhhhhh ssssss sssss hhhhh
2b8nA           hhhhhhhhh ssssss sssss hhhhh

TARGET   155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA           hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET          h hhhhhh hhh sss hhhhhh sssssss
2b8nA           h hhhhhh hhh sss hhhhhh sssssss

TARGET   255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET          h hhhhhhhhhh hh hhhh sss sssss
2b8nA           h hhhhhhhhhh hhh hhhh ss

TARGET   305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA           sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET          hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA           hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET          sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA           sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394    ksgallitgp tgtnvndlii gliv-
TARGET          h sss sssss ssss
2b8nA           sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure, plotted against each other. It shows which combinations of φ and ψ are possible for a polypeptide and where each residue of the model falls relative to those regions.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
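Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N, CA, C and ψ from N, CA, C, N(i+1). The sketch below shows the standard dihedral calculation in plain Python. It is not how PROCHECK or SAVS implements it; the function names and the test coordinates are made up for illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)   # normals of the two planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: a planar trans arrangement and a 90 degree twist.
trans = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
twist = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(trans, twist)
```

Applying the function to the backbone atoms of every residue of a model gives the (φ, ψ) pairs that the plot classifies into favoured, allowed and disallowed regions.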
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                        Score   %-age
Residues in most favoured regions [A,B,L]                 990    85.6
Residues in additional allowed regions [a,b,l,p]          104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0
Residues in disallowed regions                             51     4.4
                                                         ----   -----
Number of non-glycine and non-proline residues           1156   100.0
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
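The percentages in the plot statistics are taken over the non-glycine, non-proline residues only. A short sketch that recomputes them from the reported counts and applies the 90% guideline quoted above (variable names are illustrative):

```python
# Residue counts per Ramachandran region, as reported in the plot statistics.
regions = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
non_gly_pro = sum(regions.values())
percent = {name: round(100.0 * n / non_gly_pro, 1) for name, n in regions.items()}

# End residues, glycines and prolines are counted separately.
total_residues = non_gly_pro + 8 + 127 + 60

meets_guideline = percent["most favoured"] > 90.0
print(percent, total_residues, meets_guideline)
```

With these counts the model has 85.6% of its residues in the most favoured regions, which is below the 90% expected of a good quality model.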
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50, with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600
R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275
R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal    Eshape    Eforce    Eair      RMS              Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00      00      00     -1  -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are
all closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved
together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible in the
near future to draw firm conclusions about the true picture of its evolution and to identify the
genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To determine the unknown structure of a protein present in the different species, we carried
out homology modelling and present a theoretical model built with available online modelling
tools.
Our study indicates that the HBeAg (glycerate kinase) protein is a secondary cause of the
pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in
order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The
present work may be a small finding on a large issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking study will be useful in finding a cure for the infectious disease bird
flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug
discovery process.
Finally, a similar process can be applied to other pathogens, and hence possible therapeutic
sites can be identified in them. A similar method can also be applied to other infectious
diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done
to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro - An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
Fig HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the only infectious particle
found within the body of an infected patient. The virion has a diameter of 42 nm, and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually make up a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present and usually outnumber the
virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to collectively as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the bloodstream (Garces HBVP
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome.
Hepatitis B Surface Antigen (HBsAg)- There are three different types of hepatitis B
surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering an immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears largely oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements
in the quality of homology models [5, 6].
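The 30% rule can be checked directly on a pairwise alignment. The sketch below is only illustrative: the function names and the short aligned strings are assumptions made for this example, and percent identity is computed here over aligned (non-gap) columns, which is one of several conventions in use.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # gap columns are not counted in this convention
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def passes_30_percent_rule(aligned_query: str, aligned_template: str,
                           threshold: float = 30.0) -> bool:
    """True when identity exceeds the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Made-up aligned fragments, for illustration only.
print(percent_identity("ACDE-G", "ACDFHG"))
print(passes_30_percent_rule("AAAA", "ACCC"))
```

A template that fails this check is not necessarily unusable, but the alignment behind the model deserves closer inspection.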
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction into ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2 Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number: Q8IVS8
EC 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
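To make the Smith-Waterman algorithm mentioned above concrete, here is a minimal Python sketch of local alignment with a linear gap penalty. It is not the SSEARCH implementation: the scoring values (match 2, mismatch -1, gap -2) and the test strings are assumptions chosen for illustration, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # Score matrix; row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are never allowed to fall below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy strings, for illustration only.
score, top, bottom = smith_waterman("XXHELLOYY", "ZZHELLOWW")
print(score, top, bottom)
```

The dynamic programming table costs time and memory proportional to the product of the two sequence lengths, which is why heuristic programs such as FASTA and BLAST are used for database-scale searches.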
4 BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
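Of the parameters listed, the aliphatic index is simple enough to reproduce by hand. It is defined as X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each residue. The Python sketch below follows that formula; the function name and the four-residue test sequence are assumptions for illustration, and the code is not ProtParam itself.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)).

    X(...) is the mole percent of a residue. White space and digits in the
    input are ignored, mirroring how raw sequences are usually pasted.
    """
    residues = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not residues:
        raise ValueError("sequence contains no residues")

    def mole_percent(aa: str) -> float:
        return 100.0 * residues.count(aa) / len(residues)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy sequence, for illustration only: one each of Ala, Val, Ile, Leu.
print(aliphatic_index("AVIL"))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains, which is usually read as a sign of greater thermostability.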
Using SOPMA for secondary structure analysis
Recently, a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper, we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
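The success rates quoted for SOPMA are per-residue accuracies over three states, a measure commonly called Q3. As a hedged illustration, the sketch below computes it for one predicted and one observed state string; the single-letter codes H (helix), E (sheet) and C (coil) and the example strings are assumptions, not SOPMA output.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not observed:
        raise ValueError("state strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up state strings, for illustration only.
print(q3_accuracy("HHHEECCC", "HHHEECCH"))
```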
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modelled the receptor
Validated the modelled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model using different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and obtained the structure of the drug molecule
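The sequence retrieved in the protocol above is in FASTA format: a header line starting with ">" followed by one or more lines of sequence. The Python sketch below shows one way to read such text before submitting it to a modelling tool. The function name and the two demo records are assumptions for illustration and are not the sequences used in this report.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    chunks: Dict[str, List[str]] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in chunks.items()}


# Hypothetical two-record file, for illustration only.
demo = """>demo chain A
MKT
AAL
>demo chain B
GGS
"""
print(parse_fasta(demo))
```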
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
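The agreement between matched Cα atoms discussed above is usually reported as a root-mean-square deviation (RMSD), which is assumed here to be the measure behind the quoted figures. The Python sketch below computes it for two equal-length lists of Cα coordinates. It assumes the two structures have already been superposed, and the coordinates shown are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """RMSD in the input units (e.g. angstroms) over matched C-alpha atoms.

    Assumes the two coordinate sets are already optimally superposed and
    listed in the same residue order; no fitting is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


# Invented coordinates: every atom is displaced by 2 units along z.
model_ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_ca = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ca_rmsd(model_ca, reference_ca))
```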
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all
the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions, that is, the fraction of the
query sequence structure that can be predicted from the template, and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production however more sophisticated approaches
have also been explored
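To make the notion of coverage concrete, the sketch below computes the fraction of the query spanned by an aligned region and uses it to order candidate templates. Ranking by coverage first and percent identity second is just one possible heuristic assumed for this example; the candidate names and numbers are hypothetical.

```python
def coverage(query_length, aligned_start, aligned_end):
    """Fraction of query residues inside the aligned region (1-based, inclusive)."""
    return (aligned_end - aligned_start + 1) / query_length


def rank_candidates(candidates, query_length):
    """Sort candidates by coverage, then by percent identity (both descending)."""
    def key(candidate):
        _name, start, end, identity = candidate
        return (coverage(query_length, start, end), identity)
    return sorted(candidates, key=key, reverse=True)


# Hypothetical candidates: (name, aligned start, aligned end, percent identity).
candidates = [
    ("tmplA", 40, 180, 52.0),
    ("tmplB", 10, 290, 34.0),
    ("tmplC", 10, 290, 29.0),
]
ranked = rank_candidates(candidates, query_length=300)
print([name for name, *_ in ranked])
```

The high-identity but short hit ends up last, which mirrors the point made above that coverage can outweigh raw sequence similarity.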
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking Interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the
interfaces of these interactions do not have the ability to alter their conformation in
order to improve binding and ease movement. Conformational changes are limited by steric
constraints, and the interfaces are thus said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand Docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid
ligand and a flexible receptor or a flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the
system will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement, which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
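The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be illustrated with a small pure-Python sketch that applies one pose to a set of coordinates. The rotation order (about x, then y, then z) and the function names are assumptions chosen for this example; docking programs use their own parameterisations.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Rotate, then translate, every (x, y, z) point by a six-parameter pose.

    pose = (rx, ry, rz, tx, ty, tz): three rotation angles and three shifts.
    """
    rx, ry, rz, tx, ty, tz = pose
    rot = rotation_matrix(rx, ry, rz)
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rot, (tx, ty, tz))
        ))
    return moved


# Made-up ligand atom: rotate 90 degrees about z, then shift 2 units along z.
ligand = [(1.0, 0.0, 0.0)]
moved = apply_pose(ligand, (0.0, 0.0, math.pi / 2, 0.0, 0.0, 2.0))
print(moved)
```

A rigid docking search amounts to scoring many such poses; flexibility adds internal degrees of freedom on top of these six.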
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the
torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and the
angles which cause spheres to collide correspond to sterically disallowed conformations of
the polypeptide backbone.
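Each point on such a plot is a pair of torsion angles, and a torsion angle is fully determined by four atom positions: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The following pure-Python sketch computes a torsion angle from four points using the standard atan2 formulation. The coordinates are made-up test points, not atoms from any structure in this report.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points around a central bond lying on the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

Applying `dihedral` to the backbone atoms of every residue yields the (φ, ψ) pairs that a Ramachandran plot displays.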
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing.
PROCHECK generates a number of output files in the default directory, which have the same
name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and so stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
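The idea of gradient optimization can be shown on the smallest possible system: two atoms that start too close together and interact through a Lennard-Jones potential. The sketch below is a toy steepest-descent loop in reduced units (epsilon = sigma = 1); the potential, step size and tolerance are assumptions for illustration and do not correspond to the force field or minimizer of any particular package.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, tolerance=1e-8, max_steps=100000):
    """Steepest descent: move downhill until the force is nearly zero."""
    for _ in range(max_steps):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        r -= step * gradient
    return r


r_start = 1.0                      # atoms too close: strong repulsion
r_min = minimize(r_start)
print(r_start, lj_energy(r_start))
print(r_min, lj_energy(r_min))
```

The loop settles at the nearest minimum (r = 2^(1/6) in these units) and has no way of crossing an energy barrier, which is exactly the limitation described above.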
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: Name: GLYCTK; Synonyms: HBEBP4; ORFNames: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206  6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp     27  60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                27  60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711 (41)
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
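The extinction coefficients and absorbance values above can be reproduced from the Trp, Tyr and Cys counts and the molecular weight in the same listing. The sketch below assumes the per-residue molar contributions commonly used for this kind of estimate (5500 for Trp, 1490 for Tyr and 125 per cystine, in M-1 cm-1 at 280 nm); those constants are an assumption of the example rather than something stated in this report.

```python
# Counts and molecular weight as listed in the ProtParam output above.
N_TRP, N_TYR, N_CYS = 4, 5, 5
MOLECULAR_WEIGHT = 55252.6
SEQUENCE_LENGTH = 523

# Assumed molar contributions at 280 nm (M^-1 cm^-1).
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Sum the residue contributions to the molar extinction coefficient."""
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


def percent(count, total):
    """Composition of one residue type as a percentage."""
    return 100.0 * count / total


ext_no_cystine = extinction_coefficient(N_TRP, N_TYR, 0)
ext_all_paired = extinction_coefficient(N_TRP, N_TYR, N_CYS // 2)
print(ext_no_cystine, round(absorbance_at_1_g_per_l(ext_no_cystine, MOLECULAR_WEIGHT), 3))
print(ext_all_paired, round(absorbance_at_1_g_per_l(ext_all_paired, MOLECULAR_WEIGHT), 3))
print(round(percent(74, SEQUENCE_LENGTH), 1))  # alanine share
```

With five cysteines at most two cystines can form, which is why the two coefficients differ by 250.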
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
(42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
310 helix       (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00% (43)
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
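The per-state summary that SOPMA prints is a simple tally over the one-letter prediction string (h, e, t, c). A minimal sketch of that tally is given below; the short prediction string is made up for the example, and the state names are taken from the summary above.

```python
from collections import Counter

# One-letter codes of the four predicted states, as named in the summary above.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_composition(prediction):
    """Return {code: (count, percent)} for each of the four states."""
    counts = Counter(prediction)
    total = len(prediction)
    return {
        code: (counts.get(code, 0), 100.0 * counts.get(code, 0) / total)
        for code in STATE_NAMES
    }


example = "hhhhcceeeetcc"  # hypothetical 13-residue prediction
composition = state_composition(example)
for code, (count, pct) in composition.items():
    print(f"{STATE_NAMES[code]:<16} {count:3d} is {pct:6.2f}%")
```

Running the same tally on the full 523-state prediction should give the counts listed above, for example 235 helical states, or 44.93%.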
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
(45)
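The pairwise scores in this table are percent identities. As a rough illustration of how such a score arises, the sketch below counts identical residues over the columns of an alignment in which neither sequence has a gap. That gap handling is an assumption of the example, and ClustalW's own pairwise stage may normalise differently. The two fragments are the opening aligned residues of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment in this section.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Opening aligned residues of two HBeAg sequences from the alignment in this section.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
score = percent_identity(hbva4, hbva5)
print(round(score, 2))
```

The single V/F difference in 29 compared columns gives a value in the high nineties, in line with the score reported for this pair over the full sequences.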
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from File (PDB file).
2) Choose Swiss model - load the raw target sequence.
3) Choose Fit - fit raw sequence, then magic fit, then iterative fit.
4) Choose File - Save - Layer (PDB).
5) Choose File - Save - Project (PDB).
6) Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID for receiving the modeled structure.)
7) Open the new structure (received by email) and remove the template by selecting the target.
8) Choose File - Save - Layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit P000044 TitleQ8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET   83  LV GFGKAVLGMA AAAEELLGQH
2b8nA     4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE   %-tage
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
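The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% rule of thumb can be checked in the same step. The short sketch below uses the four counts from the plot statistics above; the dictionary keys and variable names are chosen for this example only.

```python
# Residue counts per Ramachandran region, from the plot statistics above.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(region_counts.values())
percentages = {
    region: 100.0 * count / total
    for region, count in region_counts.items()
}

# Rule of thumb quoted above: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:<20} {region_counts[region]:5d} {pct:6.1f}%")
print("meets 90% guideline:", meets_guideline)
```

For this model the most favoured fraction comes out at 85.6%, so it falls short of the guideline even though most residues lie in allowed regions.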
Docking result ---- by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are
all closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be
possible to draw firm conclusions about the true picture of evolution in the near future, and
that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products
and their functions.
To find the unknown structure of a protein present in the different species, we performed
homology modelling. We took a step forward by presenting a theoretical model built with
available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary
causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report might be just a stepping stone for future discoveries. The
present work might be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as interpreting the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs
more efficiently and more effectively in the future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug development.
The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Large numbers of smaller subviral particles are also present; they usually outnumber the
virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to as a group named surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase, and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the bloodstream (Garces HBVP
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B Surface Antigen (HBsAg)- There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg), and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope, which may be responsible for triggering an immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- This is the only HBV antigen that cannot be detected
directly by a blood test; it can only be isolated by analyzing an infected
hepatocyte. A 185-amino-acid protein, it is expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus,
this antigen can be detected by a blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles, or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid, and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing improvements
in the quality of homology models [5, 6].
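The 30% identity threshold discussed above can be checked for any query-template pair once the two sequences have been aligned. The following is a minimal sketch, not the procedure used in this report: the function names, the gap character "-", and the choice of denominator (columns where neither sequence has a gap) are all assumptions made for illustration. Other tools may define percent identity with a different denominator, so values are not always directly comparable.

```python
# Hypothetical helper names; the threshold value comes from the 30% rule in the text.
IDENTITY_THRESHOLD = 30.0


def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both arguments must be rows of the same pairwise alignment, so they
    must have equal length; '-' is assumed to mark a gap.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue
        compared += 1
        if q == t:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def exceeds_identity_threshold(aligned_query: str, aligned_template: str) -> bool:
    """True if the pair has greater than 30% identity under this definition."""
    return percent_identity(aligned_query, aligned_template) > IDENTITY_THRESHOLD


if __name__ == "__main__":
    # Toy alignment, not real data from this study.
    query = "ACDE-FG"
    template = "ACDQKFG"
    print(round(percent_identity(query, template), 2))
    print(exceeds_identity_threshold(query, template))
```

A template that passes this check is only a candidate; as the text notes, the quality of the alignment itself still determines the quality of the resulting model.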
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed in recent years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of annotation
(such as the description of the function of a protein, its domains structure,
post-translational modifications, variants, etc.), a minimal level of redundancy and a
high level of integration with other databases.
2 Protein sequence
Glycerate kinase (HBeAg-binding protein 4), primary accession number Q8IVS8, EC 2.7.1.31,
from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate =
ADP + 3-phospho-(R)-glycerate.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance, or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
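The Smith-Waterman local alignment mentioned above can be illustrated with a minimal
sketch. It is not the SSEARCH implementation: the function name smith_waterman, the
simple match/mismatch scores and the linear gap penalty are assumptions made for
illustration, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b).

    Illustrative scoring only: fixed match/mismatch values and a linear gap cost.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; cells never drop below zero,
    # which is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

The score alone does not say whether a match is significant; that is the role of the
similarity statistics described in the text.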
4 BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences, and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If
you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two
boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
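As an illustration of one of these parameters, the aliphatic index can be sketched as
below. The formula used (mole percent of Ala, plus 2.9 times that of Val, plus 3.9 times
that of Ile and Leu) is the commonly cited definition of the index; the function name
aliphatic_index is hypothetical and this is not ProtParam's own code. Non-letter
characters are skipped, mirroring the statement that white space and numbers are ignored.

```python
def aliphatic_index(sequence):
    """Aliphatic index = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue in the sequence."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")

    def mole_percent(one_letter_code):
        return 100.0 * residues.count(one_letter_code) / len(residues)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Digits and white space in the pasted sequence are ignored.
print(aliphatic_index("MAAALQVLPR 10 LARAPLHPLL 20"))
```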
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from the literature, using journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure by a BLAST similarity search (PDB ID 2B8N).
Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model with the parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
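The agreement between matched Cα atoms quoted above is normally expressed as a root
mean square deviation (RMSD). A minimal sketch of that measure follows; the function
name ca_rmsd is hypothetical, the coordinates are made-up values in Å, and the two
structures are assumed to be already superimposed (no fitting step is performed here).

```python
import math


def ca_rmsd(model_coords, reference_coords):
    """RMSD over matched C-alpha positions, given as (x, y, z) tuples in angstroms.

    Assumes both lists are in the same residue order and already superimposed.
    """
    if not model_coords or len(model_coords) != len(reference_coords):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model_coords, reference_coords):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(model_coords))


# Made-up example: every atom of the model is displaced by 2 angstroms along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 2.0, y, z) for (x, y, z) in reference]
print(ca_rmsd(model, reference))
```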
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain
and side chain dihedral angles are transferred from the templates to the target. Thus a
number of spatial restraints on its structure are obtained. Third, the 3D model is obtained
by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
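The E-value screening described above amounts to a simple filter over the hit list. The
sketch below assumes hits have already been parsed into dictionaries; the hit names,
scores and the cutoff of 1e-5 are hypothetical values chosen for illustration, not
results or settings from this work.

```python
def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, ordered best (lowest E-value) first."""
    kept = [hit for hit in hits if hit["evalue"] <= max_evalue]
    return sorted(kept, key=lambda hit: hit["evalue"])


# Hypothetical hit list: two convincing hits and one that is likely due to chance.
example_hits = [
    {"id": "hit_A", "score": 200, "evalue": 1e-50},
    {"id": "hit_B", "score": 27, "evalue": 6.0},
    {"id": "hit_C", "score": 150, "evalue": 1e-30},
]
for hit in select_templates(example_hits):
    print(hit["id"], hit["evalue"])
```

A hit that fails the cutoff is dropped even if it is the only one available, in line
with the caution given in the text.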
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step, and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions, that is, the fraction of
the query sequence structure that can be predicted from the template, and the plausibility
of the resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
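Two of the selection criteria above, sequence similarity and coverage, can be computed
directly from a pairwise alignment. The sketch below uses one common convention
(identity over aligned, non-gap columns; coverage as the fraction of query residues
aligned to a template residue); other definitions exist, and the function name and the
example strings are hypothetical.

```python
def identity_and_coverage(aligned_query, aligned_template):
    """Return (percent identity, percent query coverage) for two gapped strings
    of equal length, using '-' as the gap character."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    query_length = sum(1 for c in aligned_query if c != "-")
    aligned = identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q != "-" and t != "-":
            aligned += 1
            if q == t:
                identical += 1
    identity = 100.0 * identical / aligned if aligned else 0.0
    coverage = 100.0 * aligned / query_length if query_length else 0.0
    return identity, coverage


# Made-up alignment: one query residue faces a gap, one aligned pair differs.
print(identity_and_coverage("ACDEFG", "ACD-YG"))
```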
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the
interfaces of these interactions do not have the ability to alter their conformation in
order to improve binding and ease movement. Conformational changes are limited by steric
constraints and thus the interfaces are said to be rigid.
Fig: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock
and key mechanism. There is both high specificity and induced fit within these interfaces,
with specificity increasing with rigidity. A protein receptor-ligand pair can have either
a rigid ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the
system will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible
drug targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds,
causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making
and breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
Delta G of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the
torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
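Each point on such a plot is a pair of torsion angles, and a torsion angle is defined by
four consecutive atoms (for φ: C, N, Cα, C; for ψ: N, Cα, C, N). A minimal sketch of the
calculation from atomic coordinates is given below, using the standard atan2 formulation;
the function name dihedral and the test coordinates are hypothetical, and reading atoms
from a PDB file is not shown.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in the range -180 to 180, about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Four points in one plane: cis arrangement (0 degrees) and trans (180 degrees).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))))
```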
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as
the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy
minimization on a conformation to find the best nearby conformation. Energy minimization
is usually performed by gradient optimization: atoms are moved so as to reduce the net
forces on them. The minimized structure has small forces on each atom and therefore serves
as an excellent starting point for molecular dynamics simulations.
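The behaviour described above, moving downhill along the gradient and stopping in the
nearest minimum without crossing a barrier, can be seen in a one-dimensional sketch. The
double-well energy function used here is a hypothetical toy model, not a molecular force
field, and the step size and tolerance are arbitrary illustrative choices.

```python
def energy(x):
    # Toy double-well potential: a deeper minimum near x = -1 and a
    # shallower one near x = +1, separated by a barrier around x = 0.
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    # Derivative of the toy energy; the "force" is its negative.
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimize(x, step=0.01, tolerance=1e-8, max_steps=100000):
    """Steepest descent: repeatedly move against the gradient until it is small."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= step * g
    return x


right = minimize(1.5)    # starts on the right-hand side of the barrier
left = minimize(-1.5)    # starts on the left-hand side of the barrier
print(right, energy(right))
print(left, energy(left))
```

Both runs end with a near-zero gradient, but the run started on the right stays in the
shallower well: the minimizer finds the best nearby conformation, not the global one.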
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                             Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5                   206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                   197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                         192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp        27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                   27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
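The two extinction coefficients reported above can be reproduced from the composition
(5 Tyr, 4 Trp, 5 Cys). The sketch below assumes the commonly used molar extinction
values at 280 nm of 1490 for Tyr, 5500 for Trp and 125 for each cystine; the function
name is hypothetical, and the molecular weight of 55252.6 is read from the report's
figure on the assumption that a decimal point was lost before the last digit.

```python
# Assumed molar extinction values at 280 nm, in M-1 cm-1.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_as_cystine):
    """Molar extinction coefficient of the native protein in water at 280 nm."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


molecular_weight = 55252.6
for paired in (True, False):
    coefficient = extinction_coefficient(5, 4, 5, paired)
    # Absorbance of a 1 g/l solution = molar coefficient / molecular weight.
    print(coefficient, round(coefficient / molecular_weight, 3))
```

The printed pairs are 29700 with 0.538 and 29450 with 0.533, in agreement with the
values listed above.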
Secondary structure prediction
SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters: Window width 17; Similarity threshold 8; Number of states 4
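The percentages in the SOPMA summary follow directly from the state counts and the
sequence length of 523. A minimal sketch of that arithmetic is given below; the helper
names percent and state_counts are hypothetical, and the counts are taken from the
summary above rather than recounted from the prediction string.

```python
def percent(count, total):
    """Share of residues in one state, as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


def state_counts(prediction):
    """Count each one-letter state (h, e, t, c, ...) in a prediction string."""
    counts = {}
    for state in prediction:
        counts[state] = counts.get(state, 0) + 1
    return counts


# State counts as reported in the SOPMA summary.
reported = {"h": 235, "e": 80, "t": 36, "c": 172}
total = sum(reported.values())
for state, count in reported.items():
    print(state, count, percent(count, total))

# The same helper applied to a short, made-up prediction fragment.
print(state_counts("hhhhcceett"))
```

The four reported counts add up to 523, so every residue is assigned to one of the four
predicted states.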
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
49
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4   ------------------------------------------------------------
Q91C37|HBEAG_HBVA6   ------------------------------------------------------------
P0C692|HBEAG_HBVA2   ------------------------------------------------------------
P0C625|HBEAG_HBVA3   ------------------------------------------------------------
Q81105|HBEAG_HBVA5   ------------------------------------------------------------
Q64896|HBEAG_ASHV    ------------------------------------------------------------
P03153|HBEAG_GSHV    ------------------------------------------------------------
P03154|HBEAG_DHBV1   ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3   ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN   MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5   ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV    ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV    ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1   ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3   ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN   LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4   DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6   DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2   DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3   DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5   DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV    DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV    DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1   DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3   DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN   ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5   --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV    --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV    --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1   --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3   --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN   ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4   QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6   ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2   QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3   QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5   QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV    QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV    QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1   QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3   QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN   LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
52
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
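The guide tree above is a Newick string whose punctuation was lost in extraction. The following is a minimal sketch of how the leaf branch lengths could be read back out of it, assuming the missing characters are colons, commas and a decimal point after the first digit of each length. The constant GUIDE_TREE and the function leaf_branch_lengths are illustrative names, not part of ClustalW.

```python
import re

# Guide tree with punctuation restored (an assumption: each length is
# taken to have five decimal places, as in the digits shown in the report).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Map each leaf label to its terminal branch length.

    A leaf label directly follows "(" or "," and ends at the ":" before
    its length. Internal nodes follow ")" and are therefore skipped.
    """
    pattern = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


if __name__ == "__main__":
    leaves = leaf_branch_lengths(GUIDE_TREE)
    # Print leaves from the longest to the shortest terminal branch.
    for name, length in sorted(leaves.items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {length:.5f}")
```

In this reading the human glycerate kinase sequence sits on by far the longest branch, which is what one would expect from the heavily gapped alignment rows above.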
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file)
Choose icon - Swiss model - load the raw target sequence
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - File - save - layer (pdb)
Choose icon - File - save - project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the email ID for receiving the modeled structure)
Open the new structure (received by email) and remove the template by selecting the target
Choose icon - File - save - layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET 83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA  4     peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET 155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET 255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET 305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394   ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
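The sequence identity reported for the model is a count of matching columns in a target-template alignment such as the one above. The sketch below shows one common way to compute it, assuming identity is taken over columns where neither sequence has a gap; SWISS-MODEL may use a different denominator, so this is not expected to reproduce the reported value. The function name percent_identity is illustrative, and the example uses only one ten-residue block (target residues 175 to 184) rather than the full alignment.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither row is a gap.

    Both arguments are rows of the same alignment, so they must have
    equal length. Comparison is case-insensitive because the report
    prints the target in upper case and the template in lower case.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    target_block = "ISGGGSALLP"    # TARGET, third block of the row starting at 155
    template_block = "lsgggsslfe"  # 2b8nA, third block of the row starting at 101
    print(f"{percent_identity(target_block, template_block):.1f}% identity")
```

This block covers a well-conserved stretch, so its identity is higher than the figure quoted for the whole model.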
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. Because these two angles describe the rotation about the bonds on either side of each alpha carbon atom, the plot indicates whether the backbone conformation between successive alpha carbon atoms in the protein is sterically reasonable.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
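The φ and ψ values plotted in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms: φ by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of the dihedral calculation itself, not the code PROCHECK or SAVS uses. The function name dihedral and the test coordinates are illustrative assumptions; real input would be atom coordinates read from the model PDB file.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z).

    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i),
    C(i), N(i+1). The result lies in the range -180 to 180.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)  # points back from the second atom to the first
    b1 = sub(p2, p1)  # central bond, the rotation axis
    b2 = sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Components of the outer bonds perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: the first three points are fixed and the
    # fourth is moved to give a cis, a perpendicular and a trans case.
    a, b, c = (0, 1, 0), (0, 0, 0), (1, 0, 0)
    print(dihedral(a, b, c, (1, 1, 0)))   # cis arrangement
    print(dihedral(a, b, c, (1, 0, 1)))   # perpendicular arrangement
    print(dihedral(a, b, c, (1, -1, 0)))  # trans arrangement
```

Applying this to every residue of a model and plotting φ against ψ gives the scatter that PROCHECK then classifies into favoured, allowed and disallowed regions.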
SAVES results for proj_gunjan.pdb
PROCHECK summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           Score    %age
Residues in most favoured regions [A,B,L]                   990    85.6%
Residues in additional allowed regions [a,b,l,p]            104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11     1.0%
Residues in disallowed regions                               51     4.4%
                                                           ----   ------
Number of non-glycine and non-proline residues             1156   100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
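The percentages in the PROCHECK summary follow directly from the residue counts, so they can be cross-checked with a few lines of arithmetic. The sketch below uses the four counts reported above and the 90% guideline quoted in the PROCHECK note. The names COUNTS, region_percentages and is_good_quality are illustrative and are not part of PROCHECK.

```python
# Residue counts per Ramachandran region, taken from the summary above.
COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Guideline from the PROCHECK note: over 90% in the most favoured regions.
GOOD_MODEL_THRESHOLD = 90.0


def region_percentages(counts):
    """Percentage of non-glycine, non-proline residues in each region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def is_good_quality(counts, threshold=GOOD_MODEL_THRESHOLD):
    """True if the most favoured share exceeds the guideline threshold."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


if __name__ == "__main__":
    print("non-Gly, non-Pro residues:", sum(COUNTS.values()))
    for region, pct in region_percentages(COUNTS).items():
        print(f"{region:20s} {pct:5.1f}%")
    print("meets the 90% guideline:", is_good_quality(COUNTS))
```

The counts add up to the 1156 non-glycine, non-proline residues in the table, and the most favoured share falls short of the 90% guideline, so by that criterion the model would need further refinement.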
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will become possible in the near future to draw firm conclusions about the true picture of evolution, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we carried out homology modelling. We took a step forward by presenting a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The present work may be a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari
Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA
[3] - C. Seeger, W. S. Mason
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4] - plumbed
[5] - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining multiple
structure and sequence alignments to improve sequence detection and alignment
Application to the SH2 domains of Janus kinases Proc Natl Acad Sci 98 14796ndash14801 [PubMed]
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated structure-based
prediction of functional sites in proteins Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking
J Mol Biol 311 395ndash408 [PubMed]
(77)
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller W and
Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new generation of protein
database search programs Nucleic Acids Res 25 3389ndash3402 [PubMed]
81
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher
P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated
documentation resource for protein families domains and functional sites Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut
Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la
Barceloneta 37-49 08003 Barcelona (Catalonia) Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviation
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. A 185 amino acid protein is expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
Antigen
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have a greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing improvements
in the quality of homology models [5, 6].
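The 30% rule refers to the sequence identity between the aligned query and template. As a minimal sketch, assuming identity is counted over aligned (non-gap) columns only, it could be computed as below. Other definitions of the denominator exist; the function names and the default threshold parameter are illustrative and not taken from any particular package.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the non-gap columns of a pairwise alignment.

    Both strings must come from the same alignment (equal length, '-' for gaps).
    Assumption: columns containing a gap in either sequence are ignored.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip gap columns
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True if the template is above the identity threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

For example, `passes_identity_rule("ACDE-FG", "ACDQWFG")` compares six aligned columns, five of which are identical, so the template passes the rule.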
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction into ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot-
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domains structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2 Protein sequence-
Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet; it is an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
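To illustrate what an optimal local alignment means, the following is a minimal sketch of the Smith-Waterman recurrence with a linear gap penalty. It is not the SSEARCH implementation: the match, mismatch and gap scores are arbitrary illustrative values, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned a, aligned b) for the best local alignment.

    Scoring values are illustrative assumptions, not those used by SSEARCH.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because every cell is floored at zero, the alignment is free to start and end anywhere in either sequence, which is what distinguishes local from global alignment.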
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
Among others, it calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
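Of the parameters listed, the aliphatic index is the simplest to reproduce by hand. A minimal sketch follows, assuming the usual definition AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are illustrative and this is not ProtParam's own code.

```python
def aliphatic_index_from_counts(counts: dict, total: int,
                                a: float = 2.9, b: float = 3.9) -> float:
    """Aliphatic index from one-letter residue counts and the sequence length.

    Assumed formula: AI = X(A) + a * X(V) + b * (X(I) + X(L)), X in mole percent.
    """
    if total <= 0:
        raise ValueError("sequence length must be positive")

    def pct(aa: str) -> float:
        return 100.0 * counts.get(aa, 0) / total

    return pct("A") + a * pct("V") + b * (pct("I") + pct("L"))


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index of a raw one-letter protein sequence (white space ignored)."""
    seq = "".join(sequence.split()).upper()
    counts = {aa: seq.count(aa) for aa in "AVIL"}
    return aliphatic_index_from_counts(counts, len(seq))
```

A sequence made only of alanine gives an index of 100, and a sequence with no Ala, Val, Ile or Leu gives 0; higher values are generally read as a sign of greater thermostability.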
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Ran a BLAST similarity search and retrieved the PDB ID of the template structure (PDB ID 2B8N).
Loaded the target sequence in pdb format in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
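The E-value criterion described above can be sketched as a simple filter over a hit list. The hits below reuse the scores and E-values of the BLAST table in the results section, reading its last two rows as score 27 and E-value 6.0; that reading, the `Hit` type and the cutoff of 1e-5 are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template: str   # PDB ID and chain of the candidate template
    score: float    # alignment score reported by the search
    e_value: float  # expected number of chance hits with this score


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first.

    The default cutoff is a hypothetical choice, not a value from the report.
    """
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Illustrative hit list (values as read from the report's BLAST table).
hits = [
    Hit("1TA3-B", 27, 6.0),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("1QGT-C", 206, 6e-54),
    Hit("1AW9-A", 27, 6.0),
    Hit("2G33-C", 192, 6e-50),
]

candidates = select_templates(hits)
```

With this cutoff only the three hepatitis B capsid chains remain as candidate templates, while the two hits with E-values above 1 are discarded as likely chance matches.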
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important is the coverage of the aligned regions, that is, the fraction of the query
sequence structure that can be predicted from the template, and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu, and they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG‡ (activation free energy) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ about these bonds. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with
dimensions corresponding to their van der Waals radii. The angles which cause
spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
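Each point on the plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1). A minimal sketch of the torsion calculation is shown below, using plain tuples for coordinates; the function name is illustrative and the example coordinates are a toy geometry, not taken from any structure in this report.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond for four 3D points."""
    b0 = _sub(p0, p1)          # from the second atom back to the first
    b1 = _sub(p2, p1)          # the central bond
    b2 = _sub(p3, p2)          # from the third atom to the fourth
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central bond has zero length")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    d2 = _dot(b2, b1)
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the first and last atoms are eclipsed (cis), so the angle is 0.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
```

Computing φ and ψ this way for every residue and plotting the pairs gives the Ramachandran plot that the validation servers report.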
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. It is good at releasing local constraints around a residue, but it
cannot cross high energy barriers, so it stops in the nearest local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
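The idea of gradient-based minimization can be illustrated with a toy system. The sketch below is not the force field used in this project: it assumes a single pair of atoms interacting through a Lennard-Jones potential in reduced units (epsilon = sigma = 1), and the function names, step size and tolerance are chosen only for illustration. Starting from atoms placed too close together, steepest descent moves them apart until the force is nearly zero.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr; the force along r is the negative of this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.001, tol=1e-8, max_iter=10000):
    """Steepest descent: step downhill until the force is nearly zero."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad
    return r


r_start = 0.9                 # atoms too close: strong repulsion
r_min = minimize(r_start)
print("start:    r = %.4f  E = %.4f" % (r_start, lj_energy(r_start)))
print("minimum:  r = %.4f  E = %.4f" % (r_min, lj_energy(r_min)))
```

The search ends at the nearest minimum of this one-dimensional curve. In a real protein the same procedure also stops at the nearest local minimum, which is why minimization alone cannot cross high energy barriers.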
Results and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I               27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue    Count   Percent
Ala (A)    74      14.1
Arg (R)    33      6.3
Asn (N)    11      2.1
Asp (D)    21      4.0
Cys (C)    5       1.0
Gln (Q)    32      6.1
Glu (E)    28      5.4
Gly (G)    51      9.8
His (H)    16      3.1
Ile (I)    15      2.9
Leu (L)    81      15.5
Lys (K)    10      1.9
Met (M)    12      2.3
Phe (F)    10      1.9
Pro (P)    28      5.4
Ser (S)    27      5.2
Thr (T)    22      4.2
Trp (W)    4       0.8
Tyr (Y)    5       1.0
Val (V)    38      7.3
Pyl (O)    0       0.0
Sec (U)    0       0.0
(B)        0       0.0
(Z)        0       0.0
(X)        0       0.0
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711 (41)
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
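The ProtParam figures above can be cross-checked with a few lines of arithmetic. The sketch below takes the residue counts and molecular weight from the listing and recomputes the percentages, the charged-residue totals and the extinction coefficients. It assumes the standard per-residue molar absorbances at 280 nm used by ProtParam (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the variable and function names are illustrative only.

```python
# Residue counts as reported by ProtParam for GLCTK_HUMAN (Q8IVS8).
counts = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}
molecular_weight = 55252.6    # as reported above

total = sum(counts.values())


def percent(residue):
    """Composition of one residue type as a percentage of all residues."""
    return round(100.0 * counts[residue] / total, 1)


negative = counts["Asp"] + counts["Glu"]
positive = counts["Arg"] + counts["Lys"]

# Assumed molar absorbances at 280 nm (M-1 cm-1): Trp 5500, Tyr 1490, cystine 125.
ext_no_cystine = 5500 * counts["Trp"] + 1490 * counts["Tyr"]
ext_all_cystine = ext_no_cystine + 125 * (counts["Cys"] // 2)

# Absorbance of a 1 g/l solution is the extinction coefficient divided by the mass.
abs_all_cystine = round(ext_all_cystine / molecular_weight, 3)
abs_no_cystine = round(ext_no_cystine / molecular_weight, 3)

print(total, percent("Ala"), negative, positive)
print(ext_all_cystine, abs_all_cystine, ext_no_cystine, abs_no_cystine)
```

With five cysteines at most two cystines can form, which accounts for the difference of 250 between the two reported coefficients.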
Secondary structure prediction
By SOPMA
SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
(42)
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):       0 is 0.00%
Pi helix (Ii):         0 is 0.00%
Beta bridge (Bb):      0 is 0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn (Tt):        36 is 6.88%
Bend region (Ss):      0 is 0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):  0 is 0.00%
Other states:          0 is 0.00%
(43)
Parameters: Window width 17; Similarity threshold 8; Number of states 4
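The percentages in the SOPMA summary are simple state counts divided by the sequence length. The sketch below shows that calculation on a per-residue state string, using the one-letter codes from the output (h = alpha helix, e = extended strand, t = beta turn, c = random coil). The short string is a made-up example and the function name is illustrative; the last lines recompute the reported percentages from the reported counts.

```python
from collections import Counter


def state_percentages(states):
    """Percentage of residues assigned to each secondary-structure state."""
    counts = Counter(states)
    n = len(states)
    return {state: round(100.0 * c / n, 2) for state, c in counts.items()}


# Hypothetical 10-residue prediction string, for illustration only.
example = "hhhhcceeet"
print(state_percentages(example))

# Consistency check on the counts reported by SOPMA for the 523-residue sequence.
reported = {"h": 235, "e": 80, "t": 36, "c": 172}
length = sum(reported.values())
summary = {s: round(100.0 * c / length, 2) for s, c in reported.items()}
print(length, summary)
```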
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
(45)
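The pairwise scores in the table are percent-identity values. A minimal sketch of that calculation is given below, assuming the score is the number of identical positions divided by the number of aligned positions in which neither sequence has a gap; the exact normalisation used internally by ClustalW2 is not stated in this report, so the function is illustrative rather than a reimplementation. The two fragments are taken from the HBVA4 and HBVA5 rows of the alignment.

```python
def identity_counts(row_a, row_b, gap="-"):
    """Return (identical, compared) over columns with no gap in either row."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue              # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return identical, compared


def percent_identity(row_a, row_b):
    """Percent identity over the gap-free aligned columns."""
    identical, compared = identity_counts(row_a, row_b)
    return 100.0 * identical / compared if compared else 0.0


# Fragments of two aligned rows (P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(identity_counts(hbva4, hbva5), round(percent_identity(hbva4, hbva5), 2))
```

Scores of 2 to 3 between GLCTK_HUMAN and every HBeAg sequence, against 65 to 98 among the HBeAg sequences themselves, are what such a measure gives for unrelated versus closely related proteins.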
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (pdb file), then choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - save - layer (pdb)
Choose File - save - project (pdb)
Choose Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose File - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET    83                                 LV GFGKAVLGMA AAAEELLGQH
2b8nA      4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET         hh sss sssss hhh
2b8nA          hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET         hhhhhhhhh ssssss sssss hhhhh
2b8nA          hhhhhhhhh ssssss sssss hhhhh

TARGET   155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET         h hhhhhh hhh sss hhhhhh sssssss
2b8nA          h hhhhhh hhh sss hhhhhh sssssss

TARGET   255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET         h hhhhhhhhhh hh hhhh sss sssss
2b8nA          h hhhhhhhhhh hhh hhhh ss

TARGET   305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET         hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA          hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET         sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA          sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394   ksgallitgp tgtnvndlii gliv-
TARGET         h sss sssss ssss
2b8nA          sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, with one point per residue, so residues whose backbone conformation falls outside the sterically allowed regions can be identified directly from the plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeler is given as input to SAVS.
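Each point on a Ramachandran plot is a pair of dihedral angles, and a dihedral angle is defined by four consecutive backbone atoms (for φ: C of the previous residue, N, CA, C; for ψ: N, CA, C, N of the next residue). The sketch below shows the underlying geometry in plain Python. It is an illustration of how such an angle can be computed from four points, not the code used by PROCHECK or SAVS, and the coordinates are made-up test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: the first three are fixed, the fourth sets the torsion.
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis = dihedral(a, b, c, (1.0, 0.0, 1.0))
trans = dihedral(a, b, c, (-1.0, 0.0, 1.0))
perpendicular = dihedral(a, b, c, (0.0, 1.0, 1.0))
print(cis, trans, perpendicular)
```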
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                     No. of residues   %-tage
Residues in most favoured regions [A,B,L]            990               85.6%
Residues in additional allowed regions [a,b,l,p]     104               9.0%
Residues in generously allowed regions [~a,~b,~l,~p] 11                1.0%
Residues in disallowed regions                       51                4.4%
                                                     ----              ------
Number of non-glycine and non-proline residues       1156              100.0%
Number of end-residues (excl. Gly and Pro)           8
Number of glycine residues (shown as triangles)      127
Number of proline residues                           60
                                                     ----
Total number of residues                             1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
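The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% guideline can be applied to them in a couple of lines. The sketch below uses the counts reported above; the dictionary keys and the variable names are illustrative only.

```python
# Residue counts from the PROCHECK plot statistics above.
regions = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
end_residues, glycines, prolines = 8, 127, 60

# Glycine, proline and end residues are excluded from the percentage base.
base = sum(regions.values())
percentages = {name: round(100.0 * n / base, 1) for name, n in regions.items()}
total_residues = base + end_residues + glycines + prolines

# PROCHECK guideline: over 90% of residues in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for name, value in percentages.items():
    print("%-20s %5.1f%%" % (name, value))
print("total residues:", total_residues, "meets 90% guideline:", meets_guideline)
```

For this model 85.6% of the residues fall in the most favoured regions, which is below the 90% guideline quoted by PROCHECK.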
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
64
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
65
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
66
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
67
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
68
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
69
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are
all closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved
together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible
in the near future to draw firm conclusions about the true picture of its evolution and to
identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To determine the unknown structure of the protein present in the different species, we used
homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report might be just a stepping stone toward such discoveries. The
present work may be a small finding within a much larger issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and more effectively in future by analysing the modelled structure of the
protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking will be useful in finding a cure for the infectious disease bird flu,
and will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic
sites can be identified in them. A similar method can also be applied to other infectious
diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope,
however, that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley, and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles, or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid, and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV infection and
reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and
those who clear the infection. Its presence indicates exposure to HBV or chronic
carrier status [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5, 6].
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the orientation of the ligand during
sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot-
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy, and a high level of
integration with other databases.
2 Protein sequence- of Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC 2.7.1.31, from human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA, and
protein:translated DNA (with frameshifts) searches, and for ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
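To illustrate the Smith-Waterman local alignment mentioned above, the following is a minimal sketch in Python. It is not the SSEARCH implementation: it assumes simple match/mismatch scores and a linear gap penalty (the function name and all score values are illustrative choices), whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring only: fixed match/mismatch values and a linear
    gap penalty (assumptions, not the parameters used by SSEARCH).
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Only two rows of the matrix are kept because the sketch reports the score alone; recovering the aligned residues would require storing the full matrix and tracing back from the best cell.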
4 BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e., if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
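Of the parameters listed above, the aliphatic index has a simple closed form and can be checked by hand. The sketch below assumes the commonly used definition, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue; the function name is hypothetical, and the handling of white space and digits mirrors the input behaviour described above rather than ProtParam's actual code.

```python
def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence given in one-letter code.

    Assumes AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    with X expressed as mole percent of the sequence.
    """
    # Ignore white space and numbers, as a raw-sequence input form would
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")

    def mole_percent(aa):
        return 100.0 * residues.count(aa) / len(residues)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    print(round(aliphatic_index("AVIL"), 1))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains, which is usually read as an indicator of thermostability for globular proteins.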
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet, and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
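The percentages quoted above are three-state per-residue accuracies: the share of residues whose predicted state (helix, sheet, or coil) matches the observed one. A minimal sketch of that calculation follows; the function name and the H/E/C state letters are assumptions for illustration, not SOPMA output.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Both arguments are strings of equal length over a three-letter
    alphabet, here assumed to be H (helix), E (sheet), and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Hypothetical strings: 6 of the 8 positions agree
    print(three_state_accuracy("HHHHCCEE", "HHHCCCEC"))
```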
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified our model against different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
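The agreement between matched Cα atoms quoted above is normally reported as a root-mean-square deviation (RMSD). The sketch below shows only the arithmetic, under the assumption that the two coordinate sets are already superposed and listed in the same residue order; the function name and the coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in Å between two equally long lists of (x, y, z) Cα coordinates.

    Assumes the structures are already optimally superposed; no fitting
    is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("no atoms given")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    # Hypothetical template: every atom displaced by 2 Å along x
    template = [(x + 2.0, y, z) for x, y, z in model]
    print(ca_rmsd(model, template))
```

In practice the superposition step (for example a least-squares fit) is carried out first, since RMSD computed on unaligned coordinates mostly measures the difference in placement rather than in shape.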
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
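A minimal sketch of the E-value screen described above is given here in Python. The hit names, scores and the cutoff of 1e-5 are hypothetical values chosen only for illustration; they are not results from this study, and a suitable cutoff depends on the database and the query.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One database hit; every value used below is hypothetical."""
    template_id: str
    bit_score: float
    e_value: float


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cutoff, lowest E-value first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Hypothetical search results, for illustration only.
example_hits = [
    Hit("templateA", 210.0, 3e-55),
    Hit("templateB", 45.0, 2e-4),
    Hit("templateC", 27.0, 6.0),
    Hit("templateD", 150.0, 1e-30),
]

candidates = select_templates(example_hits)
print([h.template_id for h in candidates])
```

Hits that fail the cutoff are not discarded for good in practice; as noted above, they can still be passed to fold-recognition servers for a second opinion.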
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in protein-ligand
interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interactions are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand
and a flexible receptor or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
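The six rigid-body degrees of freedom mentioned above can be illustrated with a short Python sketch that rotates and then translates a set of ligand coordinates. The function names, the x-then-y-then-z rotation order and the coordinates are assumptions made for this example; docking programs may parameterize the search differently.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def place_ligand(coords, rotation, translation):
    """Apply three rotational and three translational degrees of freedom."""
    r = rotation_matrix(*rotation)
    posed = []
    for x, y, z in coords:
        posed.append(tuple(
            r[i][0] * x + r[i][1] * y + r[i][2] * z + translation[i]
            for i in range(3)
        ))
    return posed


# Two hypothetical ligand atoms, 1 unit apart along x.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
posed = place_ligand(ligand,
                     rotation=(0.0, 0.0, math.pi / 2),
                     translation=(0.0, 0.0, 5.0))
print(posed)
```

Because the transformation is rigid, the distance between the two atoms is unchanged; only the position and orientation of the ligand as a whole are searched.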
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein (enzyme) is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active or functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and the
angles which cause spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
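Each point on such a plot is a pair of dihedral angles, and a dihedral angle is fully determined by four consecutive atoms (C(i-1), N, Cα, C for φ; N, Cα, C, N(i+1) for ψ). The following Python sketch computes a signed dihedral angle from four points. The helper names and the toy coordinates are assumptions for illustration; real backbone coordinates would come from a PDB file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle about the p1-p2 bond, in degrees (-180 to 180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Parts of b0 and b2 that are perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy coordinates: the fourth point is swung around the central bond.
p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
for p3 in [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (1.0, -1.0, 0.0)]:
    print(p3, round(dihedral(p0, p1, p2, p3), 1))
```

Computing φ and ψ this way for every residue and plotting one against the other gives the scatter of points that is then compared with the sterically allowed regions.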
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good for releasing local constraints on a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
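The idea of moving atoms down the energy gradient can be sketched for the simplest possible case: two atoms that start too close together and interact through a Lennard-Jones potential. This is an illustrative toy in reduced units, not the force field or minimizer used in this work; the step size, the cap on each move and the starting distance are assumed values.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=1000):
    """Steepest descent on the interatomic distance r."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        # Move against the gradient, but cap the move so that a huge
        # repulsive force cannot throw the atoms far apart in one step.
        move = max(-max_move, min(max_move, -step * g))
        r += move
    return r


r_start = 0.9                      # atoms too close: strong repulsion
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

The minimizer settles at the nearest minimum (r = 2^(1/6) σ for this potential). With many atoms and many minima, the same downhill procedure would stop in whichever local minimum is closest to the starting structure, which is the limitation noted above.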
Result and discussion
1 Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                          Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5...             206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina...             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad...                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp...  27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I...             27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue   Count   %
Ala (A)   74      14.1
Arg (R)   33      6.3
Asn (N)   11      2.1
Asp (D)   21      4.0
Cys (C)   5       1.0
Gln (Q)   32      6.1
Glu (E)   28      5.4
Gly (G)   51      9.8
His (H)   16      3.1
Ile (I)   15      2.9
Leu (L)   81      15.5
Lys (K)   10      1.9
Met (M)   12      2.3
Phe (F)   10      1.9
Pro (P)   28      5.4
Ser (S)   27      5.2
Thr (T)   22      4.2
Trp (W)   4       0.8
Tyr (Y)   5       1.0
Val (V)   38      7.3
Pyl (O)   0       0.0
Sec (U)   0       0.0
(B)       0       0.0
(Z)       0       0.0
(X)       0       0.0
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711 (41)
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
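As a quick consistency check on the atomic composition, the total atom count and an approximate formula mass can be recomputed from the element counts. The sketch below uses rounded standard average atomic masses, which are an assumption of this example and not part of the ProtParam output, so the mass is expected to agree with the reported molecular weight of about 55.25 kDa only to within rounding.

```python
# Element counts taken from the atomic composition listed above.
formula = {"C": 2435, "H": 3967, "N": 711, "O": 719, "S": 17}

# Rounded standard average atomic masses in g/mol (assumed, not from the report).
atomic_mass = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

total_atoms = sum(formula.values())
formula_mass = sum(count * atomic_mass[element] for element, count in formula.items())

print(total_atoms)             # 7849
print(round(formula_mass, 1))
```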
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
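The two extinction coefficients can be reproduced from the residue counts with the additive formula that ProtParam documents: the numbers of Tyr, Trp and cystine residues are each multiplied by a molar extinction value at 280 nm (1490, 5500 and 125 M-1 cm-1 respectively). The sketch below is a hand check under that formula; the function name is hypothetical, and the molecular weight of 55252.6 is the value reported earlier in this section.

```python
EXT_TYR = 1490      # M-1 cm-1 at 280 nm
EXT_TRP = 5500
EXT_CYSTINE = 125   # per disulfide-bonded pair of Cys residues


def extinction_coefficient(n_tyr, n_trp, n_cys, cys_paired):
    """Molar extinction coefficient at 280 nm from residue counts."""
    n_cystine = n_cys // 2 if cys_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


# Residue counts from the amino acid composition: Tyr 5, Trp 4, Cys 5.
ext_all_paired = extinction_coefficient(5, 4, 5, cys_paired=True)
ext_none_paired = extinction_coefficient(5, 4, 5, cys_paired=False)

molecular_weight = 55252.6
print(ext_all_paired, round(ext_all_paired / molecular_weight, 3))
print(ext_none_paired, round(ext_none_paired / molecular_weight, 3))
```

With five cysteines only two cystines can form, so the two assumptions differ by 2 x 125 = 250 M-1 cm-1.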
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
(42)
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3_10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
(43)
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
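The percentages in the SOPMA summary are simply the share of each one-letter state in the prediction string. A minimal Python sketch of that tally follows; to keep it short, the prediction is represented by a stand-in string with the same state counts as reported (235 h, 80 e, 36 t, 172 c) instead of the full per-residue output, and the function name is hypothetical.

```python
from collections import Counter


def state_percentages(prediction):
    """Percentage of each secondary-structure state in a prediction string."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Stand-in string with the reported state counts; residue order does not
# matter for the percentages.
prediction = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
percentages = state_percentages(prediction)
print(len(prediction), percentages)
```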
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
(45)
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
open the template structure from file (pdb file); choose icon - Swiss model - load the raw target sequence
choose icon - fit - fit raw sequence, then magic fit, then iterative fit
choose icon - file - save - layer (pdb)
choose icon - file - save - project (pdb)
choose icon - Swiss model - submit modeling request (a new browser window will be opened; load the pdb file and give the email ID for receiving the modeled structure)
open the new structure (received by email) - remove the template by selecting the target
choose icon - file - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and it shows the possible conformations of the φ and ψ angles for a polypeptide. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %-age
Residues in most favoured regions [A,B,L]                990      85.6%
Residues in additional allowed regions [a,b,l,p]         104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]     11       1.0%
Residues in disallowed regions                           51       4.4%
                                                         ----     ------
Number of non-glycine and non-proline residues           1156     100.0%
Number of end-residues (excl. Gly and Pro)               8
Number of glycine residues (shown as triangles)          127
Number of proline residues                               60
                                                         ----
Total number of residues                                 1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
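The percentages in the PROCHECK summary, and the comparison against the 90% guideline, can be recomputed from the residue counts. The short Python sketch below does this; the dictionary keys and variable names are chosen for this example only.

```python
# Residue counts per Ramachandran region, from the plot statistics above.
regions = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

non_gly_pro = sum(regions.values())
percentages = {name: round(100.0 * n / non_gly_pro, 1) for name, n in regions.items()}

# Guideline quoted above: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

# End-residues, glycines and prolines are counted separately.
total_residues = non_gly_pro + 8 + 127 + 60

print(percentages)
print(non_gly_pro, total_residues, meets_guideline)
```

By this check the model, with 85.6% of residues in the most favoured regions, falls short of the 90% guideline, and 51 residues lie in disallowed regions.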
Docking result ---- by Hex software
Fig Ligand & Receptor (2B8N)
Fig after docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting to
look more closely at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains; this suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible in the near future to draw firm conclusions about the true picture of its
evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of a protein present in the
different species, we performed homology modelling. We went a step further and presented a
theoretical model built with available online modelling tools.
Our study indicates that the glycerate kinase protein (HBeAg-binding protein 4)
is a secondary contributor to the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may serve as a stepping stone for further discoveries of this kind. The
present work is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, and hence possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley, and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] - C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] - PubMed
[5] - Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] - Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and
Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher,
P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] - Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] - Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion,
contaminated equipment, drug users' unsterile needles, or any body secretion [1].
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid, and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates that the person has been exposed
to HBV [4].
Homology, or comparative, modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5, 6].
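The 30% rule above can be made concrete with a small sketch. The Python below is an illustration only and is not part of the original workflow: the function names `percent_identity` and `passes_identity_rule` are hypothetical, and identity is assumed to be computed over aligned (non-gap) columns of a pairwise alignment. Other conventions for the denominator exist and give different numbers.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    aligned_columns = 0
    identical_columns = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip gapped columns
        aligned_columns += 1
        if q.upper() == t.upper():
            identical_columns += 1

    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical_columns / aligned_columns


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True when identity is strictly greater than the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment, not taken from the report's data.
    query = "ACDE-FG"
    template = "ACDQWFG"
    print(round(percent_identity(query, template), 1))
    print(passes_identity_rule(query, template))
```

A template that falls at or below the threshold is not necessarily unusable, but by the argument above its alignment, and hence the resulting model, should be treated with more caution.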
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over recent years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Materials and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy, and a high level of
integration with other databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts) and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
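The optimal local alignment that SSEARCH computes can be illustrated with a minimal Smith-Waterman scoring sketch. The function name and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration; the FASTA package itself uses substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring only: a fixed match/mismatch value and a linear
    gap penalty (assumed values, not those of any real package).
    """
    # prev and curr are two consecutive rows of the dynamic-programming matrix;
    # each cell is the best score of a local alignment ending at that cell.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACT"))
```

Keeping only two rows is enough when the score alone is wanted; recovering the aligned residues would need the full matrix and a traceback.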
4. BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
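As an illustration of how one of these parameters is derived from sequence alone, the sketch below computes the aliphatic index using the commonly cited formula AI = X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu)), where X is the mole percent of each residue, a = 2.9 and b = 3.9. The function name is hypothetical and this is not ProtParam's own code.

```python
def aliphatic_index(seq, a=2.9, b=3.9):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu.

    a and b are the relative side-chain volumes of Val and Ile/Leu
    with respect to Ala (values assumed from the published formula).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        # Percentage of positions in the sequence occupied by this residue.
        return 100.0 * seq.count(residue) / n

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


print(aliphatic_index("MAAALQVLPRLARAPLHPLL"))
```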
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence (in pdb format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model against the parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model depends on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
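The Cα agreement quoted above is usually expressed as a root-mean-square deviation (RMSD) over matched atoms. The sketch below is a minimal version that assumes the two coordinate sets are already superposed and listed in the same residue order; the function name and the toy coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_model, coords_reference):
    """RMSD in the input units (e.g. angstroms) between matched C-alpha atoms.

    Assumes both lists hold (x, y, z) tuples for the same residues in the
    same order and that the structures are already superposed.
    """
    if len(coords_model) != len(coords_reference) or not coords_model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(coords_model, coords_reference):
        total += (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
    return math.sqrt(total / len(coords_model))


# Toy example: every atom displaced by 2 units along z.
print(ca_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 2), (1, 0, 2)]))
```

A real comparison would first find the optimal rigid-body superposition, which this sketch deliberately leaves out.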
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
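The E-value screen described above can be sketched as a simple filter over a hit list. The hits below are taken from the BLAST output listed in the results of this report; the cutoff of 1e-5 is an assumed, illustrative threshold, and the E-values of the two weakest hits are read as 6.0, which is also an assumption because the decimal point is not legible in the source.

```python
# (PDB chain, bit score, E-value) as listed in the report's BLAST output.
# The 6.0 values for the last two hits are an assumed reading.
HITS = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first.

    max_evalue is a hypothetical cutoff chosen for illustration only.
    """
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


for pdb_chain, score, evalue in select_templates(HITS):
    print(pdb_chain, score, evalue)
```

Hits that fail the cutoff are dropped rather than ranked, which mirrors the advice not to use a template with a poor E-value even when it is the only one available.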
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aimed at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement, which allows small conformational adaptations for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles which cause spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
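Each point on the plot is a pair of torsion angles, and a torsion angle can be computed from the coordinates of four consecutive atoms. The sketch below uses the standard atan2 formulation; the function names are hypothetical, and the toy coordinates are not taken from any structure in this report. For φ of residue i the four atoms would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, -180 to 180) defined by four points.

    0 corresponds to the cis arrangement and +/-180 to trans.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the outer atoms are a quarter turn apart around the central bond.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```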
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
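Gradient optimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential, minimized by steepest descent on their separation r. This is a sketch under stated assumptions, not the procedure of any particular modeling package; the reduced units (epsilon = sigma = 1), step size and tolerance are all illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize_separation(r, step=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent: move against the gradient until the force is small.

    step and tol are assumed values that happen to be stable for this
    potential in reduced units.
    """
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        r -= step * gradient
    return r


r_min = minimize_separation(1.5)
print(r_min, lj_energy(r_min))
```

The search converges to the nearest minimum only, which for this potential is r = 2^(1/6) * sigma; it has no mechanism for crossing an energy barrier to reach a different minimum.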
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences (query sequence included)
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (HBcAg) Human Hepatitis B Viral Capsid >gi|5                206  6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp      27  6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                 27  6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M^-1 cm^-1, at 280 nm measured in water.
Ext. coefficient: 29700; Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient: 29450; Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
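The two extinction coefficients above can be reproduced from the residue counts in the composition table (Trp 4, Tyr 5, Cys 5) with the widely used 280 nm relation E = 5500 * n(Trp) + 1490 * n(Tyr) + 125 * n(cystine). The sketch below is a minimal re-derivation with a hypothetical function name, not ProtParam's own code.

```python
def extinction_coefficients(seq):
    """Return (all Cys paired as cystines, no cystines) in M^-1 cm^-1 at 280 nm.

    Uses molar coefficients of 5500 (Trp), 1490 (Tyr) and 125 (cystine).
    """
    seq = seq.upper()
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    n_cys = seq.count("C")
    reduced = 5500 * n_trp + 1490 * n_tyr
    # Each cystine is formed from two Cys residues; an odd Cys stays unpaired.
    oxidized = reduced + 125 * (n_cys // 2)
    return oxidized, reduced


# Stand-in sequence with the same Trp/Tyr/Cys counts as the composition table.
oxidized, reduced = extinction_coefficients("W" * 4 + "Y" * 5 + "C" * 5)
molecular_weight = 55252.6  # from the ProtParam output above
print(oxidized, round(oxidized / molecular_weight, 3))
print(reduced, round(reduced / molecular_weight, 3))
```

Dividing by the molecular weight gives the absorbance of a 1 g/l solution, which is how the Abs 0.1% figures are obtained.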
Secondary structure prediction
By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
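The percentages in the SOPMA summary are simply the share of residues assigned to each state letter in the prediction string. A minimal sketch of that tally follows; the function name is hypothetical, and the example rebuilds a string from the reported state counts rather than from the prediction itself.

```python
from collections import Counter

# One-letter codes of the four states used in this SOPMA run.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_percentages(prediction):
    """Percentage of residues in each state, rounded to two decimals."""
    counts = Counter(prediction.lower())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty prediction string")
    return {state: round(100.0 * counts.get(state, 0) / total, 2)
            for state in STATE_NAMES}


# String with the reported counts: 235 h, 80 e, 36 t and 172 c (523 residues).
summary = state_percentages("h" * 235 + "e" * 80 + "t" * 36 + "c" * 172)
for state, percent in summary.items():
    print(STATE_NAMES[state], percent)
```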
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
49
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
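The pairwise scores in this table are on a percent scale, where values near 98 indicate nearly identical sequences and values near 3 indicate essentially unrelated ones. As a rough illustration of the idea, the sketch below computes percent identity between two aligned rows, ignoring columns where either row has a gap. This is an assumption-laden approximation with a hypothetical function name: ClustalW derives its scores from its own pairwise alignment stage, so the numbers will not match the table exactly. The two fragments are the HBVA4 and HBVA6 rows of one alignment block in this report.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Fragments of P17099|HBEAG_HBVA4 and Q91C37|HBEAG_HBVA6 from the alignment.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(percent_identity(hbva4, hbva6))
```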
Alignment
CLUSTAL 205 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
50
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
51
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
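The guide tree is written in Newick notation, where each leaf is followed by a colon and its branch length. As a minimal sketch, assuming the colons, commas and decimal points lost in extraction are restored as shown in the string below, the leaf branch lengths can be pulled out with a regular expression. The names GUIDE_TREE and leaf_branch_lengths are illustrative only and are not part of ClustalW.

```python
import re

# Guide tree from the ClustalW output. The punctuation (colons, commas,
# decimal points, final semicolon) is restored by hand; this is an assumption.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Return {leaf name: branch length} for every labelled leaf.

    Internal branch lengths follow a ')' and carry no name, so the
    pattern (which requires a non-empty name) skips them.
    """
    pairs = re.findall(r"([^(),:;]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}


leaves = leaf_branch_lengths(GUIDE_TREE)
longest = max(leaves, key=leaves.get)
print(len(leaves), "leaves; longest terminal branch:", longest, leaves[longest])
```

The leaf with the longest terminal branch is the sequence that sits furthest from its neighbours in the guide tree, which is a quick way to spot an outlier among the aligned sequences.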
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
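Percent identity between two aligned sequences can be computed by counting matching columns. The following is a minimal sketch, assuming identity is defined over columns where neither sequence has a gap; other tools use different denominators (for example the alignment length or the shorter sequence), so the value may differ slightly from the one a given program reports. The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Two rows taken from the second block of the multiple alignment in this
# report (HBVA4 and HBVA6), with the trailing gap columns left out.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(percent_identity(hbva4, hbva6))
```

For template selection, a higher identity generally means a more reliable homology model, which is why the template with the highest identity to the target was preferred.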
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model -> load the raw target sequence.
Choose Fit -> fit raw sequence, then magic fit, then iterative fit.
Choose File -> Save -> Layer (pdb).
Choose File -> Save -> Project (pdb).
Choose Swiss model -> submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File -> Save -> Layer (pdb).
Open Swiss model and select load raw sequence option to load target molecule
Perform magic fit and then iterative fit (provided under Fit) in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows which combinations of φ and ψ are sterically possible for a polypeptide, and it displays the φ and ψ backbone conformational angles for each residue in the protein, so that residues with unfavourable backbone conformations can be identified.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
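The φ and ψ values plotted in a Ramachandran plot are ordinary dihedral (torsion) angles defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for φ and N(i), CA(i), C(i), N(i+1) for ψ. The sketch below shows one common way to compute a dihedral angle from four 3D points. It is an illustration only, not the code used by PROCHECK or SAVS, and the helper names are hypothetical.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not from a real structure): the central bond lies on z.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```

Applying such a function to every residue of a model and plotting φ against ψ gives the scatter that the server classifies into favoured, allowed and disallowed regions.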
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE    %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
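The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% criterion can be checked in a few lines. This is a small worked calculation using the counts reported above; the variable names are illustrative.

```python
# Residue counts per Ramachandran region, taken from the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

percentages = {region: round(100.0 * n / total, 1)
               for region, n in counts.items()}

# Rule of thumb quoted by PROCHECK: over 90% in the most favoured regions.
meets_criterion = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("meets 90% criterion:", meets_criterion)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, which suggests that some regions of the model would benefit from refinement before it is used further.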
Docking result ---- by Hex software
Fig: Ligand & Receptor (2B8N)
Fig after docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
-------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
-------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds.
-------------------------------------------------------------------------
Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
-------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
-------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
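The size of the 6D search space reported in the Hex log can be checked by hand. The sketch below assumes that the garbled entries "Iterate[278121]" and "FFT[642448]" originally read [27, 812, 1] and [64, 24, 48]; this reading is an assumption, but it is supported by the fact that the products reproduce both the 21924 FFTs and the 1616412672 orientations stated in the log. The variable names are illustrative.

```python
# Radial search range from the log: 0.00 to 19.50 in steps of 0.75
# (decimal points restored; an assumption about the garbled values).
r_min, r_max, r_step = 0.00, 19.50, 0.75
radial_steps = round((r_max - r_min) / r_step) + 1

# Rotational samples iterated explicitly for receptor and ligand.
receptor_rotations = 812
ligand_rotations = 1

# Each 3D FFT evaluates one grid of orientations (assumed 64 x 24 x 48).
fft_grid = (64, 24, 48)
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]

n_ffts = radial_steps * receptor_rotations * ligand_rotations
total_orientations = n_ffts * orientations_per_fft

print(radial_steps, n_ffts, orientations_per_fft, total_orientations)
```

The same run docks 2B8N against itself and ends with a single solution whose energies are all zero, so these figures describe the size of the search that was performed rather than the quality of the result.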
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It is worth taking a
closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible in the near future to draw firm conclusions about the true picture of
evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of the protein present in the
different species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAG (Glycerate kinase) protein coded by
the gene is a secondary cause of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The
present work is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and more effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu, and they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley, and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher
P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated
documentation resource for protein families domains and functional sites Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut
Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la
Barceloneta 37-49 08003 Barcelona (Catalonia) Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviation
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae [2].
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3].
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and
in those who clear the infection. Its presence indicates exposure to HBV [4].
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5, 6].
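The sequence identity behind the 30% rule can be illustrated with a minimal sketch. The function names, the example alignment and the choice of denominator (columns in which neither sequence has a gap) are assumptions made for illustration; alignment programs may define percent identity differently.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip gap columns
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def above_identity_threshold(aligned_query, aligned_template, threshold=30.0):
    """True when the pair clears the identity threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the report.
    query = "MKT-AYIAK"
    template = "MRTLAY-SK"
    print(round(percent_identity(query, template), 2))
    print(above_identity_threshold(query, template))
```

A template that falls below the threshold is not necessarily useless, but the alignment it is built on deserves closer inspection before modeling.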
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domains structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC 2.7.1.31
From: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
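The Smith-Waterman local alignment mentioned above can be sketched in a few lines. This is an illustrative toy, not the SSEARCH implementation: the simple match/mismatch scores and the linear gap penalty are assumptions, whereas real protein searches use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # dynamic programming matrix
    best, best_pos = 0, (0, 0)

    # Fill: every cell is the best of restarting (0), a diagonal step, or a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Hypothetical toy sequences sharing only the local motif GGG.
    print(smith_waterman("AAAGGGTTT", "CCGGGCC"))
```

Because every cell may restart at zero, the algorithm reports only the best-matching local region and ignores the unrelated flanks, which is what distinguishes it from a global alignment.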
4. BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
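As an illustration of one of these parameters, the aliphatic index can be sketched from residue counts alone. The formula used here (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu) is the commonly cited definition; the function name is hypothetical, and the counts are the amino acid composition reported for GLCTK_HUMAN in the results of this report.

```python
# Residue counts as listed in the ProtParam composition table of this report.
GLCTK_COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28, "G": 51,
    "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10, "P": 28, "S": 27,
    "T": 22, "W": 4, "Y": 5, "V": 38,
}


def aliphatic_index(counts):
    """Aliphatic index from residue counts, using mole percentages."""
    total = sum(counts.values())

    def mole_percent(residue):
        return 100.0 * counts.get(residue, 0) / total

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    print(sum(GLCTK_COUNTS.values()))  # sequence length implied by the counts
    print(round(aliphatic_index(GLCTK_COUNTS), 2))
```

The value printed by this sketch is computed here and is not a figure quoted in the report; it only shows how the index follows from the composition.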
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features, such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles, are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions, that is, the fraction of the query
sequence structure that can be predicted from the template, and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aimed at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor or a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
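The energy penalty for overlap can be illustrated with a toy van der Waals term. This is a minimal sketch using a 12-6 Lennard-Jones potential with made-up parameters and coordinates; it is not the scoring function of HEX or of any other docking program.

```python
import math


def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """12-6 Lennard-Jones energy for two atoms at distance r.

    epsilon (well depth) and sigma (contact distance) are illustrative
    values, not parameters taken from a particular force field.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def interaction_energy(ligand_atoms, receptor_atoms, epsilon=0.2, sigma=3.4):
    """Sum the pairwise energies between every ligand and receptor atom."""
    total = 0.0
    for la in ligand_atoms:
        for ra in receptor_atoms:
            total += lennard_jones(math.dist(la, ra), epsilon, sigma)
    return total


if __name__ == "__main__":
    # Hypothetical coordinates: two receptor atoms and one ligand atom.
    receptor = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
    well_placed = [(2.0, 3.5, 0.0)]   # about 4 units from both receptor atoms
    overlapping = [(2.0, 1.0, 0.0)]   # about 2.2 units away: steric clash
    print(interaction_energy(well_placed, receptor))
    print(interaction_energy(overlapping, receptor))
```

The overlapping pose yields a large positive energy, while the well-placed pose is slightly favorable. Letting receptor atoms move away from the clash is one way such a penalty can be relieved.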
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within or on
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed to the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
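A torsion angle such as phi or psi is defined by four consecutive backbone atoms (C, N, Cα, C for phi and N, Cα, C, N for psi). The sketch below computes such an angle from four Cartesian points; the helper names and the test coordinates are made up for illustration, and no structure file parsing is attempted.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees around the p1-p2 bond.

    0 corresponds to the cis arrangement and +/-180 to trans.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical points with the central bond along the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Computing phi and psi for every residue of a model and plotting one against the other gives the scatter of points that a Ramachandran plot displays.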
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good for releasing local constraints on a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
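The behaviour described above, where gradient-based minimization finds the nearest minimum but does not cross energy barriers, can be shown on a one-dimensional toy. The double-well energy function, step size and starting points below are assumptions chosen for illustration and do not represent a molecular force field.

```python
def energy(x):
    """Toy double-well energy with a deeper well on the left (illustrative)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of the toy energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimize(x, step=0.01, tolerance=1e-8, max_iterations=100000):
    """Steepest descent: move against the gradient until the force is small."""
    for _ in range(max_iterations):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= step * g
    return x


if __name__ == "__main__":
    right = minimize(1.5)    # starts on the right-hand side of the barrier
    left = minimize(-1.5)    # starts on the left-hand side of the barrier
    print(right, energy(right))
    print(left, energy(left))
```

Both runs end with a near-zero gradient, but the run started on the right stays in the shallower well even though a lower-energy minimum exists on the other side of the barrier. This is why a minimized structure is best regarded as the closest low-energy conformation rather than the global optimum.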
Result and discussion
1. Swiss-Prot entry: Protein sequence Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
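The hit list above can be screened by E-value, as recommended in the template selection section. The sketch below is a minimal illustration: the cutoff of 1e-5 is an assumed, commonly used value rather than one stated in this report, and the E-values of the last two hits are read as 6.0 on the assumption that the decimal point was lost in the listing.

```python
# (accession, score, E-value) as read from the hit list in this report.
HITS = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),   # assumed reading of "60"
    ("1AW9-A", 27, 6.0),   # assumed reading of "60"
]


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest) E-value first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


if __name__ == "__main__":
    for accession, score, evalue in select_templates(HITS):
        print(accession, score, evalue)
```

Only the three capsid structures pass the cutoff; the two low-scoring hits are the kind of marginal match that the text advises against using as a template.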
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
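The atomic composition can be cross-checked against the reported atom count and molecular weight. The sketch below uses rounded standard atomic masses, which are an assumption of this example, and reads the printed molecular weight as 55252.6, so the mass is expected to agree only approximately.

```python
# Rounded average atomic masses in g/mol (assumed values for this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Atomic composition reported for GLCTK_HUMAN in this report.
FORMULA = {"C": 2435, "H": 3967, "N": 711, "O": 719, "S": 17}


def total_atoms(formula):
    """Number of atoms in the formula."""
    return sum(formula.values())


def average_mass(formula):
    """Average molecular mass implied by the formula."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


if __name__ == "__main__":
    print(total_atoms(FORMULA))
    print(round(average_mass(FORMULA), 1))
```

The atom count reproduces the reported total exactly, and the mass lands within a fraction of a dalton of the reported molecular weight, which supports the reading of the formula given above.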
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
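The two extinction coefficients can be reproduced from the residue counts. The sketch assumes the widely used per-residue molar values at 280 nm (5500 for Trp, 1490 for Tyr, 125 for each cystine) and takes the Trp, Tyr and Cys counts and the molecular weight from the ProtParam output in this report; the function names are hypothetical.

```python
# Assumed molar extinction contributions at 280 nm (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Molar extinction coefficient at 280 nm from residue counts."""
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    n_trp, n_tyr, n_cys = 4, 5, 5      # counts from the composition table
    molecular_weight = 55252.6         # as reported by ProtParam
    all_paired = extinction_coefficient(n_trp, n_tyr, n_cys // 2)
    none_paired = extinction_coefficient(n_trp, n_tyr, 0)
    print(all_paired, round(absorbance_1_g_per_l(all_paired, molecular_weight), 3))
    print(none_paired, round(absorbance_1_g_per_l(none_paired, molecular_weight), 3))
```

With five cysteines, at most two cystines can form, which accounts for the difference of 250 between the two reported coefficients.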
Secondary structure prediction
By SOPMA: result for UNK_158250 (42)
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
(43)
Parameters: Window width 17, Similarity threshold 8, Number of states 4
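The state percentages in the SOPMA summary are simple ratios of the per-state counts to the sequence length. A minimal sketch of that calculation is given below; the short prediction string is a hypothetical example, while the count 235 and length 523 are the values reported above.

```python
from collections import Counter

def state_percentages(prediction: str) -> dict[str, float]:
    """Percentage of residues assigned to each one-letter state."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: round(100 * n / total, 2) for state, n in counts.items()}

def percent(count: int, length: int) -> float:
    """Share of `count` residues in a sequence of `length` residues."""
    return round(100 * count / length, 2)

example = "hhhhcceett"                 # hypothetical 10-residue prediction
print(state_percentages(example))
print(percent(235, 523))               # alpha helix share for the target
```

Applying `percent` to the reported counts (235, 80, 36 and 172 out of 523) reproduces the percentages listed in the summary.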
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
(45)
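The pairwise scores are easier to compare once they are parsed into a lookup table. The sketch below assumes each row is a whitespace-separated record in the order SeqA, Name, Len, SeqB, Name, Len, Score; the four rows used are copied from the table above, and the function name is illustrative.

```python
def parse_scores(rows):
    """Map (name_a, name_b) -> pairwise score from ClustalW2-style rows."""
    scores = {}
    for line in rows:
        _a, name_a, _len_a, _b, name_b, _len_b, score = line.split()
        scores[(name_a, name_b)] = int(score)
    return scores

rows = [
    "1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3",
    "2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91",
    "3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97",
    "6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98",
]
scores = parse_scores(rows)
best_pair = max(scores, key=scores.get)    # most similar pair in this subset
worst_pair = min(scores, key=scores.get)   # least similar pair in this subset
print(best_pair, scores[best_pair])
print(worst_pair, scores[worst_pair])
```

On the full table the same lookup makes the pattern explicit: the human glycerate kinase scores 2 to 3 against every HBeAg sequence, while the HBV genotype A sequences score 97 to 98 against each other.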
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
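The guide tree is a Newick string, so leaf names and their terminal branch lengths can be pulled out with a regular expression. The sketch below assumes the tree as reconstructed with the colons, commas and decimal points that the flattened text has lost (five decimal places, as ClustalW writes in .dnd files). It reads only the leaves and is not a full Newick parser.

```python
import re

# Guide tree with punctuation restored (an assumption of this sketch).
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def leaf_branch_lengths(newick: str) -> dict[str, float]:
    """Return {leaf name: terminal branch length}; internal nodes are skipped
    because their lengths follow ')' rather than a name."""
    pairs = re.findall(r"([A-Za-z0-9_|]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}

leaves = leaf_branch_lengths(NEWICK)
longest = max(leaves, key=leaves.get)
print(len(leaves), longest, leaves[longest])
```

The longest terminal branch belongs to the human glycerate kinase, which agrees with its very low pairwise scores against the HBeAg sequences.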
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file); choose icon - Swiss model - load the raw target sequence.
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - File - save - layer (pdb).
Choose icon - File - save - project (pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose icon - File - save - layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET    83                               LV GFGKAVLGMA AAAEELLGQH
2b8nA      4     peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET           hh sss sssss hhh
2b8nA            hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105     LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53     xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET           hhhhhhhhh ssssss sssss hhhhh
2b8nA            hhhhhhhhh ssssss sssss hhhhh

TARGET   155     ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101     trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET           hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA            hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205     RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151     sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET           h hhhhhh hhh sss hhhhhh sssssss
2b8nA            h hhhhhh hhh sss hhhhhh sssssss

TARGET   255     GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201     gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET           h hhhhhhhhhh hh hhhh sss sssss
2b8nA            h hhhhhhhhhh hhh hhhh ss

TARGET   305     LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244     eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET           sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA            sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355     ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294     vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET           hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA            hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405     LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344     ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET           sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA            sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455     DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394     ksgallitgp tgtnvndlii gliv-
TARGET           h sss sssss ssss
2b8nA            sss sssss ssss
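The sequence identity reported for the model is a percentage of matching positions between target and template. A minimal sketch of one common definition is shown below: identical residues divided by the number of aligned columns in which neither sequence has a gap. The choice of denominator is an assumption here, since SWISS-MODEL's exact convention is not stated in this report. The ten-residue example is taken from the alignment block starting at TARGET 155.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity over aligned, gap-free columns of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

# One 10-column group from the target/template alignment above.
print(percent_identity("ISGGGSALLP", "lsgggsslfe"))
```

Conserved stretches such as this glycine-rich segment score well above the overall 34% identity, which is typical of a functionally important motif.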
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and so shows which conformations of the φ and ψ angles are possible for a polypeptide. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
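The φ and ψ angles plotted in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms: C(i-1), N, CA, C for φ and N, CA, C, N(i+1) for ψ. The sketch below shows how such a dihedral can be computed from four 3D points. The coordinates are hypothetical test points, not atoms from the model, and this is an illustration of the geometry rather than the code used by PROCHECK.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)                   # unit vector along the axis
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))  # b0 projected off the axis
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))  # b2 projected off the axis
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Hypothetical points: a 90 degree twist about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Feeding the four backbone atoms of each residue into `dihedral` gives the (φ, ψ) pair that places that residue on the plot.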
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                        Score    %age
Residues in most favoured regions [A,B,L]                990   85.6%
Residues in additional allowed regions [a,b,l,p]         104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11    1.0%
Residues in disallowed regions                            51    4.4%
                                                        ----  ------
Number of non-glycine and non-proline residues          1156  100.0%
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127
Number of proline residues                                60
                                                        ----
Total number of residues                                1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
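The percentages in the PROCHECK summary, and the comparison with the 90% guideline, can be reproduced from the residue counts alone. The sketch below uses the four counts reported above; the function and key names are illustrative and not part of PROCHECK.

```python
def ramachandran_summary(favoured, allowed, generous, disallowed,
                         threshold=90.0):
    """Percentages over non-Gly, non-Pro residues plus a quality flag."""
    total = favoured + allowed + generous + disallowed

    def pct(n):
        return round(100 * n / total, 1)

    return {
        "total": total,
        "favoured_pct": pct(favoured),
        "allowed_pct": pct(allowed),
        "generous_pct": pct(generous),
        "disallowed_pct": pct(disallowed),
        # True only if the most favoured share exceeds the guideline.
        "meets_threshold": 100 * favoured / total > threshold,
    }

summary = ramachandran_summary(990, 104, 11, 51)
print(summary)
```

With 85.6% of residues in the most favoured regions the model falls short of the 90% guideline, so the 51 residues in disallowed regions are the natural place to look during refinement.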
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: after docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1    1    000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
1    1    000000  00      00      00      00      00      00     -1  -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It is worth taking
a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope that in the near future
it will be possible to draw firm conclusions about the true picture of its evolution, and that the
genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.
To determine the unknown structure of a protein present in the different species, we performed
homology modelling and present a theoretical model built with freely available online modelling
tools.
Our study indicates that HBeAg-binding protein 4 (glycerate kinase) is a secondary contributor to
the pathogenicity of HBV. We therefore docked this protein with an appropriate ligand with the aim
of inhibiting its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries. It is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking may be useful in finding a cure for the infectious disease bird flu, and
may also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery
process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a healthier world.
The work presented is just a small part of a big problem, and a lot of work still needs to be done
to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope that
these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed.
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408. [PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro - An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Homology, or comparative, modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated assumption that accurate
models can be built for query sequences that have greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the "30% rule",
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing improvements
in the quality of homology models [5, 6].
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the orientation of the ligand during sampling.
Direct approaches can be further divided into protein-based, mapping-based, and
ligand-based approaches, to reflect the source used to derive the features capturing the
chemical information inside the protein cavity. Within each category, a representative list
of docking approaches is discussed. In view of the limitations of current scoring functions,
it was generally found that making optimal use of chemical information represents an
efficient knowledge-based strategy for improving binding affinity estimations, ligand
binding-mode predictions, and virtual screening enrichments obtained from protein-ligand
docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Materials and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for the
storage, retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
Swiss-Prot is a curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy, and a high level of
integration with other databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and
translated protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in this package
that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced
"FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension
of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA, and
protein:translated DNA (with frameshifts) searches, and for ordered or unordered peptide
searches. Recent versions of the FASTA package include special translated search
algorithms that correctly handle frameshift errors (which six-frame-translated searches do
not handle very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
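To illustrate the Smith-Waterman local alignment mentioned above, the following is a minimal sketch in Python. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the values 2, -1 and -2 are arbitrary illustrative choices), whereas real protein searches use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, not SSEARCH defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

The dynamic-programming table guarantees the optimal local score under the chosen scoring scheme, which is why such a search is slower but more sensitive than the heuristic FASTA and BLAST searches.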
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences, and to
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If
you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the sequence on which
you would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
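Two of the parameters listed above are simple functions of amino acid composition and can be sketched in a few lines of Python. This is only an illustration of the published formulas as documented for ProtParam (aliphatic index from the mole percentages of Ala, Val, Ile and Leu; molar extinction coefficient at 280 nm from the counts of Tyr, Trp and cystine); the function names and the `cystines_formed` switch are assumptions introduced here, and the sketch is not ProtParam's own code.

```python
def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue in the sequence."""
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq, cystines_formed=False):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1, in water).

    Uses 1490 per Tyr, 5500 per Trp and 125 per cystine. With
    cystines_formed=True, every pair of Cys is assumed to form a cystine;
    otherwise all Cys are assumed reduced.
    """
    value = 1490 * seq.count("Y") + 5500 * seq.count("W")
    if cystines_formed:
        value += 125 * (seq.count("C") // 2)
    return value


if __name__ == "__main__":
    example = "AVILYWCC"  # toy sequence, not a real protein
    print(aliphatic_index(example))
    print(extinction_coefficient(example, cystines_formed=True))
```

The half-life and instability index depend on lookup tables (N-terminal residue and dipeptide weights respectively) and are therefore not reproduced in this sketch.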
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
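The percentages quoted above are three-state per-residue accuracies, often called Q3. As a hedged illustration of how such a figure is computed, the sketch below compares a predicted and an observed secondary-structure string; the one-letter state codes (H for helix, E for sheet, C for coil) and the example strings are assumptions made for this example, not SOPMA output.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C)
    matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("strings must describe the same residues")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy strings: 7 of 8 positions agree.
    print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```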
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, online journals, and PubMed.
Retrieved the FASTA sequence of the HBeAg protein from the Swiss-Prot database.
Ran a BLAST similarity search and retrieved the PDB ID of the template structure (PDB ID: 2B8N).
Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor in PDB format.
Validated the modelled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model against different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
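Because the discussion of model quality is framed in terms of sequence identity between query and template, a small worked example may help. The sketch below computes percent identity from a pairwise alignment given as two gapped strings. Several definitions of the denominator are in use; this one counts only columns where both sequences have a residue, which is an assumption of the example, and the aligned strings are toy data.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over alignment columns with no gap in either sequence."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contribute a residue.
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != gap and t != gap]
    if not columns:
        return 0.0
    matches = sum(q == t for q, t in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Five gap-free columns, four of them identical.
    print(percent_identity("ACD-EF", "ACDKEY"))
```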
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all
the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs.
This family of methods has been shown to produce a larger number of potential templates
and to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon
individual fold-recognition servers by identifying similarities (consensus) among
independent predictions.
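The E-value screening described above can be sketched as a simple filter-and-rank step. Everything specific in this example is an assumption made for illustration: the `Hit` record, the template identifiers, the numeric values, and the cut-offs (E-value at most 1e-5, query coverage at least 0.5) are hypothetical and are not taken from the BLAST search reported in this work.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str       # hypothetical identifier of a candidate template
    e_value: float         # expectation value reported by the search
    query_coverage: float  # fraction of the query covered by the alignment (0 to 1)


def rank_templates(hits: List[Hit],
                   max_e_value: float = 1e-5,
                   min_coverage: float = 0.5) -> List[Hit]:
    """Discard hits with a poor E-value or low coverage, then sort the
    remainder by E-value (ascending) and coverage (descending)."""
    kept = [h for h in hits
            if h.e_value <= max_e_value and h.query_coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.e_value, -h.query_coverage))


if __name__ == "__main__":
    # Placeholder hits, invented for this example only.
    candidates = [
        Hit("tmplA", 2e-40, 0.95),
        Hit("tmplB", 0.3, 0.99),    # rejected: poor E-value
        Hit("tmplC", 1e-12, 0.30),  # rejected: low coverage
        Hit("tmplD", 5e-20, 0.80),
    ]
    for hit in rank_templates(candidates):
        print(hit)
```

In practice the ranking would be only a first pass, since the text notes that function, secondary structure agreement, and the plausibility of the resulting model also guide the final choice.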
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step, and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the
interfaces of these interactions do not have the ability to alter their conformation in order
to improve binding and ease movement. Conformational changes are limited by steric
constraints, and thus the interactions are said to be rigid.
Fig.: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid
ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig.: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger than usual ligand.
Normally, when there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed to the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
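The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be made concrete with a short sketch that moves a set of atomic coordinates as a rigid body. The rotation order (about x, then y, then z) and the function name are assumptions chosen for this illustration; docking programs use their own parameterisations of the same six-dimensional search space.

```python
import math


def rigid_transform(points, rx, ry, rz, tx, ty, tz):
    """Rotate points about the x, y and z axes (angles in radians, applied
    in that order), then translate by (tx, ty, tz)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    moved = []
    for x, y, z in points:
        y, z = y * cx - z * sx, y * sx + z * cx    # rotation about x
        x, z = x * cy + z * sy, -x * sy + z * cy   # rotation about y
        x, y = x * cz - y * sz, x * sz + y * cz    # rotation about z
        moved.append((x + tx, y + ty, z + tz))
    return moved


if __name__ == "__main__":
    # Two toy "atoms"; a rigid transform leaves their separation unchanged.
    atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(rigid_transform(atoms, 0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0))
```

Because only these six parameters change, all interatomic distances within the moved molecule are preserved, which is what distinguishes rigid-body docking from flexible docking.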
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking may be useful in finding a cure for the
infectious disease bird flu, and may also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively
binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein or enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active or functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the
torsion angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide correspond to sterically disallowed conformations of
the polypeptide backbone.
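Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a torsion angle from four Cartesian points using only the Python standard library; the function name and the toy coordinates are assumptions of this example, and validation servers such as those used in this work rely on their own implementations.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, in (-180, 180]) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: cis (0 degrees) and trans (180 degrees) arrangements.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```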
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue listing. PROCHECK
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and so stops in a local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
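The idea of gradient optimization can be illustrated with a toy example. The sketch below is not the force field or minimizer used in this project; it assumes a single pair of atoms interacting through a Lennard-Jones potential in reduced units, and the function names, step size and tolerance are all illustrative choices. Starting from two atoms that are too close together, steepest descent moves them apart until the force is nearly zero.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy (minus the force)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, step=0.001, tol=1e-8, max_iter=100000):
    """Steepest descent: move against the gradient until it is small."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

r_start = 0.9                      # atoms too close: strong repulsion
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

For this potential the minimum lies at r = 2^(1/6) σ. Like any gradient method, the procedure only finds the nearest minimum; it cannot cross an energy barrier to reach a lower one elsewhere.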
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: Name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result-
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I               27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711 (41)
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
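The extinction coefficients and composition percentages above can be cross-checked from the residue counts. The sketch below assumes the commonly documented ProtParam-style molar extinction values at 280 nm (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1); the function names are illustrative, and the counts and molecular weight are taken from the ProtParam output in this report.

```python
# Assumed per-residue molar extinction values at 280 nm (M-1 cm-1).
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125

def extinction_coefficient(n_tyr, n_trp, n_cys, all_cystines):
    """Estimate the molar extinction coefficient at 280 nm.

    If all_cystines is True, every pair of Cys residues is counted
    as one cystine (disulfide); otherwise cysteines contribute nothing.
    """
    n_cystine = n_cys // 2 if all_cystines else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE

def percent(count, total):
    """Composition percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)

# Values reported for GLCTK_HUMAN (Q8IVS8) above.
length = 523
mol_weight = 55252.6
n_tyr, n_trp, n_cys = 5, 4, 5

ext_all = extinction_coefficient(n_tyr, n_trp, n_cys, all_cystines=True)
ext_none = extinction_coefficient(n_tyr, n_trp, n_cys, all_cystines=False)
print(ext_all, round(ext_all / mol_weight, 3))    # coefficient and Abs 0.1%
print(ext_none, round(ext_none / mol_weight, 3))
print(percent(74, length))                         # alanine content
```

Dividing the coefficient by the molecular weight gives the absorbance of a 1 g/l solution, which is the "Abs 0.1%" value in the listing.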
Secondary structure prediction
By SOPMA
Result for: UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee  (42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix     (Hh)  235 is 44.93%
3-10 helix      (Gg)    0 is  0.00%
Pi helix        (Ii)    0 is  0.00%
Beta bridge     (Bb)    0 is  0.00%
Extended strand (Ee)   80 is 15.30%
Beta turn       (Tt)   36 is  6.88%
Bend region     (Ss)    0 is  0.00%
Random coil     (Cc)  172 is 32.89%
Ambiguous states (?)    0 is  0.00%
Other states            0 is  0.00% (43)
Parameters: Window width 17; Similarity threshold 8; Number of states 4
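The percentage summary in a SOPMA-style result is simply a tally of the one-letter state codes in the prediction string. As a rough sketch (the function name `state_percentages` and the short example string are illustrative assumptions, not SOPMA code), the tally can be reproduced as follows, using the four-state alphabet h (helix), e (extended strand), t (turn) and c (coil) seen in the listing above.

```python
from collections import Counter

def state_percentages(states):
    """Return {state: percentage} for a secondary-structure string.

    Percentages are rounded to two decimals, as in the SOPMA summary.
    """
    counts = Counter(states)
    total = len(states)
    return {s: round(100.0 * n / total, 2) for s, n in counts.items()}

# Hypothetical ten-residue prediction string.
example = "hhhhcceeet"
print(state_percentages(example))

# The reported helix fraction can be checked from the reported counts:
print(round(100.0 * 235 / 523, 2))   # alpha helix, 235 of 523 residues
```

Applying the same function to the full 523-character prediction string would give the complete summary table.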
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
(45)
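The pairwise scores in the table are percent-identity values. As an illustration only, the sketch below computes percent identity for two already aligned sequences by counting identical residues over the positions where neither sequence has a gap. This is an assumed simplification: ClustalW2 derives its scores from its own pairwise alignments, so its exact figures may differ. The function name `percent_identity` is hypothetical; the two 32-residue fragments are taken from the HBVA4 and HBVA6 rows of the alignment in this report.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Columns in which either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# Aligned fragments of P17099|HBEAG_HBVA4 and Q91C37|HBEAG_HBVA6.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(percent_identity(hbva4, hbva6))
```

Scores near 98 between the HBV genotype A sequences and scores of 2 to 3 against GLCTK_HUMAN reflect, respectively, near-identical and essentially unrelated sequences.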
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
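A guide tree of this kind is written in Newick format, where each leaf appears as name:branch_length. The sketch below is a minimal illustration rather than a full Newick parser: it uses a regular expression to pull out the leaf names and their terminal branch lengths. The tree string in the code is a reconstruction of the guide tree above with the colons, commas and decimal points restored, which is an assumption about the original file, and the function name `leaf_branch_lengths` is hypothetical.

```python
import re

# Reconstructed guide tree (punctuation restored by assumption).
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def leaf_branch_lengths(newick):
    """Return {leaf name: terminal branch length} from a Newick string.

    Internal branch lengths follow a ')' and have no name, so the
    pattern below (which requires a non-empty name) skips them.
    """
    pairs = re.findall(r"([^(),:;\s]+):(\d+\.\d+)", newick)
    return {name: float(length) for name, length in pairs}

leaves = leaf_branch_lengths(NEWICK)
longest = max(leaves, key=leaves.get)
print(len(leaves), longest, leaves[longest])
```

The leaf with by far the longest terminal branch is the human glycerate kinase, consistent with its very low pairwise scores against the HBeAg sequences.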
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file); choose icon - Swiss model - load the raw target sequence
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - File - Save - Layer (pdb)
Choose icon - File - Save - Project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window will open; load the pdb file and give the email ID for receiving the modeled structure)
Open the new structure (received by email) and remove the template by selecting the target
Choose icon - File - Save - Layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (Will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: structure of template after modeling
Alignment
TARGET  83   LV GFGKAVLGMA AAAEELLGQH
2b8nA   4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure: it displays the φ and ψ conformational angles for each residue and shows which conformations are possible for a polypeptide. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using Deep View and Modeller is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE   %-tage
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
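The percentages in the plot statistics follow directly from the residue counts, and the model can be compared against the 90% guideline quoted above. The sketch below is an illustrative check, not PROCHECK code; the dictionary keys and the function name `region_percentages` are assumptions, while the counts are those reported in the table.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

def region_percentages(counts):
    """Return {region: percentage of non-Gly, non-Pro residues}."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}

percentages = region_percentages(COUNTS)
total_scored = sum(COUNTS.values())          # non-glycine, non-proline residues
meets_guideline = percentages["most favoured"] > 90.0

print(total_scored, percentages, meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls somewhat short of the 90% figure expected for a good-quality structure, and 51 residues lie in disallowed regions.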
Docking result ---- by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: after docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS               Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1    1    000000  00      00      00      00    00      00      -1   -100
---------------------------------------------------------------------------
1    1    000000  00      00      00      00    00      00      -1   -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of the protein present in the
different species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study treats the HBeAg (glycerate kinase) protein, which is coded by
a gene, as one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report might be just a stepping stone for any such discoveries. The
present work might be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley, and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed.
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches, to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking [7].
This review gives an introduction to ligand-receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described [8].
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis, and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domains structure, post-translational modifications,
variants, etc.), a minimal level of redundancy, and a high level of
integration with other databases.
2 Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
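The Smith-Waterman idea mentioned above can be illustrated with a minimal sketch. It is not the SSEARCH implementation: the function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for readability, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    The scoring values are illustrative assumptions (simple match/mismatch
    scores and a linear gap penalty), not the defaults of any real program.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in sequence b
            left = H[i][j - 1] + gap    # gap in sequence a
            # Clamping at zero lets a new local alignment start anywhere.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

The zero floor in the recurrence is what makes the alignment local: a poorly matching prefix never drags down the score of a good region that follows it.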
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
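As an illustration of how one of these parameters is derived from the sequence alone, the sketch below computes the aliphatic index using the commonly cited formula X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is hypothetical, and this is a sketch of the formula rather than the ProtParam code itself.

```python
def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence given in one-letter code.

    Non-letter characters (white space, digits) are dropped first,
    mirroring the way raw-sequence input is described above.
    """
    seq = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not seq:
        raise ValueError("sequence contains no residues")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    a, b = 2.9, 3.9  # relative side-chain volumes of Val and of Ile/Leu versus Ala
    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains, which is usually read as a hint of greater thermostability in globular proteins.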
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. The SOPMA authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet, and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
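The accuracy figures quoted for three-state prediction are per-residue percentages. A minimal sketch of that measure is given below. The function name and the one-letter state labels (H for helix, E for sheet, C for coil) are assumptions made for illustration and do not come from the SOPMA server.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one.

    Both arguments are equal-length strings of state labels, one per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, a nine-residue prediction that differs from the observed structure at one position scores 8/9, or about 88.9 per cent.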
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence (in pdb format) into SWISS-MODEL as a raw sequence and modelled the receptor
Validated the modelled receptor using the Structure Analysis Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and obtained the structure of the drug molecule
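Several steps of the protocol pass sequences around in FASTA format. The sketch below shows one simple way such a file could be read into memory before submission to a modelling server. The function name is hypothetical and the parser handles only the basic layout (a header line starting with ">" followed by sequence lines).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}
```

Sequence lines that appear before any header are ignored, which keeps the sketch tolerant of stray text at the top of a file.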
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
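Because expected model quality is tied to sequence identity, it helps to be explicit about how identity is counted. The sketch below computes percent identity from two rows of a pairwise alignment. The function name is hypothetical, and the choice of denominator (columns where neither row has a gap) is an assumption; other conventions divide by the alignment length or by the shorter sequence.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over the aligned (gap-free) columns of two alignment rows."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("alignment rows must have the same length")
    aligned = 0
    identical = 0
    for t, s in zip(aligned_target, aligned_template):
        if t == gap or s == gap:
            continue  # indel column: no residue pair to compare
        aligned += 1
        if t == s:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned
```

Reporting the convention alongside the number matters, since the same alignment can give noticeably different percentages under different denominators.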
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
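The E-value rule of thumb described above can be sketched as a small filter. The function name, the example hit identifiers, and the cutoff of 1e-5 are all assumptions made for illustration; the report does not state which threshold was used.

```python
def pick_template(hits, max_evalue=1e-5):
    """Return the ID of the hit with the lowest E-value, or None.

    hits is a list of (template_id, evalue) tuples. Hits with an E-value
    above max_evalue are rejected, even if they are the only ones available.
    """
    acceptable = [hit for hit in hits if hit[1] <= max_evalue]
    if not acceptable:
        return None  # better to fall back to fold recognition than to force a poor template
    return min(acceptable, key=lambda hit: hit[1])[0]


# Hypothetical search results, for illustration only.
example_hits = [("tplA", 1e-30), ("tplB", 2e-8), ("tplC", 0.5)]
best = pick_template(example_hits)
```

Returning None instead of the least-bad hit encodes the advice that a template with a poor E-value should not be used merely because nothing else was found.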
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aimed at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
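The six rigid-body degrees of freedom (three rotational, three translational) can be made concrete with a short sketch that moves a set of atomic coordinates as one rigid unit. The function names and the rotation convention (rotate about x, then y, then z) are assumptions for illustration; docking programs such as Hex use their own parameterisation.

```python
import math


def rotation_matrix(rx, ry, rz):
    """3x3 matrix for a rotation about x, then y, then z (angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    # Product Rz * Ry * Rx written out explicitly.
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rigid_transform(coords, rotation, translation):
    """Apply three rotations and three translations to a list of (x, y, z) points."""
    R = rotation_matrix(*rotation)
    tx, ty, tz = translation
    moved = []
    for x, y, z in coords:
        moved.append((
            R[0][0] * x + R[0][1] * y + R[0][2] * z + tx,
            R[1][0] * x + R[1][1] * y + R[1][2] * z + ty,
            R[2][0] * x + R[2][1] * y + R[2][2] * z + tz,
        ))
    return moved
```

Because the transform is rigid, all interatomic distances within the molecule are preserved; only its pose relative to the partner molecule changes, which is what a rigid-body docking search samples.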
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule in
the cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate; these rotations are described
by the torsion angles φ and ψ respectively, and the plot is drawn between them.
Ramachandran used computer models of small polypeptides to systematically vary φ and ψ
with the objective of finding stable conformations. For each conformation, the structure
was examined for close contacts between atoms. Atoms were treated as hard spheres with
dimensions corresponding to their van der Waals radii, so the angles that cause spheres to
collide correspond to sterically disallowed conformations of the polypeptide backbone.
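As an illustration of how the two plotted angles can be obtained from atomic coordinates, the following is a minimal Python sketch of a torsion (dihedral) angle calculation. The function name `dihedral_deg` and the toy coordinates are assumptions made for this example and are not part of the report; for a real residue i, φ would be computed from the atoms C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical toy coordinates (not taken from any protein structure).
p0, p1, p2 = (1, 0, 0), (0, 0, 0), (0, 0, 1)
quarter_turn = dihedral_deg(p0, p1, p2, (0, 1, 1))
cis = dihedral_deg(p0, p1, p2, (1, 0, 1))
trans = dihedral_deg(p0, p1, p2, (-1, 0, 1))
print(quarter_turn, cis, trans)
```

Each residue then contributes one (φ, ψ) point to the plot; points falling in regions where the hard-sphere model predicts collisions are the sterically disallowed ones.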
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file, which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and so stops in a local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
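The idea of gradient optimization can be sketched with a deliberately small example: two atoms interacting through a Lennard-Jones (van der Waals) term, started too close together and relaxed by steepest descent. This is only an illustrative sketch in Python; the reduced units (epsilon = sigma = 1), the step size, the tolerance and the function names are all assumptions made for the example, not settings from any program used in this work.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones energy (the force is its negative)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, step=1e-3, tol=1e-8, max_iter=100000):
    """Steepest descent: move along the force until it is nearly zero."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

r_start = 0.95                 # hypothetical starting separation, atoms too close
r_min = minimize(r_start)
print(r_start, lj_energy(r_start))
print(r_min, lj_energy(r_min))
```

The minimizer slides downhill to the nearest minimum only; on a potential with several minima it would stop in whichever basin the starting conformation belongs to, which is the local-minimum limitation described above.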
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                                    Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5                           206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                           197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                                  192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp                 27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                            27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711 (41)
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
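The two extinction coefficients can be checked by hand from the composition. The sketch below, in Python, assumes the per-residue molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function name is an assumption made for this example, while the residue counts (Trp 4, Tyr 5, Cys 5) and the molecular weight (55252.6) are the values reported above.

```python
def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Molar extinction coefficient at 280 nm in M-1 cm-1 (assumed ProtParam values)."""
    return n_trp * 5500 + n_tyr * 1490 + n_cystine * 125

n_trp, n_tyr, n_cys = 4, 5, 5      # counts from the ProtParam composition
mol_weight = 55252.6               # molecular weight from the ProtParam output

# Five Cys residues can pair into at most two cystines.
ext_all_cystine = extinction_coefficient(n_trp, n_tyr, n_cys // 2)
ext_no_cystine = extinction_coefficient(n_trp, n_tyr, 0)

# Absorbance of a 0.1% (1 g/l) solution is the coefficient divided by the mass.
abs_all_cystine = ext_all_cystine / mol_weight
abs_no_cystine = ext_no_cystine / mol_weight

print(ext_all_cystine, round(abs_all_cystine, 3))
print(ext_no_cystine, round(abs_no_cystine, 3))
```

Because the protein has only four Trp and five Tyr in 523 residues, the predicted absorbance per gram is modest, and the cystine contribution changes the coefficient by less than one percent.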
Secondary structure prediction
By SOPMA
Result for UNK_158250 (42)

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3-10 helix       (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states (?) :   0 is  0.00%
Other states         :   0 is  0.00%
(43)

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
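The state percentages in the SOPMA summary are simply the count of each one-letter state divided by the sequence length. A minimal Python sketch of that bookkeeping follows; the function name and the ten-letter toy prediction string are assumptions for illustration, while the four counts are the ones reported by SOPMA above.

```python
from collections import Counter

# One-letter codes for the four states used in this SOPMA run.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}

def state_percentages(prediction):
    """Percentage of each state in a per-residue prediction string."""
    counts = Counter(prediction)
    total = len(prediction)
    return {s: round(100.0 * counts.get(s, 0) / total, 2) for s in STATE_NAMES}

# Hypothetical ten-residue prediction, only to show the call.
toy = state_percentages("hhhhcceeet")

# Counts reported by SOPMA for the 523-residue sequence.
reported = {"h": 235, "e": 80, "t": 36, "c": 172}
reported_pct = {s: round(100.0 * n / 523, 2) for s, n in reported.items()}

for s, name in STATE_NAMES.items():
    print(name, reported[s], reported_pct[s])
```

The four reported counts add up to 523, so every residue is assigned to exactly one of the four states.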
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
(45)
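The pairwise scores in this table are, as far as the ClustalW documentation describes them, percent identities: the number of identical residues divided by the number of residue pairs compared, with gap positions excluded. The Python sketch below shows that calculation on two already aligned strings. The function name is an assumption, and the two 30-column excerpts are taken from the HBVA4 and HBVA5 rows of the multiple alignment in this report, so the value differs from the full-length score in the table, which ClustalW derives from its own pairwise alignment stage.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue                  # gap positions are not compared
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Short excerpts of two aligned rows (HBVA4 and HBVA5), not the full sequences.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
excerpt_identity = percent_identity(hbva4, hbva5)
print(round(excerpt_identity, 2))
```

Read this way, the table separates the sequences into clear groups: the human hepatitis B e-antigen sequences score 97 to 98 with each other, while every comparison with GLCTK_HUMAN scores 2 or 3.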
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens for loading the PDB file; give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, displaying the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           SCORE   %age
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
Docking result ---- by Hex software
Fig: Ligand & Receptor (2B8N)
Fig after docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
  ASPN Radius = 1.40 Charge = -0.52
  ASPCA Radius = 1.50 Charge = 0.25
  ASPC Radius = 1.40 Charge = 0.53
  ASPO Radius = 1.50 Charge = -0.50
  ASPCB Radius = 1.70 Charge = -0.21
  ASPCG Radius = 1.40 Charge = 0.62
  ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
  LYSN Radius = 1.40 Charge = -0.52
  LYSCA Radius = 1.50 Charge = 0.23
  LYSC Radius = 1.40 Charge = 0.53
  LYSO Radius = 1.50 Charge = -0.50
  LYSCB Radius = 1.70 Charge = 0.04
  LYSCG Radius = 1.70 Charge = 0.05
  LYSCD Radius = 1.70 Charge = 0.05
  LYSCE Radius = 1.70 Charge = 0.22
  LYSH Radius = 0.00 Charge = 0.25
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
65
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
66
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
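The size of the 6D search space reported in the log can be cross-checked with a little arithmetic. The sketch below is only an illustration and rests on assumptions that the log itself does not state: that the text extraction dropped decimal points (so the distance range "000 to 1950 with steps of 075" is read as 0.00 to 19.50 in steps of 0.75), that "Iterate[278121]" stands for 27 x 812 x 1, and that "FFT[642448]" stands for a 64 x 24 x 48 grid. Under that reading, the counts reproduce the figures printed by the program.

```python
# Sketch: re-derive the search-space figures printed in the docking log.
# Assumption: decimal points were lost in extraction, so 1950 -> 19.50
# and 075 -> 0.75; the splits of the bracketed numbers are our reading.

def count_distance_steps(r_min, r_max, step):
    """Number of intermolecular distances sampled, end points included."""
    return int(round((r_max - r_min) / step)) + 1

distance_steps = count_distance_steps(0.0, 19.50, 0.75)  # R = 0.00 ... 19.50
receptor_rotations = 812  # "Receptor 812" in the log
ligand_rotations = 1      # "Ligand 1" in the log

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
fft_count = distance_steps * receptor_rotations * ligand_rotations

# Assumed reading of "FFT[642448]" as a 64 x 24 x 48 grid.
fft_grid_points = 64 * 24 * 48
total_orientations = fft_count * fft_grid_points

print(distance_steps)      # 27
print(fft_count)           # 21924
print(total_orientations)  # 1616412672
```

The 27 distance steps match the 27 "R =" lines in the log, and the two products match "Done 21924 3D FFTs for 1616412672 orientations".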
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It is worth
taking a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of a protein present in the
different species, we performed homology modelling. We took a step forward by presenting a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (Glycerate kinase) protein, which is coded by
a gene, is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The
present work may be a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope, however,
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] - C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4] - plumbed
[5] - Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] - Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] - Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] - Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for the storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
Swiss-Prot is a curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2. Protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
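To make the idea of an optimal local alignment concrete, the following is a minimal sketch of Smith-Waterman scoring. It is not the SSEARCH implementation: the function name and the match, mismatch and linear gap scores are illustrative assumptions, whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Scores are illustrative assumptions: a fixed match/mismatch score and
    a linear gap penalty. Only two rows of the matrix are kept in memory.
    """
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            up = prev[j] + gap        # gap in seq_b
            left = curr[j - 1] + gap  # gap in seq_a
            # Local alignment: a cell never drops below zero, so a new
            # alignment can start anywhere.
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# The shared segment "ESLK" gives four matches of 2 points each.
print(smith_waterman_score("PESLKK", "XXESLKYY"))  # 8
```

Resetting negative cells to zero is what distinguishes local from global alignment: unrelated flanking regions do not penalise a well-conserved core.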
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
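As an illustration of how one of these parameters is derived from the sequence alone, the sketch below computes the aliphatic index using the commonly cited Ikai formula, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is our own, and the code is a stand-alone approximation rather than the ProtParam implementation.

```python
def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence (one-letter amino acid codes).

    Uses the Ikai formula with the usual coefficients a = 2.9 (Val) and
    b = 3.9 (Ile + Leu); X values are mole percentages.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptide with one residue each of Ala, Val, Ile and Leu (25 % each):
# 25 + 2.9 * 25 + 3.9 * 50 = 292.5
print(aliphatic_index("AVIL"))
```

A higher aliphatic index is generally read as an indication of greater thermostability for globular proteins.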
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. The SOPMA paper reports improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
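The percentages quoted above are three-state accuracies, i.e. the share of residues whose predicted state (helix, sheet or coil) matches the observed one. A minimal sketch of that measure is shown below; the one-letter state codes H, E and C and the example strings are assumptions made for illustration and are not SOPMA output.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one.

    Both arguments are strings of per-residue states, here assumed to be
    H (helix), E (sheet) and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 8-residue example: 7 of 8 states agree.
print(q3_accuracy("HHHEECCC", "HHHEEECC"))  # 87.5
```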
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and the PubMed literature.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID: 2B8N) using a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model depends on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
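Because expected model quality is tied to sequence identity, it helps to be explicit about how identity is counted. The sketch below uses one common convention, identical residues divided by the number of aligned (non-gap) columns; other conventions divide by the shorter sequence length or by the full alignment length. The function name and the example alignment are hypothetical.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment and therefore
    have equal length. This is one convention among several.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != gap and t != gap]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical alignment: 6 gap-free columns, 5 of them identical.
print(round(percent_identity("PESL-KK", "PEALAKK"), 1))  # 83.3
```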
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
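The E-value rule of thumb described above can be sketched as a simple filter over search hits. Everything in this example is hypothetical: the `TemplateHit` record, the template names, the E-values and coverages, and the two cut-offs are placeholders chosen for illustration, not results from this study or thresholds prescribed by BLAST.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    template_id: str       # hypothetical identifier of the candidate template
    e_value: float         # E-value reported by the database search
    query_coverage: float  # fraction of the query covered, 0.0 to 1.0


def select_templates(hits, max_e_value=1e-5, min_coverage=0.5):
    """Keep hits with a low E-value and adequate coverage, best first.

    The two thresholds are illustrative assumptions.
    """
    kept = [h for h in hits
            if h.e_value <= max_e_value and h.query_coverage >= min_coverage]
    # Lower E-value first; ties broken by higher coverage.
    return sorted(kept, key=lambda h: (h.e_value, -h.query_coverage))


# Made-up hits for demonstration only.
example_hits = [
    TemplateHit("tmplA", 1e-60, 0.85),
    TemplateHit("tmplB", 0.5, 0.90),    # rejected: poor E-value
    TemplateHit("tmplC", 1e-20, 0.30),  # rejected: low coverage
    TemplateHit("tmplD", 1e-30, 0.70),
]
print([h.template_id for h in select_templates(example_hits)])
# ['tmplA', 'tmplD']
```

Filtering on coverage as well as E-value reflects the point that a statistically significant hit is of limited use if it only covers a small part of the query.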
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction perpendicular
to the interface. This allows for binding of a receptor with a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be allowed for with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid, though:
there is intrinsic movement which allows small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing easier, more energetically favorable binding between the
two.
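The six rigid-body degrees of freedom mentioned above (three rotational, three translational) can be illustrated with a minimal sketch. The function names, the rotation order (x, then y, then z) and the toy coordinates are assumptions made for illustration; docking programs may parameterize the pose differently.

```python
import math

def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(coords, pose):
    """Apply a 6-parameter rigid-body pose (rx, ry, rz, tx, ty, tz) to atoms."""
    rx, ry, rz, tx, ty, tz = pose
    rot = rotation_matrix(rx, ry, rz)
    moved = []
    for x, y, z in coords:
        moved.append((
            rot[0][0] * x + rot[0][1] * y + rot[0][2] * z + tx,
            rot[1][0] * x + rot[1][1] * y + rot[1][2] * z + ty,
            rot[2][0] * x + rot[2][1] * y + rot[2][2] * z + tz,
        ))
    return moved

def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical two-atom "ligand"; a rigid pose changes its placement,
# not its internal geometry.
ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
posed = apply_pose(ligand, (0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0))
print(posed)
print(distance(*ligand), distance(*posed))
```

Because the transform is rigid, the distance between the two atoms is unchanged; any further adaptation has to come from flexibility in the ligand or the receptor.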
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively
binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. This plot is drawn between the
torsion angles phi and psi. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. Angles
which cause spheres to collide correspond to sterically disallowed conformations of the
polypeptide backbone.
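As an illustration of how a backbone torsion angle such as φ or ψ is obtained from atomic coordinates, the following is a minimal sketch of a dihedral-angle calculation for four points. The function names and the test coordinates are assumptions made for this example; for φ the four atoms would be C(i-1), N(i), Cα(i), C(i), and for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Components of b0 and b2 perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: central bond along z, outer atoms in the xy directions.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans arrangement
```

Computing this pair of angles for every residue and plotting φ against ψ gives the scatter of points shown on a Ramachandran plot.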
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the
submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The .new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing.
PROCHECK generates a number of output files in the default directory, which have the same
name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and may stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
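The idea of gradient optimization can be sketched on the simplest possible case: two atoms that start too close together and interact through a Lennard-Jones (van der Waals) term. This is a toy illustration only, not the force field or minimizer used in this work; the parameter values, step size and displacement cap are assumptions chosen for the example.

```python
EPSILON = 1.0  # well depth, arbitrary units (illustrative value)
SIGMA = 1.0    # separation at which the pair energy is zero (illustrative value)

def lj_energy(r):
    """Lennard-Jones pair energy 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (s6 * s6 - s6)

def lj_gradient(r):
    """Derivative dE/dr; the force on the pair is the negative of this."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (-12.0 * s6 * s6 + 6.0 * s6) / r

def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=1000):
    """Steepest descent: move downhill until the force is (nearly) zero."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, -step * g))
        r += move
    return r

start = 0.9  # atoms too close: large repulsion energy
r_min = minimize(start)
print("start:    r = %.4f  E = %.4f" % (start, lj_energy(start)))
print("minimized r = %.4f  E = %.4f" % (r_min, lj_energy(r_min)))
```

The minimizer relaxes the single bad contact to the nearest minimum of the pair potential, where the net force is close to zero. It only ever moves downhill, which is why minimization of a real structure stops in the nearest local minimum rather than crossing energy barriers.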
Result and discussion
1. Swiss-Prot entry: Protein sequence Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue   Count   Percent
Ala (A)    74     14.1%
Arg (R)    33      6.3%
Asn (N)    11      2.1%
Asp (D)    21      4.0%
Cys (C)     5      1.0%
Gln (Q)    32      6.1%
Glu (E)    28      5.4%
Gly (G)    51      9.8%
His (H)    16      3.1%
Ile (I)    15      2.9%
Leu (L)    81     15.5%
Lys (K)    10      1.9%
Met (M)    12      2.3%
Phe (F)    10      1.9%
Pro (P)    28      5.4%
Ser (S)    27      5.2%
Thr (T)    22      4.2%
Trp (W)     4      0.8%
Tyr (Y)     5      1.0%
Val (V)    38      7.3%
Pyl (O)     0      0.0%
Sec (U)     0      0.0%
(B)         0      0.0%
(Z)         0      0.0%
(X)         0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711 (41)
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M^-1 cm^-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
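The two extinction coefficients reported above can be reproduced from the residue counts with a short sketch. The per-residue molar absorbances at 280 nm used here (Trp 5500, Tyr 1490, cystine 125) are the values commonly documented for this type of estimate and are stated as an assumption; the function names are illustrative only.

```python
# Assumed molar absorbances at 280 nm, in M^-1 cm^-1.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded pair of Cys residues

def extinction_coefficient(n_trp, n_tyr, n_cys, cys_as_cystines):
    """Estimate the molar extinction coefficient from residue counts."""
    n_cystine = n_cys // 2 if cys_as_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight

# Counts and molecular weight taken from the ProtParam table for Q8IVS8.
MW = 55252.6
for cystines in (True, False):
    ext = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cys_as_cystines=cystines)
    print(cystines, ext, round(absorbance_at_1_g_per_l(ext, MW), 3))
```

With 4 Trp, 5 Tyr and 5 Cys, the Trp and Tyr terms give 29450, and allowing the Cys residues to pair into two cystines adds 250, which accounts for the difference between the two reported values.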
Secondary structure prediction
By SOPMA; result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
(42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3_10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
(43)
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
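The summary percentages in the SOPMA output are simple tallies of the per-residue state string. The following minimal sketch shows the calculation; the function name and the short example string are hypothetical, and the state letters follow the four-state output above (h = alpha helix, e = extended strand, t = beta turn, c = random coil).

```python
from collections import Counter

STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}

def state_summary(states):
    """Return {state letter: (count, percent of sequence)} for a state string."""
    counts = Counter(states)
    total = len(states)
    return {s: (counts.get(s, 0), round(100.0 * counts.get(s, 0) / total, 2))
            for s in STATE_NAMES}

# Hypothetical 10-residue prediction, used only to show the bookkeeping.
example = "hhhhcceett"
for letter, (count, percent) in state_summary(example).items():
    print("%-16s %3d is %6.2f%%" % (STATE_NAMES[letter], count, percent))
```

Applied to the reported counts, the same arithmetic gives 235/523 = 44.93% helix, 80/523 = 15.30% strand, 36/523 = 6.88% turn and 172/523 = 32.89% coil.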
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
(45)
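The pairwise scores in this table can be read as percent identities between aligned sequences. The following is a minimal sketch of such a calculation, under the assumption that columns containing a gap in either sequence are skipped; the exact scoring ClustalW2 applies may differ in detail, and the function name is illustrative. The two example rows are the HBVA4 and ASHV fragments from the multiple alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap columns are not scored
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Fragments of two aligned HBeAg sequences (HBVA4 and ASHV rows).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva4, ashv), 1))
```

On this reading, the scores of 2 to 3 between GLCTK_HUMAN and every HBeAg sequence indicate essentially no sequence similarity, while the scores of 97 to 98 among the HBV genotype A sequences indicate near-identical proteins.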
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
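A guide tree of this kind is written in Newick notation, where nested parentheses give the branching order and the number after each colon is a branch length. The following is a minimal sketch of a parser that sums branch lengths from the root to each leaf. It is an illustration under stated assumptions: it handles only unquoted leaf names and unlabelled internal nodes, and the example string is a small subtree of the guide tree with branch lengths read as 0.59519, 0.01176, 0.01119 and 0.36054.

```python
def leaf_depths(newick):
    """Return {leaf name: summed branch length from the root} for a Newick string."""
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_length():
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return float(text[start:pos])
        return 0.0

    def parse_subtree():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            leaves = list(parse_subtree())
            while text[pos] == ",":
                pos += 1  # consume "," and read the next child
                leaves.extend(parse_subtree())
            pos += 1  # consume ")"
        else:
            start = pos
            while pos < len(text) and text[pos] not in ":,()":
                pos += 1
            leaves = [(text[start:pos], 0.0)]
        length = parse_length()
        # Every leaf below this node is one branch further from the root.
        return [(name, depth + length) for name, depth in leaves]

    return dict(parse_subtree())

subtree = ("(Q8IVS8|GLCTK_HUMAN:0.59519,"
           "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054);")
for name, depth in sorted(leaf_depths(subtree).items()):
    print(name, round(depth, 5))
```

Long branches, such as the one leading to GLCTK_HUMAN, mark sequences that are distant from the rest of the set, while the short branches within the HBV genotype A group reflect their near-identity.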
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose icon - Swiss model - load the raw target sequence.
Choose icon - fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - file - save - layer (pdb).
Choose icon - file - save - project (pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose icon - file - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Your Email address: Gunjan300@gmail.com
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET   83   LV GFGKAVLGMA AAAEELLGQH
2b8nA     4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET        hh sss sssss hhh
2b8nA         hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET        hhhhhhhhh ssssss sssss hhhhh
2b8nA         hhhhhhhhh ssssss sssss hhhhh

TARGET  155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET        h hhhhhh hhh sss hhhhhh sssssss
2b8nA         h hhhhhh hhh sss hhhhhh sssssss

TARGET  255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET        h hhhhhhhhhh hh hhhh sss sssss
2b8nA         h hhhhhhhhhh hhh hhhh ss

TARGET  305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET        hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA         hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET        sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA         sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394   ksgallitgp tgtnvndlii gliv-
TARGET        h sss sssss ssss
2b8nA         sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, and displays the psi and phi backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and the modeler is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                         Score    %-age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127   (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
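The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked in the same step. This is a minimal sketch; the dictionary keys and function name are illustrative labels, not PROCHECK identifiers.

```python
def region_percentages(counts):
    """Percent of non-Gly, non-Pro residues falling in each Ramachandran region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Residue counts from the PROCHECK plot statistics above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
percentages = region_percentages(counts)
for region, value in percentages.items():
    print("%-20s %5.1f%%" % (region, value))
print("over 90% in most favoured regions:", percentages["most favoured"] > 90.0)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline, so the outlying residues are candidates for further refinement such as energy minimization.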
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
64
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
65
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude that although they are all
closely related, they play an important role in survival in different species. It is worth taking a
closer look at the matter at the gene level; a phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible in the near future to draw firm conclusions about the true picture of evolution
and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of the protein present in the
different species, we performed homology modelling and present a theoretical model
built with the available online modelling tools.
Our study indicates that the HBeAg-binding protein (glycerate kinase)
is a secondary contributor to the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The
present work is a small finding within a much larger issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and more effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in the search for a cure for the infectious disease
hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro - An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for the storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to the sequence, structural and functional analysis of genes and
genomes and their corresponding products, and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases, and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
Swiss-Prot is a curated protein sequence database which strives to provide a
high level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for
biological sequence comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
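To make the Smith-Waterman idea concrete, the following is a minimal sketch of local alignment by dynamic programming. It is not the SSEARCH implementation: the function name, the simple match/mismatch scores and the linear gap penalty are assumptions chosen for illustration, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Illustrative scoring only: identical residues score `match`, different
    residues score `mismatch`, and every gap position costs `gap`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

The raw score on its own does not say whether two sequences are homologous; as noted above, that judgement relies on the similarity statistics the package computes.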
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences, and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
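Two of these parameters follow directly from amino acid composition, and a short sketch shows how. The function names are hypothetical and this is not ProtParam's code. It assumes the commonly cited formulas: aliphatic index = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent, and a 280 nm extinction coefficient of 5500 per Trp, 1490 per Tyr and 125 per cystine (in M^-1 cm^-1).

```python
def clean_sequence(raw):
    """Keep letters only, mirroring the rule that white space and numbers are ignored."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def aliphatic_index(raw):
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    seq = clean_sequence(raw)
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient_280(raw, cystines=0):
    """Molar extinction coefficient at 280 nm.

    `cystines` is the number of disulfide-bonded Cys pairs; it is an input
    here because it cannot be deduced from the sequence alone.
    """
    seq = clean_sequence(raw)
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


if __name__ == "__main__":
    example = "10 AVIL WYY 20"   # made-up peptide, not taken from this report
    print(aliphatic_index(example), extinction_coefficient_280(example))
```

Half-life and instability index depend on lookup tables (N-terminal residue rules and dipeptide weights) and are therefore left out of this sketch.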
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
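The percentages quoted for SOPMA are three-state accuracies: the share of residues whose predicted state matches the observed one. A minimal sketch of that measure follows, assuming one-letter state codes H (helix), E (sheet) and C (coil); the function name and the example strings are illustrative and not part of SOPMA.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Both arguments are strings of equal length over the states H, E and C.
    """
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Made-up strings: 7 of 8 positions agree
    print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```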
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the HBeAg-binding protein (glycerate kinase) from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID: 2B8N) through a BLAST similarity search
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor (output in PDB format)
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified the model against different parameters available in SAVS, such as the Ramachandran plot
Selected the best ligand for HBV disease from the KEGG database
Ran Hex and obtained the docked structure of the drug molecule
6. Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
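Sequence identity between target and template is read off the alignment that maps query residues to template residues. The sketch below shows one common way to compute it. The function name is hypothetical, and the convention used here (identical residues divided by columns where both sequences have a residue) is an assumption, since several definitions of percent identity are in use.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which both sequences contribute a residue
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != gap and t != gap]
    if not columns:
        return 0.0
    identical = sum(1 for q, t in columns if q == t)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Made-up aligned fragments, not the alignment used in this report
    print(percent_identity("ACD-EF", "ACDGEY"))
```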
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
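Agreement between matched Cα atoms, as quoted above in Å, is usually expressed as a root-mean-square deviation (RMSD). The following sketch computes it for two coordinate lists. It assumes the two structures have already been superimposed and that the atoms are listed in matching order; the function name and the toy coordinates are illustrative only.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the units of the input (Å for PDB coordinates).

    Each argument is a list of (x, y, z) tuples for matched C-alpha atoms.
    No superposition is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("no atoms given")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]   # every atom shifted by 2 Å
    print(ca_rmsd(model, reference))
```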
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
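The position-specific scoring matrix mentioned above can be illustrated with a small sketch that turns a gapless set of aligned sequences into per-column log-odds scores. This is a deliberate simplification and not PSI-BLAST's procedure: a uniform background frequency and a fixed pseudocount are assumed, and all names and example sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Return a list of dicts, one per column, mapping residue -> log2-odds score."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)   # assumption: uniform background
    pssm = []
    for col in range(length):
        # Ignore gap characters and anything that is not a standard residue
        column = [seq[col] for seq in aligned if seq[col] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            frequency = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(frequency / background)
        pssm.append(scores)
    return pssm


def score_sequence(pssm, seq):
    """Sum the column scores for a sequence of the same length as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


if __name__ == "__main__":
    family = ["ACD", "ACE", "ACD"]          # made-up alignment
    matrix = build_pssm(family)
    print(score_sequence(matrix, "ACD"), score_sequence(matrix, "WWW"))
```

In an iterative search, the sequences found with the current matrix would be added to the alignment and the matrix rebuilt, which is how more distant homologs come within reach.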
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions, that is, the fraction of the query
sequence structure that can be predicted from the template, and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
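The selection criteria above (E-value, coverage, identity) can be applied mechanically to a list of search hits. The sketch below assumes hits in the standard 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The E-value cut-off, the ranking order, the function names and the three example hits are all made up for illustration and are not results from this study.

```python
def parse_hits(text):
    """Parse tab-separated BLAST tabular lines into dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append({
            "template": fields[1],
            "identity": float(fields[2]),
            "qstart": int(fields[6]),
            "qend": int(fields[7]),
            "evalue": float(fields[10]),
        })
    return hits


def rank_templates(hits, query_length, max_evalue=1e-5):
    """Drop hits with a poor E-value, then rank by E-value, coverage and identity."""
    kept = []
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        # Fraction of the query covered by the aligned region, in percent
        coverage = 100.0 * (hit["qend"] - hit["qstart"] + 1) / query_length
        kept.append(dict(hit, coverage=coverage))
    kept.sort(key=lambda h: (h["evalue"], -h["coverage"], -h["identity"]))
    return kept


# Hypothetical hits for a 400-residue query
EXAMPLE = "\n".join([
    "query\tTMPL1_A\t45.0\t380\t200\t9\t10\t390\t5\t384\t1e-90\t300",
    "query\tTMPL2_A\t28.0\t120\t80\t6\t50\t170\t40\t160\t0.5\t30",
    "query\tTMPL3_B\t35.0\t300\t190\t5\t20\t320\t15\t315\t1e-40\t150",
])

if __name__ == "__main__":
    for hit in rank_templates(parse_hits(EXAMPLE), query_length=400):
        print(hit["template"], hit["evalue"], round(hit["coverage"], 2))
```

A ranking like this only shortlists candidates; as the text notes, the final choice may still depend on function and on the plausibility of the resulting model.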
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production however more sophisticated approaches
have also been explored
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
the interfaces are thus said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described by a lock-and-key
mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor or a flexible ligand with a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, the energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement, which allows small conformational adaptations for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking study will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A receptor is a site on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles that cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
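As an illustration of how a backbone torsion angle is obtained from atomic coordinates, the following minimal Python sketch computes the dihedral angle defined by four points. It is not taken from any program used in this report; the function name `dihedral` and the test coordinates are assumptions made for the example. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i), and for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)          # points from p1 back to p0
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond (project onto the
    # plane perpendicular to it).
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: central bond along z, first atom on +x.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Plotting the (φ, ψ) pair of every residue computed this way gives the Ramachandran plot.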
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file, which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and so stops in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
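The idea of gradient optimization can be sketched with a toy system. The Python example below is an illustrative assumption, not the minimizer used in this work: it relaxes the distance between two atoms interacting through a Lennard-Jones potential in reduced units (ε = σ = 1), starting from a clashing geometry. The function names, step size and displacement cap are all hypothetical choices.

```python
def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair energy in reduced units."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)

def lj_gradient(r, eps=1.0, sigma=1.0):
    """dE/dr; the force on the pair is the negative of this value."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (-12.0 * s6 * s6 + 6.0 * s6) / r

def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=100000):
    """Steepest descent on r with a cap on the displacement per step."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:        # forces are small: converged
            break
        move = -step * g        # move downhill along the gradient
        move = max(-max_move, min(max_move, move))
        r += move
    return r

r_start = 0.9                   # atoms too close: large repulsion energy
r_min = minimize(r_start)
e_start, e_min = lj_energy(r_start), lj_energy(r_min)
```

Starting from the overlapping distance, the energy drops from a large positive value to the bottom of the nearest well, which for this potential lies at r = 2^(1/6) σ with energy -ε. A descent of this kind only finds the nearest minimum, which is the local-minimum limitation described above.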
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db    AC       Description                                                   Score   E-value
pdb   1QGT-C   Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5         206     6e-54
pdb   2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina         197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27   6.0
pdb   1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I         27      6.0
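The template-selection rule described earlier (keep only hits with a sufficiently low E-value) can be sketched in Python as follows. The hit list is transcribed from the search result above, reading the last two E-values as 6.0; the cutoff of 1e-5 and the function name `select_templates` are illustrative assumptions, not settings used in this study.

```python
# Each hit: (PDB chain, score, E-value), transcribed from the list above.
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]

def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest) first."""
    kept = [h for h in hits if h[2] <= max_evalue]
    return sorted(kept, key=lambda h: h[2])

candidates = select_templates(hits)
for chain, score, evalue in candidates:
    print(chain, score, evalue)
```

With this cutoff only the three hepatitis B capsid chains remain as candidate templates; the two hits with a score of 27 are rejected.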
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N    711
Oxygen    O    719
Sulfur    S     17
(41)
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
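The two extinction coefficients can be reproduced from the amino acid composition. The sketch below assumes the per-residue molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function names are hypothetical and the code is an illustration, not ProtParam's own implementation.

```python
# Molar extinction contributions at 280 nm in M-1 cm-1 (assumed values,
# as documented for ProtParam).
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125

def extinction_coefficient(n_trp, n_tyr, n_cys, cys_paired=True):
    """Molar extinction coefficient estimated from residue counts."""
    n_cystine = n_cys // 2 if cys_paired else 0   # two Cys form one cystine
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

def abs_0_1_percent(ext, mol_weight):
    """Absorbance of a 1 g/l solution over a 1 cm path."""
    return ext / mol_weight

# Counts and molecular weight reported above for GLCTK_HUMAN.
ext_all = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cys_paired=True)
ext_none = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cys_paired=False)
abs_all = round(abs_0_1_percent(ext_all, 55252.6), 3)
abs_none = round(abs_0_1_percent(ext_none, 55252.6), 3)
```

With 4 Trp, 5 Tyr and 5 Cys this gives 29700 and 29450, matching the reported values, and the corresponding absorbances of 0.538 and 0.533.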
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
(42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states:        0 is  0.00%
Other states:            0 is  0.00%
(43)
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
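The state percentages in a SOPMA summary are simple frequency counts over the predicted state string (h = helix, e = extended strand, t = turn, c = coil). The Python sketch below shows the calculation on a short made-up string; the function name `state_percentages` and the example string are assumptions for illustration, and the last line only re-derives the reported helix percentage from the reported count.

```python
from collections import Counter

def state_percentages(states):
    """Percentage of each secondary-structure letter in a state string."""
    counts = Counter(states)
    total = len(states)
    return {state: 100.0 * n / total for state, n in counts.items()}

# Hypothetical 10-residue prediction: 4 helix, 2 coil, 2 strand, 2 turn.
example = state_percentages("hhhhcceett")

# Cross-check of the reported helix content: 235 of 523 residues.
reported_helix = round(100.0 * 235 / 523, 2)
```

Applied to the full 523-residue state string, the same function would return the values listed in the summary.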
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
(45)
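The pairwise scores in the table can be read as percent identities between aligned sequence pairs. As a hedged sketch of that idea (ClustalW2's exact scoring and choice of denominator may differ), the Python function below counts identical residues over the columns where neither sequence has a gap. The function name and the two short aligned strings are made up for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue            # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments: 6 comparable columns, 5 identical.
score = percent_identity("MQLF-HL", "MYLFAHL")
```

Scores near 98 among the HBVA sequences therefore indicate almost identical proteins, whereas scores of 2 to 3 against GLCTK_HUMAN indicate no meaningful sequence similarity.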
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
The PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (PDB file). Choose Swiss model, then load the raw target sequence.
2) Choose Fit, then fit raw sequence, then magic fit, then iterative fit.
3) Choose File, Save, Layer (pdb).
4) Choose File, Save, Project (pdb).
5) Choose Swiss model, then submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
6) Open the new structure (received by e-mail) and remove the template by selecting the target.
7) Choose File, Save, Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Alignment
TARGET    83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA      4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET          hh sss sssss hhh
2b8nA           hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET          hhhhhhhhh ssssss sssss hhhhh
2b8nA           hhhhhhhhh ssssss sssss hhhhh

TARGET   155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA           hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET          h hhhhhh hhh sss hhhhhh sssssss
2b8nA           h hhhhhh hhh sss hhhhhh sssssss

TARGET   255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET          h hhhhhhhhhh hh hhhh sss sssss
2b8nA           h hhhhhhhhhh hhh hhhh ss

TARGET   305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA           sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET          hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA           hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET          sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA           sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394    ksgallitgp tgtnvndlii gliv-
TARGET          h sss sssss ssss
2b8nA           sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, displaying the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using Deep View and Modeler is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                            Score    %-age
Residues in most favoured regions [A,B,L]                    990    85.6%
Residues in additional allowed regions [a,b,l,p]             104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11     1.0%
Residues in disallowed regions                                51     4.4%
                                                            ----   ------
Number of non-glycine and non-proline residues              1156   100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
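The percentages in the plot statistics, and the comparison with the 90% guideline, can be checked with a few lines of Python. This is only an illustrative sketch using the residue counts reported above; the dictionary keys and function names are assumptions and it is not part of PROCHECK.

```python
# Residue counts per Ramachandran region, as reported in the plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each region."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}

def meets_guideline(percentages, threshold=90.0):
    """True if the most favoured regions hold more than the threshold."""
    return percentages["most favoured"] > threshold

percentages = region_percentages(counts)
good_quality = meets_guideline(percentages)
```

For these counts the most favoured regions hold about 85.6% of the 1156 non-glycine, non-proline residues, which is below the 90% guideline.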
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
64
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of the protein present in the
different species, we performed homology modelling. We went a step further and present a theoretical model
built using available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries. The
present work may be a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and more effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites
in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley, and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] - C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4] - plumbed
[5] - Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] - Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801. [PubMed]
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408. [PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] - Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] - Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
2 Protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and
sensitive protein similarity searches". The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for
Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet; it is an extension of
the "FAST-P" (protein) and "FAST-N" (nucleotide) alignment programs.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
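The Smith-Waterman algorithm mentioned above can be illustrated with a minimal sketch. This is not the SSEARCH implementation: the scoring scheme (match = 2, mismatch = -1, linear gap = -2) is an assumption chosen for readability, whereas real searches use substitution matrices such as BLOSUM and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b).

    Illustrative scoring only (assumed values, not those of SSEARCH).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; cells never drop below zero,
    # which is what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences, invented for illustration.
    print(smith_waterman("XXKLAIEYY", "ZZKLAIEQQ"))
```

The raw score alone does not say whether two sequences are homologous; as the paragraph above notes, that judgement rests on the similarity statistics computed by the package.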
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
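Two of the parameters listed above follow from simple closed-form expressions, sketched below. The aliphatic index uses the published weights a = 2.9 and b = 3.9 on mole percentages of Ala, Val, Ile and Leu, and the extinction coefficient at 280 nm uses the standard per-residue values for Tyr (1490), Trp (5500) and cystine (125) in M^-1 cm^-1. The function names and the `cystines` argument are illustrative assumptions, not part of the ProtParam interface.

```python
from collections import Counter


def aliphatic_index(seq):
    """Aliphatic index = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue in the sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq.upper())
    n = len(seq)

    def pct(aa):
        return 100.0 * counts[aa] / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def extinction_coefficient(seq, cystines=0):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    `cystines` is the number of disulfide-bonded Cys pairs; it is a
    caller-supplied assumption because it cannot be read from the sequence.
    """
    counts = Counter(seq.upper())
    return counts["Y"] * 1490 + counts["W"] * 5500 + cystines * 125


if __name__ == "__main__":
    # Toy peptide, invented for illustration.
    peptide = "AVILWYY"
    print(aliphatic_index(peptide), extinction_coefficient(peptide))
```

The half-life and instability index rely on lookup tables (N-end rule and dipeptide weights) and are therefore not reproduced in this sketch.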
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
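The percentages quoted above are three-state per-residue accuracies. A minimal sketch of how such a figure is computed is shown here, assuming predicted and observed structures are given as equal-length strings over the letters H (helix), E (sheet) and C (coil); the function name and the example strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Invented strings for illustration; five of six states agree.
    print(round(q3_accuracy("HHHEEC", "HHHECC"), 1))
```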
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and obtained the docked structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model depends on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
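The agreement between matched Cα atoms quoted above is normally expressed as a root-mean-square deviation (RMSD). The sketch below assumes the two structures have already been superimposed and that the Cα coordinates are supplied as matched lists of (x, y, z) tuples in ångströms; it does not perform the superposition itself, and the function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (in the input units) between matched, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Invented coordinates: the model is shifted 2 A along z from the template.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    template = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
    print(ca_rmsd(model, template))
```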
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
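The E-value screening described above can be sketched as a simple filter over search hits. The hit records, their identifiers and the cutoff of 1e-5 are all hypothetical values chosen for illustration; the text does not prescribe a specific threshold, and real BLAST output would first need to be parsed into this form.

```python
def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest) first.

    `hits` is a list of dicts with at least the keys "id" and "evalue".
    The default cutoff is an assumed, illustrative value.
    """
    kept = [hit for hit in hits if hit["evalue"] <= max_evalue]
    return sorted(kept, key=lambda hit: hit["evalue"])


if __name__ == "__main__":
    # Hypothetical hits, not taken from any real search.
    example_hits = [
        {"id": "tmplA", "evalue": 1e-40},
        {"id": "tmplB", "evalue": 0.5},
        {"id": "tmplC", "evalue": 1e-8},
    ]
    for hit in select_templates(example_hits):
        print(hit["id"], hit["evalue"])
```

A hit such as the hypothetical `tmplB` is dropped even if nothing else is available, which mirrors the advice above not to model on a template with a poor E-value.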
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important is the coverage of the aligned regions: the fraction of the query
sequence structure that can be predicted from the template, and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production however more sophisticated approaches
have also been explored
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
the interfaces are therefore said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key
mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor or a flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid-ligand, flexible-receptor pair often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows for binding of a receptor with a
larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement that allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
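The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be made concrete with a small sketch. The function names rotation_matrix and apply_pose, the x-then-y-then-z rotation order and the sample coordinates are illustrative assumptions; they are not taken from any docking program used in this report.

```python
import math

def rotation_matrix(rx, ry, rz):
    """3x3 matrix for rotations (in radians) about x, then y, then z (R = Rz * Ry * Rx)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(coords, rotation, translation):
    """Rotate and then translate every atom; the molecule itself stays rigid."""
    r = rotation_matrix(*rotation)
    tx, ty, tz = translation
    moved = []
    for x, y, z in coords:
        moved.append((
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        ))
    return moved

# Hypothetical two-atom "ligand": rotate 90 degrees about z, then shift by (1, 2, 3).
ligand = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
posed = apply_pose(ligand, (0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0))
```

A rigid-body docking search explores only these six numbers per pose, which is why interatomic distances inside the ligand are unchanged by apply_pose.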
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
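As a small illustration of how such preference values might be used, the sketch below ranks the residues from most to least likely to contact bound non-protein atoms. It assumes the values in the list above read as three-decimal numbers (for example His 0.360 and Pro -0.200); the names PROPENSITY and rank_residues are hypothetical.

```python
# Preference values as read from the list above (decimal placement is an assumption).
PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}

def rank_residues(propensity):
    """Return residue names sorted from highest to lowest preference."""
    return sorted(propensity, key=lambda name: propensity[name], reverse=True)

ranked = rank_residues(PROPENSITY)
favoured = [name for name in ranked if PROPENSITY[name] > 0]
```

With these values histidine, cysteine and serine come out on top, and proline and leucine at the bottom.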
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle φ (phi) against ψ (psi) for the amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and the
angles that cause spheres to collide correspond to sterically disallowed conformations of the
polypeptide backbone.
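Each point on a Ramachandran plot is a pair of torsion angles, and a torsion angle is defined by four consecutive atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The sketch below shows one common way to compute such an angle from four 3D points. The function name dihedral_degrees and the test coordinates are illustrative, not taken from any program used in this report.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond for atoms p0..p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Four points arranged so that the torsion about the z axis is +90 degrees.
angle = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```

Applying the function to every residue of a structure gives the (φ, ψ) pairs that are plotted.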
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file, which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
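The idea of gradient optimization can be illustrated with a deliberately tiny example: two atoms interacting through a Lennard-Jones potential, minimized by steepest descent along their separation r. The choice of a Lennard-Jones term, the reduced units (epsilon = sigma = 1), the step size and the function names are assumptions made for this sketch. Real minimizers work on all atomic coordinates and on a full force field.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy: strongly repulsive at short range, weakly attractive beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr; the force on the pair is minus this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r0, step=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent: move against the gradient until the force is nearly zero."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

# Two atoms placed too close together (r = 1.0) relax to the nearby minimum.
r_min = minimize(1.0)
```

The minimum lies at r = 2^(1/6) sigma, where the net force vanishes. Because the method only follows the local slope, it finds the nearest minimum and cannot cross an energy barrier, as noted above.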
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                         Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp     27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                27     6.0
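The rule of thumb described earlier, keeping only hits with a sufficiently low E-value as template candidates, can be sketched as a simple filter over the hits listed above. The cut-off of 1e-5 is an assumed, commonly used threshold rather than a value stated in this report, and the E-value of the last two hits is read here as 6.0.

```python
# (accession, score, E-value) for the hits listed above.
HITS = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),   # assumed reading of the printed E-value
    ("1AW9-A", 27, 6.0),   # assumed reading of the printed E-value
]

def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cut-off, best (lowest E-value) first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])

candidates = [name for name, _score, _evalue in select_templates(HITS)]
```

Under this cut-off only the three hepatitis B capsid structures remain; the xylanase and glutathione S-transferase hits are rejected as chance matches.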
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711 (41)
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
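The two extinction coefficients can be checked by hand from the composition. The sketch below assumes the molar coefficients ProtParam documents for 280 nm in water (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1) and reads the molecular weight as 55252.6; the function names are illustrative.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm (M-1 cm-1) from residue counts."""
    n_cystine = n_cys // 2 if all_cys_paired else 0  # two Cys form one cystine
    return n_tyr * 1490 + n_trp * 5500 + n_cystine * 125

def abs_01_percent(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight

# Counts from the composition above: Tyr 5, Trp 4, Cys 5.
ext_paired = extinction_coefficient(5, 4, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(5, 4, 5, all_cys_paired=False)
abs_paired = round(abs_01_percent(ext_paired, 55252.6), 3)
abs_reduced = round(abs_01_percent(ext_reduced, 55252.6), 3)
```

With five cysteines at most two cystines can form, which accounts for the difference of 250 between the two reported coefficients.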
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
(42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):     235 is 44.93%
3-10 helix (Gg):        0 is  0.00%
Pi helix (Ii):          0 is  0.00%
Beta bridge (Bb):       0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn (Tt):        36 is  6.88%
Bend region (Ss):       0 is  0.00%
Random coil (Cc):     172 is 32.89%
Ambiguous states:       0 is  0.00%
Other states:           0 is  0.00%
(43)
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
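The percentages in the SOPMA summary are simply the counts of each predicted state divided by the sequence length. A minimal sketch of that bookkeeping follows, assuming the one-letter state codes h (helix), e (strand), t (turn) and c (coil); the function names are illustrative.

```python
from collections import Counter

def state_counts(states):
    """Count how often each one-letter state code occurs in a prediction string."""
    return dict(Counter(states))

def percentage(count, total):
    """Share of the sequence in one state, rounded to two decimals as in the summary."""
    return round(100.0 * count / total, 2)

# Short made-up prediction string, used only to show the call.
example = state_counts("hhhhcceett")

# Counts reported above for the 523-residue sequence.
helix_pct = percentage(235, 523)
strand_pct = percentage(80, 523)
turn_pct = percentage(36, 523)
coil_pct = percentage(172, 523)
```

The four reported counts add up to 523, so every residue is assigned to exactly one of the four states.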
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
(45)
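The pairwise scores in the table are percent-identity style values. A simplified sketch of such a score for two already aligned sequences is given below. It counts identical residues over the columns where neither sequence has a gap; this is an assumption made for illustration, and ClustalW's own pairwise score is computed from its own pairwise alignments and need not match this number exactly.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy aligned fragments (made up for the example): 3 of 4 ungapped columns match.
score = percent_identity("ACD-F", "ACE-F")
```

Closely related sequences, such as the HBV genotype A entries in the table, score near 100, while the human glycerate kinase scores only 2 to 3 against every HBeAg sequence.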
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4      ------------------------------------------------------------
Q91C37|HBEAG_HBVA6      ------------------------------------------------------------
P0C692|HBEAG_HBVA2      ------------------------------------------------------------
P0C625|HBEAG_HBVA3      ------------------------------------------------------------
Q81105|HBEAG_HBVA5      ------------------------------------------------------------
Q64896|HBEAG_ASHV       ------------------------------------------------------------
P03153|HBEAG_GSHV       ------------------------------------------------------------
P03154|HBEAG_DHBV1      ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3      ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN      MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5      ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV       ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV       ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN      LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4      DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3      DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5      DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1      DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3      DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN      ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5      --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV       --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV       --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1      --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3      --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN      ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6      ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2      QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5      QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV       QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV       QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1      QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3      QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN      LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4      -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3      -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV       -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV       -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN      CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV       NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV       NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1      DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3      DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN      PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3      RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV       RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV       RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN      GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4      -------------------------------------------
Q91C37|HBEAG_HBVA6      -------------------------------------------
P0C692|HBEAG_HBVA2      -------------------------------------------
P0C625|HBEAG_HBVA3      -------------------------------------------
Q81105|HBEAG_HBVA5      -------------------------------------------
Q64896|HBEAG_ASHV       -------------------------------------------
P03153|HBEAG_GSHV       -------------------------------------------
P03154|HBEAG_DHBV1      -------------------------------------------
P0C6J9|HBEAG_DHBV3      -------------------------------------------
Q8IVS8|GLCTK_HUMAN      ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
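ClustalW writes its guide tree (.dnd file) in Newick format, where each leaf appears as name:branch_length and parentheses group sister sequences. Assuming that format, the sketch below pulls out the leaf names and their branch lengths with a regular expression. The names LEAF_PATTERN and leaf_branch_lengths are illustrative, and the example string is just the duck hepatitis B pair from the tree above.

```python
import re

# A leaf is a run of characters that are not Newick punctuation, followed by ":length".
LEAF_PATTERN = re.compile(r"([^(),:;\s]+):([0-9.]+)")

def leaf_branch_lengths(newick):
    """Return (leaf name, branch length) pairs in the order they appear in the tree."""
    return [(name, float(length)) for name, length in LEAF_PATTERN.findall(newick)]

subtree = "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054;"
leaves = leaf_branch_lengths(subtree)
```

Internal branches such as :0.36054 carry no name, so the pattern skips them and only the sequences themselves are returned.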
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (.pdb file); choose icon - Swiss model - load the raw target sequence.
Choose icon - fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - file - save - layer (.pdb).
Choose icon - file - save - project (.pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window will be opened; load the .pdb file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose icon - file - save - layer (.pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homologous modeling
Optimise Mode Request submission form
Your Email address: Gunjan300@gmail.com
Your Name: Gunjan
Request title: Gunjan project
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET  83                                 LV GFGKAVLGMA AAAEELLGQH
2b8nA   4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET  155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET  255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET  305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394 ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angle φ against ψ for the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, displaying the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE   %-age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
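The percentages in the plot statistics, and the comparison against the 90% guideline quoted above, can be reproduced from the four residue counts. The sketch below does that; the function name ramachandran_summary and the returned dictionary layout are illustrative choices.

```python
def ramachandran_summary(most_favoured, additional, generous, disallowed, threshold=90.0):
    """Percentages per region (non-Gly, non-Pro residues) and a check of the 90% guideline."""
    total = most_favoured + additional + generous + disallowed
    percentages = {
        "most favoured": round(100.0 * most_favoured / total, 1),
        "additional allowed": round(100.0 * additional / total, 1),
        "generously allowed": round(100.0 * generous / total, 1),
        "disallowed": round(100.0 * disallowed / total, 1),
    }
    meets_guideline = percentages["most favoured"] > threshold
    return total, percentages, meets_guideline

# Counts from the PROCHECK plot statistics above.
total, percentages, meets_guideline = ramachandran_summary(990, 104, 11, 51)
```

For this model 85.6% of the residues fall in the most favoured regions, which is below the 90% expected of a good quality model, and 4.4% lie in disallowed regions.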
Docking result ---- by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
64
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
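The search-space figures reported in the Hex log above can be cross-checked with a little arithmetic. The sketch below assumes that the separators lost from the log read "Iterate[27,812,1] x FFT[64,24,48]" and "distance range = 0.00 to 19.50 with steps of 0.75"; this reading is an assumption, although it reproduces the totals printed in the log (21924 FFTs and 1616412672 orientations). The function names are illustrative only.

```python
# Cross-check of the search-space sizes printed in the Hex log.
# Assumption: the stripped separators read
#   "Iterate[27,812,1] x FFT[64,24,48]" and
#   "distance range = 0.00 to 19.50 with steps of 0.75".

def distance_steps(r_min, r_max, step):
    """Number of sampled intermolecular distances, both endpoints included."""
    return int(round((r_max - r_min) / step)) + 1

def product(values):
    """Multiply a sequence of integers together."""
    total = 1
    for v in values:
        total *= v
    return total

n_distances = distance_steps(0.0, 19.5, 0.75)   # R = 0.00, 0.75, ..., 19.50
iterate = (n_distances, 812, 1)                  # distances, receptor and ligand increments
fft_grid = (64, 24, 48)                          # assumed FFT grid dimensions

n_ffts = product(iterate)
n_orientations = n_ffts * product(fft_grid)
print(n_distances, n_ffts, n_orientations)
```

Under this reading, the 27 distance steps match the 27 "R =" values listed in the log, and the product matches the reported size of the 6D search space.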
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all
closely related, they play an important role in survival in different species. It would be interesting to
examine the matter more closely at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible in the near future to draw firm conclusions about the true picture of
evolution and to identify the genes responsible for pathogenesis.
A complete inference can be drawn only from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of the protein present in the
different species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report might be just a stepping stone towards any such discoveries. The
present work might be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites
in them. Similar methods can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408. [PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences and
identify library sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end positions in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
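As an illustration of how one of these parameters is obtained, the aliphatic index can be computed directly from the amino acid composition. The sketch below uses the standard definition, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is our own, and this is a minimal sketch rather than the ProtParam implementation.

```python
def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence given in one-letter code.

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X(aa) is the mole percent of residue aa in the sequence.
    """
    seq = "".join(sequence.split()).upper()   # ignore white space
    if not seq:
        raise ValueError("sequence is empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))

# Short fragment used purely for illustration.
print(aliphatic_index("PESLKKLAIEIVKK"))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains, which is generally associated with greater thermostability of globular proteins.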
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
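The percentages quoted above are three-state (Q3) accuracies: the fraction of residues whose predicted state (helix, sheet or coil) matches the observed one. A minimal sketch of that calculation follows. The one-letter state codes H, E and C and the example strings are assumptions made for illustration; they are not SOPMA output.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Both arguments are strings of equal length, one character per residue
    (here H = helix, E = sheet, C = coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation differ in length")
    if not observed:
        raise ValueError("no residues to compare")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(observed)

# Illustrative strings only.
print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```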
PROTOCOL FOLLOWED
1) Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
2) Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
3) Ran a BLAST similarity search to retrieve the PDB ID of the template structure (PDB ID: 2B8N).
4) Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modelled the receptor.
5) Validated the modelled receptor using the Structure Analysis Validation Server (SAVS).
6) Verified the model using parameters such as the Ramachandran plot and others available in SAVS.
7) Selected the best ligand for HBV disease from the KEGG database.
8) Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
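The agreement between matched Cα atoms quoted above is usually expressed as a root mean square deviation (RMSD). The sketch below computes the RMSD between two equally long lists of matched coordinates. It assumes that the two structures have already been superimposed, since no fitting is performed here, and the coordinates are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched (x, y, z) coordinates.

    Assumes the two structures are already superimposed; no optimal
    rotation or translation is applied.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists differ in length")
    if not coords_a:
        raise ValueError("no atoms to compare")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Made-up C-alpha coordinates (in angstroms) for a model and a reference.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
print(rmsd(model, reference))
```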
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
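The first-pass selection by E-value described above can be sketched as a simple filter. The hit identifiers, E-values and the cutoff of 1e-5 below are hypothetical values chosen for illustration; they are not results from this study, and a suitable cutoff depends on the database and the query.

```python
# Hypothetical (template_id, e_value) pairs; not results from this study.
hits = [("tmpl_A", 3e-42), ("tmpl_B", 0.8), ("tmpl_C", 1e-7)]

def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, lowest E-value first."""
    kept = [hit for hit in hits if hit[1] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[1])

candidates = select_templates(hits)
for template_id, e_value in candidates:
    print(template_id, e_value)
```

Hits that fail the cutoff are not used as templates; as noted above, such sequences are better submitted to fold-recognition servers.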
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions, that is, the fraction of the query
sequence structure that can be predicted from the template, and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
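Sequence identity between target and template, which drives the choices discussed above, can be read directly from a pairwise alignment. The sketch below counts identical residues over the columns in which neither sequence has a gap. Conventions for the denominator vary between tools, so this is one common definition and not necessarily the one used by any particular server; the aligned strings are invented for illustration.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == gap or t == gap:
            continue            # skip indel columns
        aligned += 1
        if q == t:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / aligned

# Invented alignment rows for illustration.
print(percent_identity("ACDEF-G", "ACDQFHG"))
```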
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interactions are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described by a
lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow the docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid, though:
there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed to the receptor is even greater. This further offsets any energy penalty between
the receptor and ligand, allowing for easier, more energetically favorable binding between
the two.
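The six rigid-body degrees of freedom mentioned above can be illustrated with a short sketch. It is only an illustration, not how any particular docking program represents a pose: the function names `rotation_matrix` and `apply_pose` and the x-then-y-then-z rotation order are assumptions made for this example.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(alpha, beta, gamma):
    """Rotate about x by alpha, then y by beta, then z by gamma (radians)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rx = [[1, 0, 0], [0, ca, -sa], [0, sa, ca]]
    ry = [[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]]
    rz = [[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def apply_pose(points, angles, translation):
    """Apply three rotations and three translations to a list of (x, y, z) points."""
    r = rotation_matrix(*angles)
    return [tuple(sum(r[i][k] * p[k] for k in range(3)) + translation[i]
                  for i in range(3))
            for p in points]


# Hypothetical two-atom "ligand": rotate 90 degrees about z, then shift along z.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
posed = apply_pose(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 1.0))
print(posed)
```

A rigid-body move of this kind changes where the ligand sits but leaves every internal distance unchanged, which is why flexibility has to be modeled separately.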
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible
drug targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule
within a cell or on a cell surface to which a substance (such as a hormone or a drug)
selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds
through the interaction of many weak noncovalent bonds formed with the binding site of a
protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed
amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making
and breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers
the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in
mind, below are preferences for the 20 amino acids to lie within functional regions on
proteins. These were worked out by considering how often particular amino acids were in
contact with bound non-protein atoms in protein three-dimensional structures. Positive
values mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between
the torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
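As a minimal sketch of how one point on such a plot is obtained, the torsion angle defined by four consecutive backbone atoms can be computed from their coordinates. For φ the four atoms are C(i-1), N, Cα, C; for ψ they are N, Cα, C, N(i+1). The function name `dihedral` and the toy coordinates are assumptions for illustration and are not taken from any structure in this report.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real atoms): a quarter turn about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing φ and ψ this way for every residue and plotting one against the other gives the scatter of points that is compared with the allowed regions.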
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as
the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and will stop in a local
minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy
minimization on a conformation to find the best nearby conformation. Energy minimization
is usually performed by gradient optimization: atoms are moved so as to reduce the net
forces on them. The minimized structure has small forces on each atom and therefore serves
as an excellent starting point for molecular dynamics simulations.
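The idea of moving atoms along the net force until the force is small can be sketched on a toy system with a single degree of freedom: the distance between two atoms interacting through a Lennard-Jones term, used here as a stand-in for the van der Waals energy. The step size, the cap on each move, and the reduced units (ε = σ = 1) are assumptions chosen for the example; real minimizers work on all atomic coordinates at once and use a full force field.

```python
def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_force(r, eps=1.0, sigma=1.0):
    """Force along r, i.e. minus the derivative of the energy."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent: move along the force until it nearly vanishes."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        # Cap each move so a huge repulsive force cannot throw the atoms apart.
        move = max(-max_move, min(max_move, step * force))
        r += move
    return r


start = 0.9                      # two atoms too close: large repulsion
r_min = minimize(start)
print(lj_energy(start), lj_energy(r_min), r_min)
```

Starting from the clashing distance, the search settles at the nearby minimum of the curve. In a real molecule with many minima the same procedure would stop in whichever minimum is closest to the starting conformation.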
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences

Db   AC      Description                                                     Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5           206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina           197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                 192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27   6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I           27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue   Count   Percent
Ala (A)    74     14.1%
Arg (R)    33      6.3%
Asn (N)    11      2.1%
Asp (D)    21      4.0%
Cys (C)     5      1.0%
Gln (Q)    32      6.1%
Glu (E)    28      5.4%
Gly (G)    51      9.8%
His (H)    16      3.1%
Ile (I)    15      2.9%
Leu (L)    81     15.5%
Lys (K)    10      1.9%
Met (M)    12      2.3%
Phe (F)    10      1.9%
Pro (P)    28      5.4%
Ser (S)    27      5.2%
Thr (T)    22      4.2%
Trp (W)     4      0.8%
Tyr (Y)     5      1.0%
Val (V)    38      7.3%
Pyl (O)     0      0.0%
Sec (U)     0      0.0%
(B)         0      0.0%
(Z)         0      0.0%
(X)         0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711 (41)
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
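The two extinction coefficients above can be reproduced from the residue counts in the composition table. The sketch below assumes the per-residue molar coefficients at 280 nm that ProtParam documents (5500 for Trp, 1490 for Tyr and 125 per cystine, in M-1 cm-1); the function name `extinction_coefficient` is hypothetical.

```python
# Assumed molar extinction coefficients at 280 nm, in M-1 cm-1.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the 280 nm extinction coefficient from residue counts."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * n_cystine


# Counts and molecular weight for GLCTK_HUMAN as reported above.
molecular_weight = 55252.6
paired = extinction_coefficient(4, 5, 5, all_cys_paired=True)
reduced = extinction_coefficient(4, 5, 5, all_cys_paired=False)

# Absorbance of a 1 g/l solution is the coefficient divided by the molecular weight.
print(paired, round(paired / molecular_weight, 3))
print(reduced, round(reduced / molecular_weight, 3))
```

With 4 Trp, 5 Tyr and 5 Cys this gives the same values as the ProtParam output, which is a quick consistency check on the reported composition.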
Secondary structure prediction
By SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
(42)
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
(43)
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
(45)
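The pairwise scores in the table are percent-identity values. A minimal sketch of such a calculation is given below, assuming the score is the number of identical residues divided by the number of columns in which neither sequence has a gap. The function name `percent_identity` is hypothetical, and the two fragments are the first aligned residues of HBEAG_HBVA4 and HBEAG_ASHV taken from the multiple alignment in this report, so the result illustrates the arithmetic rather than reproducing ClustalW's full-length pairwise score.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, matches, compared columns) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # columns containing a gap are not compared
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0, 0, 0
    return 100.0 * matches / compared, matches, compared


hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
identity, matches, compared = percent_identity(hbva4, ashv)
print(f"{matches} of {compared} compared residues identical ({identity:.1f}%)")
```

Scores near 3 between the human glycerate kinase and every HBeAg sequence therefore mean that almost no aligned positions are identical, whereas the HBV genotype sequences score in the high 90s against each other.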
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB 1QGT-C was selected as the template, which showed around 85.6% identity with the
target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (PDB file).
2) Choose Swiss model > Load the raw target sequence.
3) Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.
4) Choose File > Save > Layer (PDB).
5) Choose File > Save > Project (PDB).
6) Choose Swiss model > Submit modeling request. (A new browser window will open; load the
PDB file and give the email ID for receiving the modeled structure.)
7) Open the new structure (received by email) and remove the template by selecting the
target.
8) Choose File > Save > Layer (PDB).
Open Swiss model and select the "Load raw sequence" option to load the target molecule.
Perform Magic fit and Iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "Submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig. Structure of template after modeling
Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of
the molecular structure and sequence of proteins. The stereochemical validation of model
structures of proteins is an important part of the comparative molecular modeling process.
The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid
residues in a protein structure. It shows the possible conformations of the φ and ψ angles
for a polypeptide and displays the ψ and φ backbone conformational angles for each residue
in a protein. The distance between two successive alpha carbon atoms in the backbone chain
and the angles between the two bonds of such atoms in the desired protein can be
determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler
is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           Score      %
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
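The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked in the same step. The sketch below uses the counts reported above; the dictionary keys and variable names are chosen for this example only.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())
percent = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# PROCHECK guideline: a good quality model has over 90% in the most favoured regions.
meets_guideline = percent["most favoured"] > 90.0

for region, value in percent.items():
    print(f"{region}: {value}%")
print("meets 90% guideline:", meets_guideline)
```

For this model the most favoured fraction falls short of the guideline, so by this criterion the model would not count as good quality without further refinement.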
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
65
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude that although they
are all closely related, they play an important role in survival in different species. It would be
interesting to look more closely at the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of the protein present in the
different species, we used homology modelling. We took a step forward by presenting a theoretical
model built with available online modelling tools.
Our study indicates that the gene-encoded HBeAg (glycerate kinase) protein
is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The
present work is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and more effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the secondary structure of
proteins. In this paper we report improvements brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This improved SOPM
method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126
chains of non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for
74% of co-predicted amino acids. Predictions are available by Email to deleage@ibcp.fr
or on a Web page (http://www.ibcp.fr/predict.html).
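The percentages quoted above are three-state accuracies: the fraction of residues whose predicted state matches the observed one. A minimal Python sketch of that measure follows. The function name and the two example strings are illustrative assumptions (H = helix, E = sheet, C = coil), not SOPMA output.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical per-residue states for a 16-residue fragment.
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEEECC"
print(f"Three-state accuracy: {100 * q3_accuracy(predicted, observed):.2f}%")
```

Here one residue out of 16 is mispredicted, so the score is 15/16.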
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the HBeAg protein from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence in pdb format in SWISS-MODEL as a raw sequence and modelled the receptor
Validated the modelled receptor using the Structure Analysis and Validation Server (SAVS)
Verified the model using the parameters available in SAVS, such as the Ramachandran plot
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
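The protocol starts from a FASTA sequence retrieved from a database. As an illustration only, the sketch below shows one way such a file could be read into memory before it is submitted to a modelling server. The function name, the record header and the short sequence are hypothetical and are not taken from the HBeAg entry.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-line record, for illustration only.
example = """>example_protein
MKTAYIAKQR
QISFVKSHFS
"""
sequences = parse_fasta(example)
print(sequences["example_protein"])
```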
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
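Sequence similarity between query and template is usually summarised as percent identity over the alignment. The sketch below is one simple way to compute it, assuming two pre-aligned strings of equal length with "-" marking gaps, and taking the denominator to be the number of aligned (non-gap) columns. Other conventions for the denominator exist; the function name and example strings are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent of aligned (non-gap) columns in which both sequences have the same residue."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns (indels) are not counted
        aligned += 1
        if q == t:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned columns")
    return 100.0 * identical / aligned


# Hypothetical alignment: 8 aligned columns, 6 of them identical.
identity = percent_identity("MKT-AYIAK", "MKSLAYLAK")
print(f"{identity:.1f}% identity")
```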
The quality of the homology model depends on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
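Agreement between matched Cα atoms is commonly reported as a root-mean-square deviation (RMSD) in Å. The following sketch computes it for two coordinate lists, under the assumption that the structures have already been superimposed and that atoms are listed in matching order. The function name and the coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates, in the input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha positions: the model is shifted by 2 units along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
print(f"RMSD = {ca_rmsd(model, reference):.2f}")
```

A real comparison would first find the optimal superposition (for example with the Kabsch algorithm); that step is left out here.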
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
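The E-value rule described above can be sketched as a simple filter over search hits. Everything in the example is an assumption made for illustration: the `Hit` fields, the cutoff of 1e-5 and the three hits are hypothetical and are not results from this study.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template_id: str   # hypothetical identifier of a candidate template
    evalue: float      # E-value reported by the database search
    identity: float    # percent sequence identity to the query
    coverage: float    # fraction of the query covered by the alignment


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best E-value first, then widest coverage."""
    kept = [hit for hit in hits if hit.evalue <= max_evalue]
    return sorted(kept, key=lambda hit: (hit.evalue, -hit.coverage))


# Hypothetical search results.
hits = [
    Hit("tmplA", 1e-40, 62.0, 0.95),
    Hit("tmplB", 0.5, 24.0, 0.40),    # poor E-value: rejected even if nothing else is found
    Hit("tmplC", 1e-12, 35.0, 0.80),
]
for hit in select_templates(hits):
    print(hit.template_id, hit.evalue)
```

The cutoff is a tunable parameter; marginal hits would still need the additional checks mentioned in the text, such as function or operon context.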
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step, and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than the interfaces in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the
interfaces do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
the interfaces are therefore said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and the fit is often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid
ligand and a flexible receptor, or a flexible ligand and a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. The two move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow for docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement that allows small conformational adaptations for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
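The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be written as a rotation matrix followed by a shift. The sketch below applies such a pose to a list of atom coordinates. The rotation order (about x, then y, then z) and all names are assumptions chosen for illustration; this is not the parameterisation used by any particular docking program.

```python
import math


def rotation_matrix(rx, ry, rz):
    """3x3 matrix for a rotation about x, then y, then z (angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    # Product Rz * Ry * Rx written out term by term.
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, angles, translation):
    """Rotate each (x, y, z) point by the three angles, then add the translation vector."""
    rot = rotation_matrix(*angles)
    moved = []
    for point in coords:
        rotated = [sum(rot[i][j] * point[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + translation[i] for i in range(3)))
    return moved


# Hypothetical ligand atom: rotate 90 degrees about z, then shift 5 units along z.
ligand = [(1.0, 0.0, 0.0)]
posed = apply_pose(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 5.0))
print(posed)
```

A rigid-body docking search can be thought of as scoring many such poses of the ligand relative to a fixed receptor.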
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu, and will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein or enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
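The link between a lower activation barrier and a faster reaction can be made concrete with transition-state theory, in which the rate constant is proportional to exp(-ΔG‡/RT). Under that assumption, lowering the barrier by ΔΔG‡ multiplies the rate by exp(ΔΔG‡/RT). The sketch below evaluates this factor; the function name and the example barrier reduction are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)


def rate_enhancement(barrier_reduction_kj_mol, temperature_k=298.15):
    """Factor by which the rate rises when the activation free energy drops by the given amount."""
    return math.exp(barrier_reduction_kj_mol / (GAS_CONSTANT * temperature_k))


# Hypothetical example: a barrier lowered by 34.2 kJ/mol at 25 degrees C
# speeds the reaction up by a factor on the order of 1e6.
factor = rate_enhancement(34.2)
print(f"rate enhancement: {factor:.3g}")
```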
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
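As a small illustration of how the preferences listed above can be used, the sketch below ranks the residues by their value and picks out those that make more contacts than expected by chance. The names PROPENSITY, rank_by_propensity and favoured are hypothetical, and the values are read with the decimal point after the first digit, which is an assumption about the original formatting.

```python
# Preference of each amino acid to contact bound non-protein atoms.
# Values copied from the list above; decimal placement is an assumption.
PROPENSITY = {
    "His": 0.360, "Cys": 0.210, "Ser": 0.130, "Thr": 0.100, "Lys": 0.100,
    "Asn": 0.080, "Arg": 0.055, "Gln": 0.050, "Glu": 0.050, "Asp": 0.045,
    "Ala": 0.025, "Met": 0.025, "Ile": -0.005, "Tyr": -0.040, "Val": -0.060,
    "Gly": -0.070, "Phe": -0.120, "Trp": -0.140, "Leu": -0.180, "Pro": -0.200,
}


def rank_by_propensity(table):
    """Return residue names sorted from most to least favoured."""
    return sorted(table, key=table.get, reverse=True)


def favoured(table):
    """Residues that make more contacts than expected by chance (value > 0)."""
    return [name for name in rank_by_propensity(table) if table[name] > 0]


ranking = rank_by_propensity(PROPENSITY)
print("Most favoured:", ranking[:3])
print("Least favoured:", ranking[-3:])
print("Residues with positive preference:", len(favoured(PROPENSITY)))
```

Such a ranking is only a rough guide, since the preferences ignore protein-protein and protein-peptide interfaces.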
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles φ against ψ of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the
corresponding torsion angles φ and ψ. Ramachandran used computer models of small
polypeptides to systematically vary φ and ψ with the objective of finding stable
conformations. For each conformation, the structure was examined for close contacts between
atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der
Waals radii, and the angles which cause spheres to collide correspond to sterically
disallowed conformations of the polypeptide backbone.
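The torsion angles plotted in a Ramachandran diagram can be computed directly from atomic coordinates. The following is a minimal sketch, not the procedure used by any particular program: the function names dihedral, phi and psi are hypothetical, and the coordinates in the usage lines are made-up test points rather than atoms from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle in degrees about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal of the plane p1, p2, p3
    n2 = _cross(b2, b3)          # normal of the plane p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Made-up points: the fourth atom is rotated a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
# Made-up points: all four atoms in one plane, first and last on the same side.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Each residue then contributes one (φ, ψ) point to the plot.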
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and may stop in a local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of that conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
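To make the idea of gradient optimization concrete, the sketch below minimizes the energy of two atoms that start too close together. It is an illustration only, under stated assumptions: a Lennard-Jones potential in reduced units (epsilon = sigma = 1) stands in for the van der Waals term, a fixed-step steepest-descent update is used, and the names lj_energy, lj_gradient and minimize are hypothetical rather than taken from any minimization package.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr; the force on the pair is the negative of this."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r0, step=0.01, tol=1e-8, max_iter=100000):
    """Steepest descent: move against the gradient until the force is small."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


r_start = 1.0                      # atoms too close: strongly repulsive region
r_min = minimize(r_start)
print("start: r = %.4f, E = %.4f" % (r_start, lj_energy(r_start)))
print("final: r = %.4f, E = %.4f" % (r_min, lj_energy(r_min)))
```

The search stops at the nearest minimum of the curve; in a real molecule with many minima the same procedure would not cross energy barriers to reach a deeper one.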
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I               27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711 (41)
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
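The two extinction coefficients reported above can be reproduced from the amino acid composition. The sketch below assumes the commonly used per-residue contributions at 280 nm (5500 for Trp, 1490 for Tyr and 125 for each cystine, in M-1 cm-1) and reads the molecular weight as 55252.6; the function names are hypothetical and this is not ProtParam's own code.

```python
# Assumed molar extinction contributions at 280 nm (M^-1 cm^-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the extinction coefficient from residue counts.

    If all_cys_paired is True, every pair of Cys is counted as one cystine;
    an unpaired leftover Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext_coeff, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution over a 1 cm path."""
    return ext_coeff / molecular_weight


# Counts taken from the composition above: Trp 4, Tyr 5, Cys 5.
MOLECULAR_WEIGHT = 55252.6  # decimal placement is an assumption
ext_paired = extinction_coefficient(4, 5, 5, all_cys_paired=True)
ext_free = extinction_coefficient(4, 5, 5, all_cys_paired=False)

print(ext_paired, round(absorbance_1_g_per_l(ext_paired, MOLECULAR_WEIGHT), 3))
print(ext_free, round(absorbance_1_g_per_l(ext_free, MOLECULAR_WEIGHT), 3))
```

With these inputs the sketch gives 29700 and 29450, with absorbances 0.538 and 0.533, which agrees with the values listed.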
Secondary structure prediction
By SOPMA
SOPMA result for UNK_158250 (42)

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00% (43)

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
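The percentages in the SOPMA summary are simply the share of residues assigned to each state. A minimal sketch of that bookkeeping is given below; the names STATE_NAMES, percent and state_summary are hypothetical, and the short prediction string in the usage lines is a made-up example, not part of the result above.

```python
from collections import Counter

# One-letter state codes used in a four-state prediction.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def percent(count, total):
    """Share of residues in a state, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def state_summary(prediction):
    """Map each state name to (count, percentage) for a prediction string."""
    counts = Counter(prediction)
    total = len(prediction)
    return {
        name: (counts.get(code, 0), percent(counts.get(code, 0), total))
        for code, name in STATE_NAMES.items()
    }


# Made-up ten-residue prediction string.
for name, (count, share) in state_summary("hhhhcceett").items():
    print("%-16s %3d is %6.2f%%" % (name, count, share))

# Counts reported above for the 523-residue sequence.
print(percent(235, 523), percent(80, 523), percent(36, 523), percent(172, 523))
```

Applied to the reported counts (235, 80, 36 and 172 of 523 residues) the same arithmetic gives 44.93, 15.3, 6.88 and 32.89 percent.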
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
(45)
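The pairwise scores in the table above are percent identities between two aligned sequences. The sketch below shows one way such a score can be computed; it assumes that identical residues are divided by the number of aligned positions where neither sequence has a gap, which is an assumption about the scoring convention. The function name percent_identity and the short test strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue            # gapped columns are not compared
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Made-up aligned fragments: four comparable columns, three of them identical.
print(percent_identity("ACD-F", "ACE-F"))
# Identical fragments score 100.
print(percent_identity("MQLF", "MQLF"))
```

Under this convention the very low scores between the human protein and the viral sequences (2 to 3) mean that almost no aligned positions carry the same residue.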
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
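The guide tree is written in the Newick format, where each name is followed by a colon and its branch length. The sketch below pulls out the terminal branch lengths with a regular expression. The tree string is a reconstruction of the guide tree above, with the colons, commas and decimal points restored by assumption; GUIDE_TREE, LEAF and leaf_branch_lengths are hypothetical names.

```python
import re

# Guide tree as a single Newick string (punctuation reconstructed by assumption).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a name made of letters, digits, '|' or '_' followed by ':length'.
# Internal branches follow a ')' and are therefore not matched.
LEAF = re.compile(r"([A-Za-z0-9|_]+):([0-9.]+)")


def leaf_branch_lengths(newick):
    """Return {leaf name: terminal branch length} for a Newick string."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
longest = max(lengths, key=lengths.get)
print(len(lengths), "leaves")
print("longest terminal branch:", longest, lengths[longest])
```

The human glycerate kinase sits on by far the longest terminal branch, consistent with its very low pairwise scores against the viral sequences.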
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file); choose Swiss model, then load the raw target sequence.
Choose Fit: fit raw sequence, then magic fit, then iterative fit.
Choose File, Save, Layer (pdb).
Choose File, Save, Project (pdb).
Choose Swiss model, then submit modeling request. (A new browser window is opened; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File, Save, Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: structure of template after modeling
Alignment
TARGET    83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA      4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET          hh sss sssss hhh
2b8nA           hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET          hhhhhhhhh ssssss sssss hhhhh
2b8nA           hhhhhhhhh ssssss sssss hhhhh

TARGET   155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA           hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET          h hhhhhh hhh sss hhhhhh sssssss
2b8nA           h hhhhhh hhh sss hhhhhh sssssss

TARGET   255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET          h hhhhhhhhhh hh hhhh sss sssss
2b8nA           h hhhhhhhhhh hhh hhhh ss

TARGET   305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA           sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET          hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA           hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET          sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA           sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394    ksgallitgp tgtnvndlii gliv-
TARGET          h sss sssss ssss
2b8nA           sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE   %-age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
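The percentages in the plot statistics follow directly from the residue counts, and the 90% rule quoted above can be checked in the same step. The sketch below is a hedged illustration: ramachandran_percentages and meets_quality_rule are hypothetical helper names, and the counts are read from the summary above.

```python
def ramachandran_percentages(most_favoured, additional, generous, disallowed):
    """Percentage of non-Gly, non-Pro residues in each region of the plot."""
    total = most_favoured + additional + generous + disallowed
    regions = {
        "most favoured": most_favoured,
        "additional allowed": additional,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    return total, {name: round(100.0 * n / total, 1) for name, n in regions.items()}


def meets_quality_rule(percent_most_favoured, threshold=90.0):
    """True if more than `threshold` percent lie in the most favoured regions."""
    return percent_most_favoured > threshold


# Counts from the PROCHECK summary above.
total, shares = ramachandran_percentages(990, 104, 11, 51)
print("non-Gly, non-Pro residues:", total)
for region, share in shares.items():
    print("%-20s %5.1f%%" % (region, share))
print("passes 90% rule:", meets_quality_rule(shares["most favoured"]))
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so some further refinement would be expected before it counts as a good quality model.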
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: after docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
  ASPN Radius = 1.40 Charge = -0.52
  ASPCA Radius = 1.50 Charge = 0.25
  ASPC Radius = 1.40 Charge = 0.53
  ASPO Radius = 1.50 Charge = -0.50
  ASPCB Radius = 1.70 Charge = -0.21
  ASPCG Radius = 1.40 Charge = 0.62
  ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
  LYSN Radius = 1.40 Charge = -0.52
  LYSCA Radius = 1.50 Charge = 0.23
  LYSC Radius = 1.40 Charge = 0.53
  LYSO Radius = 1.50 Charge = -0.50
  LYSCB Radius = 1.70 Charge = 0.04
  LYSCG Radius = 1.70 Charge = 0.05
  LYSCD Radius = 1.70 Charge = 0.05
  LYSCE Radius = 1.70 Charge = 0.22
  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
  LYSN Radius = 1.40 Charge = -0.52
  LYSCA Radius = 1.50 Charge = 0.23
  LYSC Radius = 1.40 Charge = 0.53
  LYSO Radius = 1.50 Charge = -0.50
  LYSCB Radius = 1.70 Charge = 0.04
  LYSCG Radius = 1.70 Charge = 0.05
  LYSCD Radius = 1.70 Charge = 0.05
  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
  LYSN Radius = 1.40 Charge = -0.52
  LYSCA Radius = 1.50 Charge = 0.23
  LYSC Radius = 1.40 Charge = 0.53
  LYSO Radius = 1.50 Charge = -0.50
  LYSCB Radius = 1.70 Charge = 0.04
  LYSCG Radius = 1.70 Charge = 0.05
  LYSCD Radius = 1.70 Charge = 0.05
  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
  LYSN Radius = 1.40 Charge = -0.52
  LYSCA Radius = 1.50 Charge = 0.23
  LYSC Radius = 1.40 Charge = 0.53
  LYSO Radius = 1.50 Charge = -0.50
  LYSCB Radius = 1.70 Charge = 0.04
  LYSCG Radius = 1.70 Charge = 0.05
  LYSCD Radius = 1.70 Charge = 0.05
  LYSCE Radius = 1.70 Charge = 0.22
  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
  LYSN Radius = 1.40 Charge = -0.52
  LYSCA Radius = 1.50 Charge = 0.23
  LYSC Radius = 1.40 Charge = 0.53
  LYSO Radius = 1.50 Charge = -0.50
  LYSCB Radius = 1.70 Charge = 0.04
  LYSCG Radius = 1.70 Charge = 0.05
  LYSCD Radius = 1.70 Charge = 0.05
  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
  LYSN Radius = 1.40 Charge = -0.52
  LYSCA Radius = 1.50 Charge = 0.23
  LYSC Radius = 1.40 Charge = 0.53
  LYSO Radius = 1.50 Charge = -0.50
  LYSCB Radius = 1.70 Charge = 0.04
  LYSCG Radius = 1.70 Charge = 0.05
  LYSCD Radius = 1.70 Charge = 0.05
  LYSCE Radius = 1.70 Charge = 0.22
  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
  ASPN Radius = 1.40 Charge = -0.52
  ASPCA Radius = 1.50 Charge = 0.25
  ASPC Radius = 1.40 Charge = 0.53
  ASPO Radius = 1.50 Charge = -0.50
  ASPCB Radius = 1.70 Charge = -0.21
  ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
  LYSN Radius = 1.40 Charge = -0.52
  LYSCA Radius = 1.50 Charge = 0.23
  LYSC Radius = 1.40 Charge = 0.53
  LYSO Radius = 1.50 Charge = -0.50
  LYSCB Radius = 1.70 Charge = 0.04
  LYSCG Radius = 1.70 Charge = 0.05
  LYSCD Radius = 1.70 Charge = 0.05
  LYSCE Radius = 1.70 Charge = 0.22
  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
  THRN Radius = 1.40 Charge = -0.52
  THRCA Radius = 1.50 Charge = 0.27
  THRC Radius = 1.40 Charge = 0.53
  THRO Radius = 1.50 Charge = -0.50
  THRCB Radius = 1.50 Charge = 0.21
  THRH Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
  MSEN Radius = 1.40 Charge = -0.52
  MSECA Radius = 1.50 Charge = 0.14
  MSEC Radius = 1.40 Charge = 0.53
  MSEO Radius = 1.50 Charge = -0.50
  MSECB Radius = 1.70 Charge = 0.04
  MSECG Radius = 1.70 Charge = 0.09
  MSESE Radius = 1.90 Charge = 0.32
  MSECE Radius = 1.90 Charge = 0.01
  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
  ASPN Radius = 1.40 Charge = -0.52
  ASPCA Radius = 1.50 Charge = 0.25
  ASPC Radius = 1.40 Charge = 0.53
  ASPO Radius = 1.50 Charge = -0.50
  ASPCB Radius = 1.70 Charge = -0.21
  ASPCG Radius = 1.40 Charge = 0.62
  ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
  MSEN Radius = 1.40 Charge = -0.52
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw
firm conclusions about the true picture of its evolution in the near future, and to identify the
genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.
To determine the unknown structure of a protein present in the different species, we used
homology modelling, and we went a step further by presenting a theoretical model built with
available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone toward such discoveries. It is a
small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. This includes methods for
collecting and analysing data, as well as interpreting the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking will be useful in finding a cure for the infectious disease bird flu,
and will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug
discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases,
and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done
to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition.
McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps
Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics,
Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard
Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple structure and
sequence alignments to improve sequence detection and alignment: Application to the SH2
domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based prediction
of functional sites in proteins: Applications to assessing the validity of inheriting protein
function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and Lipman D.J. 1997.
Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic
Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L.,
Corpet F., Croning M.D., et al. 2000. InterPro: An integrated documentation resource for
protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal
d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49,
08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur
10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified the model against different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
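Because model quality is tied so closely to sequence identity, it can help to see how that figure is obtained from a pairwise alignment. The following is a minimal sketch, not the procedure of any particular modelling server: the function name percent_identity and the two short aligned sequences are invented for illustration, and identity is counted only over columns where neither sequence has a gap (other conventions divide by the full alignment length or by the shorter sequence).

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both strings must come from the same alignment (equal length),
    with '-' marking a gap (indel).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # indel column: no residue pair to compare
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment (not taken from the report's data).
query_row = "PESLKKLA-IEIV"
template_row = "PESLRKLAGIEMV"
identity = percent_identity(query_row, template_row)
print(f"{identity:.1f}% identity")
```

In this toy case 12 columns are compared and 10 match. A value computed this way can be set against the rough 70% and 25% identity levels discussed above when judging how much to trust a model.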
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα–Cα distances, hydrogen bonds, and main-chain and
side-chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
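The selection logic described above (reject hits with a poor E-value, then weigh coverage) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the Hit record, the rank_templates function, the template identifiers, and the cutoffs of 1e-5 for E-value and 0.5 for coverage are all hypothetical choices made for the example, not values prescribed by BLAST or by this report.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One database-search hit (hypothetical record layout)."""
    template_id: str
    e_value: float
    coverage: float  # fraction of the query covered by the alignment, 0..1


def rank_templates(hits: List[Hit],
                   max_e_value: float = 1e-5,
                   min_coverage: float = 0.5) -> List[Hit]:
    """Keep hits that pass both cutoffs, best first.

    Hits are ordered by ascending E-value; ties are broken by
    higher coverage. The cutoffs are assumptions for this sketch.
    """
    kept = [h for h in hits
            if h.e_value <= max_e_value and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.e_value, -h.coverage))


# Made-up hits for illustration only.
hits = [
    Hit("tmplA", 6e-54, 0.95),
    Hit("tmplB", 2e-51, 0.40),   # good E-value, but covers too little
    Hit("tmplC", 6.0, 0.90),     # poor E-value: rejected
    Hit("tmplD", 3e-20, 0.80),
]
ranked = rank_templates(hits)
for hit in ranked:
    print(hit.template_id, hit.e_value, hit.coverage)
```

In practice the final choice would also take function, secondary structure agreement, and the plausibility of the resulting model into account, which a simple numeric filter like this does not capture.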
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraint and
thus are said to be rigid.
Fig Protein-Protein docking
Protein Receptor–Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement, which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu, and will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N–Cα and Cα–C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and the
angles that cause spheres to collide correspond to sterically disallowed conformations of
the polypeptide backbone.
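Each point on a Ramachandran plot is a pair of torsion angles, and each torsion angle is computed from four consecutive backbone atoms (φ from C of the previous residue, N, Cα, C; ψ from N, Cα, C, and N of the next residue). The sketch below shows one standard way to compute such an angle from Cartesian coordinates. The function name dihedral and the test coordinates are made up for illustration; real coordinates would come from a PDB file.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (range -180..180) about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)           # normal of the plane p0, p1, p2
    n2 = _cross(b1, b2)           # normal of the plane p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = tuple(c / b1_len for c in b1)
    x = _dot(n1, n2)
    y = _dot(b1_unit, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


# Toy coordinates: p3 is rotated a quarter turn away from p0 about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Computing φ and ψ this way for every residue and plotting one against the other gives the scatter that tools such as PROCHECK compare against the allowed regions.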
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file, which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
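The idea of moving atoms "so as to reduce the net forces on them" can be shown with the smallest possible case: two atoms interacting through a van der Waals term. The sketch below is a toy illustration only, assuming a Lennard-Jones 12-6 potential in reduced units (epsilon = sigma = 1) and plain steepest descent with a fixed step size; the function names, step size, and tolerance are choices made for this example, and real minimizers used in modelling packages work on full force fields with more robust algorithms.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy (the force is -dE/dr)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0, step=0.01, tol=1e-8, max_steps=10000):
    """Steepest descent on the interatomic distance.

    Repeatedly moves the atoms along the negative gradient until the
    residual force is below tol or max_steps is reached.
    """
    r = r0
    for _ in range(max_steps):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad
    return r


# Start once from a stretched pair and once from a pair that is too close.
r_from_far = minimize_distance(1.5)
r_from_near = minimize_distance(1.0)
print(r_from_far, r_from_near, lj_energy(r_from_far))
```

Both starting points relax to the same nearby minimum at r = 2^(1/6) sigma, where the energy equals -epsilon. In a real protein there are many such minima, which is why minimization finds the best nearby conformation rather than the global one.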
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206   6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                      192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27   6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                27   6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711 (41)
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
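The two extinction coefficients can be reproduced from the residue counts. The sketch below assumes the commonly used molar extinction values at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for a cystine (a disulfide-bonded pair of Cys); these constants and the function names are assumptions of this example, not output of ProtParam itself.

```python
# Assumed molar extinction values at 280 nm in water (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient at 280 nm.

    If all_cys_paired is True, every two Cys are counted as one cystine;
    an unpaired leftover Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    mw = 55252.6                       # molecular weight reported above
    for paired in (True, False):
        ext = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=paired)
        print(paired, ext, round(absorbance_at_1_g_per_l(ext, mw), 3))
```

With 4 Trp, 5 Tyr and 5 Cys, the two cases differ only by the contribution of two cystines.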
Secondary structure prediction
By SOPMA
Result for: UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
(42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3_10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
(43)
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
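The SOPMA summary is a simple tally of the one-letter states in the prediction string (h = alpha helix, e = extended strand, t = beta turn, c = random coil). A minimal sketch of that tally follows; the function name and the short test string are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def state_percentages(prediction):
    """Count each secondary-structure state and its share of the sequence.

    Returns a dict mapping state letter -> (count, percentage rounded to 2 places).
    """
    total = len(prediction)
    if total == 0:
        return {}
    counts = Counter(prediction)
    return {state: (n, round(100.0 * n / total, 2)) for state, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical 10-residue prediction, not taken from the report.
    for state, (n, pct) in sorted(state_percentages("hhhhcceett").items()):
        print(state, n, pct)
```

Applied to the full 523-residue prediction, the same tally gives the figures in the summary, for example 235 helix residues out of 523.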
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
(45)
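The pairwise scores in the table behave like percent identities: closely related HBeAg sequences score near 98, while the human glycerate kinase scores 2 to 3 against every viral sequence. A minimal sketch of such a score is given below, assuming it is the number of identical residues divided by the number of positions compared, with gapped positions skipped. ClustalW computes its scores from its own pairwise alignments, so this helper (a hypothetical name) will not necessarily reproduce the table exactly.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Positions where either row has a gap are not compared.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Fragments of the HBVA4 and HBVA5 rows from the alignment in this section.
    hbva4 = "--LPSDFFPSVRDLLDTASALYREALES"
    hbva5 = "--LPSDFFPSVRDLXDTASALYREALES"
    print(percent_identity(hbva4, hbva5))
```

The two fragments differ at a single position out of the 26 compared, which is consistent with the high scores reported for the HBV genotype A sequences.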
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence.
2. Choose Fit - fit raw sequence, then magic fit, then iterative fit.
3. Choose File - Save - Layer (pdb).
4. Choose File - Save - Project (pdb).
5. Choose Swiss model - submit modeling request. (A new browser window opens for loading the pdb file; give the email ID for receiving the modeled structure.)
6. Open the new structure (received by email) and remove the template by selecting the target.
7. Choose File - Save - Layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Please fill these fields:
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET   83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA     4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET         hh sss sssss hhh
2b8nA          hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET         hhhhhhhhh ssssss sssss hhhhh
2b8nA          hhhhhhhhh ssssss sssss hhhhh

TARGET  155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET         h hhhhhh hhh sss hhhhhh sssssss
2b8nA          h hhhhhh hhh sss hhhhhh sssssss

TARGET  255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET         h hhhhhhhhhh hh hhhh sss sssss
2b8nA          h hhhhhhhhhh hhh hhhh ss

TARGET  305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET         hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA          hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET         sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA          sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394    ksgallitgp tgtnvndlii gliv-
TARGET         h sss sssss ssss
2b8nA          sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. Because bond lengths and bond angles along the backbone are nearly constant, the φ and ψ angles of each residue largely determine the conformation of the backbone chain.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                        Score   %age
Residues in most favoured regions [A,B,L]                990   85.6%
Residues in additional allowed regions [a,b,l,p]         104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11    1.0%
Residues in disallowed regions                            51    4.4%
                                                        ----  ------
Number of non-glycine and non-proline residues          1156  100.0%
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127
Number of proline residues                                60
                                                        ----
Total number of residues                                1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
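The φ and ψ values plotted in a Ramachandran plot are dihedral (torsion) angles defined by four consecutive backbone atoms: C(i-1), N, CA, C for φ and N, CA, C, N(i+1) for ψ. The sketch below shows one common way to compute such an angle from four 3D coordinates. It is a generic geometric formula written for illustration, not code taken from PROCHECK, and the coordinates in the example are made up.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in the range -180 to 180.

    For phi pass (C of previous residue, N, CA, C);
    for psi pass (N, CA, C, N of next residue).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)               # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)               # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: a planar trans arrangement and a perpendicular one.
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Computing this pair of angles for every residue and binning the points into favoured, allowed and disallowed regions gives the kind of plot statistics listed above.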
Docking result by Hex software
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
------------------------------------------------------------------------------
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
------------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
------------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are
all closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible
to draw firm conclusions about the true picture of evolution in the near future, and the
genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products
and their functions.
To find the unknown structures of proteins present in the different species, we carried out
homology modelling and present a theoretical model built with available online modelling
tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes
of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs may be developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. The
present work is a small finding within a much larger problem.
Phylogenetics is the field of biology which deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists to develop drugs
more efficiently and more effectively in future by analysing the modelled structure of the
protein.
As new drug targets are identified, they will open new vistas for further drug
development. The findings of our docking will be useful in finding a cure for the infectious
disease bird flu; they will also open new avenues for finding other possible drug targets in
influenza A virus. The docking results can be used to design new lead compounds and hence
can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic
sites can be identified in them. A similar method can also be applied to other infectious
diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done
to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition.
McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
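Since model quality is discussed above in terms of sequence identity, a minimal sketch of how percent identity can be computed from an existing pairwise alignment may be useful. The function name and the two aligned fragments are hypothetical and serve only as an illustration; identity is taken here over aligned, non-gap columns, which is one of several conventions in use.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # indel column: not counted in this convention
        compared += 1
        if q == t:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments (not taken from the report)
query = "MKT-AYIAKQR"
template = "MKTLAYLAK-R"
print(round(percent_identity(query, template), 1))  # 8 matches in 9 compared columns
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so identity values from different tools are not always directly comparable.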
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all
the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their
position-specific scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
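The E-value screening step described above can be sketched as a simple filter over a list of search hits. The `Hit` type, the hit identifiers and the cutoff of 1e-5 are assumptions made for illustration only; the text does not prescribe a particular threshold.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str   # hypothetical identifier of a candidate template
    score: float       # bit score reported by the search
    e_value: float     # expectation value reported by the search


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest) E-value first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Illustrative values only
hits = [
    Hit("tmplA", 206.0, 6e-54),
    Hit("tmplB", 27.0, 6.0),
    Hit("tmplC", 192.0, 6e-50),
]
for hit in select_templates(hits):
    print(hit.template_id, hit.e_value)
```

A hit such as `tmplB`, with an E-value near or above 1, is as likely to be a chance match as a true homolog and is therefore discarded by the filter.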
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interactions are said to be rigid.
Fig: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid
ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger than usual ligand.
Normally, when there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
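The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be illustrated with a small sketch that moves a set of ligand coordinates as a rigid body. The function names and the Rz·Ry·Rx angle convention are assumptions chosen for illustration; docking programs use their own parameterisations of the same six-dimensional search space.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """R = Rz(gamma) * Ry(beta) * Rx(alpha); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def rigid_transform(coords, angles, translation):
    """Rotate every (x, y, z) point, then translate it: 3 + 3 = 6 parameters."""
    rot = rotation_matrix(*angles)
    moved = []
    for point in coords:
        rotated = [sum(rot[i][j] * point[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + translation[i] for i in range(3)))
    return moved


# Hypothetical two-atom "ligand": rotate 90 degrees about z, then shift 1 unit along z
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pose = rigid_transform(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 1.0))
print(pose)
```

Because the transform is rigid, all interatomic distances inside the ligand are unchanged; only its position and orientation relative to the receptor vary.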
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
a cell or on the cell surface to which a substance (such as a hormone or a drug) selectively
binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). Because a ligand binds
through the interaction of many weak noncovalent bonds formed with the binding site of a
protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed
amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
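The link between a lowered activation free energy and rate enhancement can be made concrete with transition state theory, in which the rate constant is proportional to exp(-ΔG‡/RT). The sketch below computes the ratio of catalysed to uncatalysed rate constants for a given reduction in ΔG‡. The function name and the example reduction of 5.7 kJ/mol are illustrative assumptions, not values from this report.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def rate_enhancement(barrier_reduction_kj_per_mol: float,
                     temperature_k: float = 298.15) -> float:
    """Ratio k_cat / k_uncat when the activation barrier drops by the given amount."""
    reduction_j = barrier_reduction_kj_per_mol * 1000.0
    return math.exp(reduction_j / (GAS_CONSTANT * temperature_k))


# Illustrative value: a reduction of about 5.7 kJ/mol near room temperature
print(rate_enhancement(5.7))
```

At room temperature each reduction of roughly 5.7 kJ/mol in the barrier corresponds to about a tenfold increase in rate, which is why modest stabilisation of the transition state can produce large catalytic effects.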
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the
torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and
the angles which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
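Each point on a Ramachandran plot is a pair of torsion angles, and a torsion angle is defined by four consecutive atoms: C(i-1), N(i), Cα(i), C(i) for φ and N(i), Cα(i), C(i), N(i+1) for ψ. The following is a minimal sketch of the torsion calculation from Cartesian coordinates; the function name and the test coordinates are illustrative assumptions and are not taken from any structure in this report.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Idealised test geometry: central bond along z, outer atoms in the xy directions
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing φ and ψ for every residue of a model and plotting the pairs gives the Ramachandran plot used in structure validation.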
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and so may stop in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
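The case of two atoms placed too close together can be used as a minimal sketch of gradient-based minimization. The example below assumes a Lennard-Jones 12-6 pair potential in reduced units (ε = σ = 1) and a fixed-step steepest-descent update; the potential, the step size and the starting distance are assumptions chosen for illustration, and real minimizers work on all atomic coordinates of a full force field.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two atoms at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones energy (the force is its negative)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r_start, step=0.001, max_iter=10000, tol=1e-8):
    """Steepest descent: move downhill along the gradient until the force is small."""
    r = r_start
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad
    return r


# Start with the atoms too close (strong repulsion) and relax the distance
r_min = minimize_distance(0.95)
print(r_min, lj_energy(r_min))
```

The search converges to the nearest minimum, at r = 2^(1/6) σ for this potential. As noted above, the same procedure on a full protein stops at whichever local minimum lies downhill from the starting conformation.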
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result
List of potentially matching sequences (query sequence included)
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I              27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
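The two extinction coefficients reported above can be reproduced from the amino acid composition. The sketch below assumes the commonly used molar extinction values at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for each cystine (these constants and the function names are not stated in the report and are given here as assumptions), and takes the reported molecular weight as 55252.6.

```python
EXT_TRP = 5500     # M-1 cm-1 per tryptophan (assumed value)
EXT_TYR = 1490     # M-1 cm-1 per tyrosine (assumed value)
EXT_CYSTINE = 125  # M-1 cm-1 per cystine, i.e. per pair of Cys (assumed value)


def extinction_coefficient(n_tyr, n_trp, n_cys, cystines_formed=True):
    """Molar extinction coefficient at 280 nm estimated from residue counts."""
    n_cystine = n_cys // 2 if cystines_formed else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_one_gram_per_litre(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


# Counts from the composition above: Tyr 5, Trp 4, Cys 5
with_cystines = extinction_coefficient(5, 4, 5, cystines_formed=True)
without_cystines = extinction_coefficient(5, 4, 5, cystines_formed=False)
print(with_cystines, round(absorbance_one_gram_per_litre(with_cystines, 55252.6), 3))
print(without_cystines, round(absorbance_one_gram_per_litre(without_cystines, 55252.6), 3))
```

With five cysteines, at most two cystines can form, which accounts for the difference of 250 between the two reported coefficients.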
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
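The state percentages in the SOPMA summary follow directly from counting the one-letter codes in the prediction string. A minimal sketch is given below; the helper name `state_percentages` is hypothetical, and the check uses a synthetic string built from the reported counts (235 h, 80 e, 36 t, 172 c) rather than the full prediction.

```python
from collections import Counter

STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_percentages(prediction: str) -> dict:
    """Map each four-state class to (count, percentage of the sequence length)."""
    counts = Counter(prediction)
    total = len(prediction)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATE_NAMES.items()
    }


# Synthetic string with the same state counts as the reported summary
synthetic = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
for name, (count, percent) in state_percentages(synthetic).items():
    print(name, count, percent)
```

Applied to the actual prediction lines, the same function gives a quick consistency check between the per-residue states and the summary table.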
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
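The pairwise scores in the table above behave like percent identities between two aligned sequences. As a hedged illustration, the sketch below computes a percent identity over the gap-free columns of two aligned strings; this is an assumption about how such a score can be obtained, the function name is hypothetical, and the inputs are toy strings rather than rows of the alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap positions are excluded from the comparison
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned pair: 4 compared columns, 3 identical.
    print(percent_identity("ACDE-", "ACDFG"))
```

Excluding gap columns means a short sequence aligned to a much longer one is scored only on the residues that are actually paired.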
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
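A guide tree of this kind is a nested parenthesised string in which each leaf name is followed by a branch length. The sketch below pulls out the leaf names and their branch lengths with a regular expression. It assumes the Newick-style `name:length` notation, and the small example tree reuses two leaves with branch lengths read from the guide tree above on the assumption that the digits there are lengths whose decimal point was lost. The function name is hypothetical.

```python
import re

# A leaf is a run of characters that are not Newick punctuation, followed by
# ':' and a number. Internal nodes ("):0.36054") have no name and are skipped.
LEAF_PATTERN = re.compile(r"([^(),:;\s]+):([0-9.]+)")


def leaf_branch_lengths(newick: str) -> dict:
    """Map each leaf name to the length of the branch directly above it."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


if __name__ == "__main__":
    # Small illustrative subtree; values assumed as described in the lead-in.
    subtree = "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054;"
    for name, length in leaf_branch_lengths(subtree).items():
        print(name, length)
```

Short terminal branches under a long shared internal branch, as in this subtree, indicate two near-identical sequences that are jointly distant from the rest.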
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model, then load the raw target sequence.
Choose Fit, then fit raw sequence, then magic fit, then iterative fit.
Choose File, then Save, then Layer (PDB).
Choose File, then Save, then Project (PDB).
Choose Swiss model, then submit modeling request. (A new browser window opens; load the PDB file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File, then Save, then Layer (PDB).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044; Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Fig.: Structure of template after modeling
Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the φ and ψ backbone angles of each residue in the protein, so residues whose backbone geometry falls outside the sterically allowed regions can be identified.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input for SAVS.
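The φ and ψ values plotted in a Ramachandran plot are dihedral angles, each defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for φ, and N(i), CA(i), C(i), N(i+1) for ψ. The sketch below shows one common way to compute a dihedral angle from four points. It is a generic geometric calculation, not the routine used by SAVS or PROCHECK, and the coordinates in the example are made-up test points.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up points: a planar cis arrangement and a perpendicular one.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

For a real structure, the four points would be the backbone atom coordinates read from the PDB file of the model.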
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result

Plot statistics                                          SCORE    %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----    ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
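The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked against them. The sketch below redoes that arithmetic with the counts reported above. The function and variable names are hypothetical, and the 90% threshold is the guideline quoted in the Procheck summary.

```python
def region_percentages(counts: dict) -> dict:
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues counted")
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_guideline(counts: dict, threshold: float = 90.0) -> bool:
    """True if the most favoured regions hold more than `threshold` per cent."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


# Counts as reported in the Procheck summary above.
reported = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

if __name__ == "__main__":
    print(sum(reported.values()))
    print(region_percentages(reported))
    print(meets_guideline(reported))
```

With 85.6% of residues in the most favoured regions, the model falls short of the quoted 90% guideline, which is worth keeping in mind when interpreting the docking step.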
Docking result by Hex software
Fig.: Ligand & Receptor (2B8N)
Fig.: After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
------------------------------------------------------------------------------
Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
------------------------------------------------------------------------------
1     1     000000  00      00      00      00    00      00      -1   -100
------------------------------------------------------------------------------
1     1     000000  00      00      00      00    00      00      -1   -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are
all closely related, they play an important role in survival in different species. It would be
interesting to look more closely at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be
possible in the near future to draw firm conclusions about the true picture of evolution, and
to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To determine the unknown structures of proteins present in the different species, we
performed homology modelling, and we present a theoretical model built with available online
modelling tools.
From our study, the HBeAg-binding protein 4 (glycerate kinase) appears to be a secondary
factor in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries. The
present work may be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking will be useful in finding a cure for the infectious disease bird flu; they
will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B, Sheinerman FB, and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801. [PubMed]
Aloy P, Querol E, Aviles FX, and Sternberg MJ. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408. [PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402. [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher
P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
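The E-value screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` type, the `select_templates` helper and the cutoff of 1e-5 are hypothetical choices, not part of BLAST or of this study. The hit values are taken from the BLAST listing in the results section; the two weakest hits are printed there with an E-value of 60, whose decimal point is uncertain, but either reading fails the cutoff.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One database hit: identifier, bit score and E-value."""
    pdb_id: str
    score: float
    e_value: float


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first.

    The default cutoff is an assumed, illustrative threshold.
    """
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)


# Hits transcribed from the BLAST result listing in this report.
hits = [
    Hit("1TA3-B", 27, 60.0),   # printed as "60" in the source
    Hit("2QIJ-C", 197, 2e-51),
    Hit("1QGT-C", 206, 6e-54),
    Hit("1AW9-A", 27, 60.0),   # printed as "60" in the source
    Hit("2G33-C", 192, 6e-50),
]

for hit in select_templates(hits):
    print(hit.pdb_id, hit.score, hit.e_value)
```

Sorting by E-value rather than by score keeps the ordering meaningful when hits come from searches against databases of different sizes.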
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described by a lock-and-key
mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid
ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand-flexible receptor complex often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller in size than the receptor interface. No docking is completely rigid,
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
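The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be illustrated with a minimal Python sketch. The function name `rigid_transform`, the Z-Y-X Euler angle convention and the example coordinates are assumptions chosen for illustration; they do not describe how any particular docking program parameterizes a pose.

```python
import math


def rigid_transform(points, alpha, beta, gamma, translation):
    """Rotate points by Rz(alpha) * Ry(beta) * Rx(gamma), then translate.

    Angles are in radians; points and translation are (x, y, z) tuples.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    # Combined rotation matrix for the Z-Y-X Euler angle convention.
    rot = (
        (ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg),
        (sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg),
        (-sb, cb * sg, cb * cg),
    )
    moved = []
    for point in points:
        rotated = tuple(sum(r * p for r, p in zip(row, point)) for row in rot)
        moved.append(tuple(c + t for c, t in zip(rotated, translation)))
    return moved


# Two hypothetical ligand atoms, rotated 90 degrees about z and shifted along x.
ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
pose = rigid_transform(ligand, math.pi / 2, 0.0, 0.0, (5.0, 0.0, 0.0))
print(pose)
# A rigid-body move leaves all interatomic distances unchanged.
print(math.dist(ligand[0], ligand[1]), math.dist(pose[0], pose[1]))
```

A rigid-body search then amounts to scoring many such poses, each defined by three angles and a translation vector.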
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
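As a rough illustration of how such preferences might be used, the Python sketch below ranks the amino acids and averages the values over a candidate site. The three-decimal reading of the values (for example 0.360 for His) is an assumption about the original table, and the helper names `rank_residues` and `mean_propensity` are hypothetical; averaging is just one simple way to summarise a site, not a method used in this study.

```python
# Functional-site preferences transcribed from the table above.
# The placement of the decimal point is an assumption about the source values.
PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def rank_residues():
    """Return residue names ordered from most to least preferred."""
    return sorted(PROPENSITY, key=PROPENSITY.get, reverse=True)


def mean_propensity(residues):
    """Average preference over a list of three-letter residue names."""
    if not residues:
        raise ValueError("residues must not be empty")
    return sum(PROPENSITY[name] for name in residues) / len(residues)


print(rank_residues()[:3])
print(mean_propensity(["His", "Cys", "Ser"]))
```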
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle phi (φ) against psi (ψ) for the amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and the
angles which cause spheres to collide correspond to sterically disallowed conformations of the
polypeptide backbone.
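Each point on such a plot is a pair of torsion angles, and a torsion angle is fully determined by four consecutive atom positions. The sketch below shows one common way to compute it in Python; the function name `dihedral` and the test coordinates are illustrative assumptions. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i), and for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four hypothetical points: the last one is rotated a quarter turn out of plane.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Computing φ and ψ for every residue of a model and plotting one against the other gives the scatter that validation tools compare with the allowed regions.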
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
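The idea of moving atoms along the net force until the force becomes small can be shown on the smallest possible system: two atoms interacting through a Lennard-Jones potential. The Python sketch below is a toy steepest-descent minimizer under stated assumptions; reduced units (epsilon = sigma = 1), the step size, the displacement cap and the function names are all illustrative choices, not the settings of any minimization program used in this work.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along the separation, -dE/dr (positive values push the atoms apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent on the separation r until the force is below tol."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        # Move along the force, capping the displacement to avoid overshooting.
        move = max(-max_move, min(max_move, step * force))
        r += move
    return r


start = 0.9  # two atoms too close together: large repulsion energy
end = minimize(start)
print(lj_energy(start), lj_energy(end), end)
```

Starting from an overlapping pair, the energy drops from a large positive value to the bottom of the well. Because only downhill moves are taken, a minimizer of this kind settles into the nearest minimum, which matches the limitation noted above.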
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result-
List of potentially matching sequences
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp  27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I             27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
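The two extinction coefficients and absorbance values reported above can be reproduced from the amino acid counts. The Python sketch below assumes the commonly used molar extinction values at 280 nm (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and a molecular weight of 55252.6, read from the ProtParam output with its decimal point restored; the function names are hypothetical.

```python
# Assumed molar extinction values at 280 nm, in M-1 cm-1.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient of a protein at 280 nm."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


def percent(count, total):
    """Composition of one residue type as a percentage of the sequence."""
    return 100.0 * count / total


# Counts from the composition above: Trp 4, Tyr 5, Cys 5.
MOLECULAR_WEIGHT = 55252.6  # assumed reading of the reported value
for paired in (True, False):
    ext = extinction_coefficient(4, 5, 5, all_cys_paired=paired)
    print(ext, round(absorbance_1_g_per_l(ext, MOLECULAR_WEIGHT), 3))
print(round(percent(74, 523), 1))  # alanine share of the 523 residues
```

With five cysteines only two cystines can form, which is why the two coefficients differ by 250.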
Secondary structure prediction
By SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
310 helix       (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters Window width 17 Similarity threshold 8 Number of states 4
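The percentages in the SOPMA summary follow directly from the per-state residue counts. As a small check, the Python sketch below recomputes them from the reported counts and also shows how the counts could be taken from a prediction string; the function names are hypothetical and the short string is only a toy input.

```python
from collections import Counter

STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def count_states(prediction):
    """Count each one-letter state in a predicted secondary-structure string."""
    counts = Counter(prediction)
    return {state: counts[state] for state in STATE_NAMES}


def state_percentages(counts):
    """Convert per-state counts to percentages of the total, to two decimals."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Counts reported in the SOPMA summary for the 523-residue sequence.
reported = {"h": 235, "e": 80, "t": 36, "c": 172}
for state, pct in state_percentages(reported).items():
    print(STATE_NAMES[state], reported[state], pct)

# The same calculation starting from a (toy) prediction string.
print(state_percentages(count_states("hhhhcceett")))
```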
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences 10
2 Alignment score 28565
3 Sequence format Pearson
4 Sequence type aa
5 Output file clustalw2-20080510-09552541output
6 Alignment file clustalw2-20080510-09552541aln
7 Guide tree file clustalw2-20080510-09552541dnd
8 Your input file clustalw2-20080510-09552541input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
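The pairwise scores in this table are commonly read as percent identity between the two aligned sequences. Under that assumption, the Python sketch below shows how such a score can be computed from a pair of aligned sequences, skipping gapped columns; the function name `percent_identity` is hypothetical, and the two short fragments are taken from the HBVA4 and ASHV rows of the alignment with their gap columns removed, so the result applies to the fragment only and not to the full-length score.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns are not counted
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Short aligned fragments from the HBVA4 and ASHV sequences.
hbva4 = "DIDPYKEFGATVELLSF"
ashv = "DIDPYKEFGSSYQLLNF"
print(round(percent_identity(hbva4, ashv), 1))
```

Scores of 97 to 98 among the human HBV entries and only 2 to 3 against glycerate kinase therefore indicate near-identical viral sequences and no meaningful similarity to the human protein.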
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
51
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
52
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
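The pairwise scores listed with the alignment can be approximated from the aligned rows themselves. The following is a minimal Python sketch that counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` and the choice of denominator are assumptions made for illustration; the score reported by ClustalW may be normalised differently. The two fragments are the first non-gap stretch of P17099|HBEAG_HBVA4 and Q64896|HBEAG_ASHV from the alignment above.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> tuple[int, int, float]:
    """Return (matches, compared_columns, percent) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    percent = 100.0 * matches / compared if compared else 0.0
    return matches, compared, percent


# Fragments copied from the second alignment block (leading gaps omitted).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"

matches, compared, pct = percent_identity(hbva4, ashv)
print(f"{matches}/{compared} identical columns = {pct:.1f}%")
```

Running the same function over the full rows of every pair would give a complete identity matrix, which is a useful cross-check on the reported scores.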
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
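A guide tree of this kind is normally written in Newick format, where nested parentheses group sequences and the number after each colon is a branch length. The sketch below is a small recursive parser in Python, offered as an illustration only; it is not the parser ClustalW uses. The names `parse_newick` and `leaf_depths` are hypothetical, and the tree string assumes that the extracted digits read as branch lengths of the form 0.xxxxx separated by commas.

```python
def parse_newick(text: str):
    """Parse a Newick string into nested (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaf_depths(tree, depth=0.0):
    """Yield (leaf_name, summed branch length from the root) for every leaf."""
    name, length, children = tree
    if not children:
        yield name, depth + length
    for child in children:
        yield from leaf_depths(child, depth + length)


# Assumed reading of the guide tree (decimal points and commas restored).
GUIDE_TREE = (
    "(((((("
    "Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)

tree = parse_newick(GUIDE_TREE)
depths = dict(leaf_depths(tree))
for name, d in sorted(depths.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {d:.5f}")
```

Sorting the leaves by their distance from the root shows the five HBV sequences clustered near the root, with the human glycerate kinase sequence farthest away.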
Phylogram
Tertiary structure prediction
The PDB structure 1QGT-C was selected as the template; it showed about 85.6% sequence identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (PDB).
Choose File - save - project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file there and give the e-mail ID at which the modelled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig. Structure of the template after modeling
Alignment

TARGET   83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA     4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET         hh sss sssss hhh
2b8nA          hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET         hhhhhhhhh ssssss sssss hhhhh
2b8nA          hhhhhhhhh ssssss sssss hhhhh

TARGET  155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET         h hhhhhh hhh sss hhhhhh sssssss
2b8nA          h hhhhhh hhh sss hhhhhh sssssss

TARGET  255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET         h hhhhhhhhhh hh hhhh sss sssss
2b8nA          h hhhhhhhhhh hhh hhhh ss

TARGET  305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET         hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA          hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET         sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA          sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394    ksgallitgp tgtnvndlii gliv-
TARGET         h sss sssss ssss
2b8nA          sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ and ψ of the amino acid residues in a protein structure: it plots ψ against φ for each residue and shows which conformations of these angles are possible for a polypeptide. From the plot one can therefore judge whether the backbone conformation of each residue in the model falls in a sterically allowed region.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeler is given as input to SAVS.
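The φ and ψ values shown in a Ramachandran plot are ordinary dihedral angles: φ is measured over the backbone atoms C(i-1), N(i), CA(i), C(i), and ψ over N(i), CA(i), C(i), N(i+1). As a minimal sketch of how such an angle is obtained from four atom positions, the Python function below uses the standard atan2 formulation. The function name `dihedral` and the coordinates are made up for illustration and are not taken from the modelled structure; PROCHECK performs this calculation internally for every residue.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points in space."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical): the central bond runs along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

Applied to the backbone atoms of each residue in a PDB file, the same function would yield the (φ, ψ) pairs that the plot displays.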
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result

Plot statistics                                           SCORE    %-age
Residues in most favoured regions [A,B,L]                   990    85.6%
Residues in additional allowed regions [a,b,l,p]            104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11     1.0%
Residues in disallowed regions                               51     4.4%
                                                           ----   ------
Number of non-glycine and non-proline residues             1156   100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
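The percentages in the PROCHECK summary follow directly from the residue counts, and the quality criterion is a simple threshold. The short Python sketch below recomputes them as a consistency check. It assumes the extracted figures read as counts of 990, 104, 11 and 51 with percentages of 85.6, 9.0, 1.0 and 4.4; the variable names are illustrative only.

```python
# Residue counts per Ramachandran region (non-glycine, non-proline residues).
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total = sum(counts.values())
percentages = {region: round(100 * n / total, 1) for region, n in counts.items()}

# Remaining residue classes reported in the summary.
end_residues, glycines, prolines = 8, 127, 60
all_residues = total + end_residues + glycines + prolines

# PROCHECK's rule of thumb: a good model has over 90% in the most favoured regions.
meets_threshold = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("non-Gly/non-Pro residues:", total)
print("total residues:", all_residues)
print("over 90% in most favoured regions:", meets_threshold)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so the 4.4% of residues in disallowed regions would be the natural place to start refining it.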
Docking result (by Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
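Several totals in the Hex log can be cross-checked with simple arithmetic, which also helps to recover the figures whose punctuation was lost. The Python sketch below does this under stated assumptions: the distance range is read as 0.00 to 19.50 in steps of 0.75, and `Iterate[278121] x FFT[642448]` is read as Iterate[27, 812, 1] x FFT[64, 24, 48]. That split is a guess about where the stripped commas stood, but it is the one that reproduces both logged totals. The variable names are illustrative and are not part of Hex.

```python
import math

# Assumed reading of "distance range = 000 to 1950 with steps of 075".
r_start, r_stop, r_step = 0.0, 19.5, 0.75
distance_steps = round((r_stop - r_start) / r_step) + 1

# Assumed reading of "Iterate[278121] x FFT[642448]".
receptor_rotations = 812  # "Receptor 812" in the rotational increments line
ligand_rotations = 1      # "Ligand 1"
fft_grid = (64, 24, 48)

ffts = distance_steps * receptor_rotations * ligand_rotations
orientations = ffts * math.prod(fft_grid)

# Formal charges counted when the PDB file was loaded.
positive_residues, negative_residues = 104, 114
net_charge = positive_residues - negative_residues

print("distance steps:", distance_steps)
print("3D FFTs:", ffts)
print("orientations:", orientations)
print("net formal charge:", net_charge)
```

The computed values match the 27 logged `R =` steps, the 21924 3D FFTs, the 1616412672 orientations and the net formal charge of -10 reported in the log.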
Conclusion
After analysing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, each plays an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope that in the near future it will be possible to draw firm conclusions about the true picture of
evolution, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To find the unknown structure of a protein present in the
different species, we carried out homology modelling and went on to present a theoretical model
using available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this
protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs
could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.
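The E-value rule of thumb described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cutoff of 1e-5 is an assumed value, not one given in this report, and the hit identifiers and numbers are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str  # hypothetical identifier, for illustration only
    score: float      # bit score reported by the database search
    evalue: float     # expectation value; lower means more significant


def select_templates(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, most significant first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


# Hypothetical search results (not taken from the report's BLAST output).
example_hits = [
    Hit("tmpl_A", 206.0, 6e-54),
    Hit("tmpl_B", 27.0, 6.0),
    Hit("tmpl_C", 197.0, 2e-51),
]

selected = select_templates(example_hits)
print([h.template_id for h in selected])
```

Hits that fail the cutoff are dropped rather than ranked last, which mirrors the advice that a template with a poor E-value should generally not be used even when it is the only one available.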
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and the interfaces are therefore said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described in terms of a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor or a flexible ligand and a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow docking of a larger ligand than a rigid receptor would.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: there is intrinsic movement, which allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within a cell or on the cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
A molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends on a precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein active or functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; these rotations are described by the torsion angles φ and ψ, respectively, and the plot is drawn between them. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
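The torsion angles plotted in a Ramachandran diagram are each defined by four consecutive backbone atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The following is a minimal sketch of how such an angle can be computed from Cartesian coordinates; the function name and the test coordinates are illustrative assumptions and are not taken from any tool used in this report.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond so projections onto it are simple dot products.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates: a planar "trans" and a planar "cis" arrangement.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
print(abs(trans), abs(cis))
```

Computing φ and ψ for every residue in this way and plotting one against the other gives the scatter of points that is compared with the sterically allowed regions.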
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be 'cleaned up' by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints for a residue, but it will not pass through high energy barriers and may stop in a local minimum.
The potential energy calculated by summing the energies of the various interactions is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
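The behaviour described above, where gradient-based minimization finds the nearest minimum and does not cross energy barriers, can be illustrated with a toy one-dimensional example. The double-well function, step size and starting points below are assumptions chosen for illustration only; they are not a molecular force field and are not taken from the software used in this report.

```python
def energy(x: float) -> float:
    """Toy double-well potential with a shallow well near x = 0.96
    and a deeper well near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x: float) -> float:
    """Analytical derivative of energy(x)."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimize(x0: float, step: float = 0.01, n_steps: int = 10000) -> float:
    """Steepest descent: repeatedly move against the gradient (the force)."""
    x = x0
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


x_right = minimize(1.5)    # starts on the right-hand side of the barrier
x_left = minimize(-1.5)    # starts on the left-hand side of the barrier
print(round(x_right, 3), round(energy(x_right), 3))
print(round(x_left, 3), round(energy(x_left), 3))
```

Starting from x = 1.5 the search settles in the shallower right-hand well even though the left-hand well is lower in energy, which is why the result of a minimization depends on the starting conformation.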
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC       Description                                                    Score  E-value
pdb  1QGT-C   Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5           206    6e-54
pdb  2QIJ-C   Chain C Hepatitis B Capsid Protein With An N-Termina           197    2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                  192    6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp 27    60
pdb  1AW9-A   Chain A Structure Of Glutathione S-Transferase Iii I           27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue   Count  Percent
Ala (A)   74     14.1%
Arg (R)   33     6.3%
Asn (N)   11     2.1%
Asp (D)   21     4.0%
Cys (C)   5      1.0%
Gln (Q)   32     6.1%
Glu (E)   28     5.4%
Gly (G)   51     9.8%
His (H)   16     3.1%
Ile (I)   15     2.9%
Leu (L)   81     15.5%
Lys (K)   10     1.9%
Met (M)   12     2.3%
Phe (F)   10     1.9%
Pro (P)   28     5.4%
Ser (S)   27     5.2%
Thr (T)   22     4.2%
Trp (W)   4      0.8%
Tyr (Y)   5      1.0%
Val (V)   38     7.3%
Pyl (O)   0      0.0%
Sec (U)   0      0.0%
(B)       0      0.0%
(Z)       0      0.0%
(X)       0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
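The two extinction coefficients above can be reproduced from the residue counts in the composition table (5 Tyr, 4 Trp, 5 Cys). The sketch below uses the standard 280 nm molar absorbances of 1490 (Tyr), 5500 (Trp) and 125 (cystine) M-1 cm-1; the function name is illustrative, and the molecular weight of 55252.6 is the value as read from the ProtParam output with its decimal point restored.

```python
def extinction_coefficient(n_tyr: int, n_trp: int, n_cys: int,
                           all_cys_as_cystine: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in M-1 cm-1.

    Each pair of Cys residues is counted as one cystine when
    all_cys_as_cystine is True; otherwise cystines contribute nothing.
    """
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


MOLECULAR_WEIGHT = 55252.6  # Da, as reported for GLCTK_HUMAN

for as_cystine in (True, False):
    eps = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5,
                                 all_cys_as_cystine=as_cystine)
    # Absorbance of a 1 g/l (0.1%) solution is the coefficient over the mass.
    abs_01_percent = eps / MOLECULAR_WEIGHT
    print(eps, round(abs_01_percent, 3))
```

With five cysteines only two cystines can form, so the two cases differ by 2 x 125 = 250 M-1 cm-1.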
Secondary structure prediction
By SOPMA result for UNK_158250
View SOPMA in
10 20 30 40 50 60 70 | | | | | | |MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKVhhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee (42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFEccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeeeGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQeccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchhELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALhhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtcccccPRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLchhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhhLAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRGhhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeecccccGRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states: 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
49
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
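The pairwise scores in the table above are percent identities between pairs of sequences. A minimal sketch of such a calculation on two already-aligned sequences is given below. It assumes that gapped columns are excluded from the comparison, which is one common convention; ClustalW computes its scores from separate pairwise alignments, so values obtained this way from a multiple alignment may differ slightly. The two fragments are the first aligned residues of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b)
                if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


# First 30 alignment columns of two HBV e-antigen sequences.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(round(percent_identity(hbva4, hbva5), 2))
```

The two fragments differ at a single position (V versus F), which is consistent with the high scores of 97 to 98 reported between the HBV strains.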
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as template which showed around 856 identity with target sequence and the template structure was downloaded from the PDB
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (PDB file), then choose SwissModel > load the raw target sequence.
2) Choose Fit > fit raw sequence, then magic fit, then iterative fit.
3) Choose File > Save > Layer (PDB).
4) Choose File > Save > Project (PDB).
5) Choose SwissModel > submit modeling request. (A new browser window opens; load the PDB file and give the email ID for receiving the modeled structure.)
6) Open the new structure (received by email) and remove the template by selecting the target.
7) Choose File > Save > Layer (PDB).
Open Swiss Model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under Swiss Model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Click on the model bars.
Fig. Structure of the template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
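Sequence identity between target and template can be checked directly from an alignment like the one above. The sketch below is illustrative only: it assumes identity is counted over columns where neither sequence has a gap (other tools use different denominators), and the function name `percent_identity` is hypothetical. It is applied to the single alignment block starting at target residue 205, so its result describes that block and not the overall 34 reported by SWISS-MODEL.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns without a gap.

    Spaces (used to group residues in tens) are ignored and the
    comparison is case-insensitive.
    """
    a = aligned_a.replace(" ", "").upper()
    b = aligned_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# One block of the TARGET / 2b8nA alignment (target residues 205 onwards).
target_block = "RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS"
template_block = "sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias"

block_identity = percent_identity(target_block, template_block)
print(f"{block_identity:.1f}% identity in this block")
```

Well-conserved blocks such as this one score far above the whole-alignment figure, which is why identity should be judged together with coverage.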
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure: each residue is plotted as one point, and the plot shows which combinations of φ and ψ are possible for a polypeptide. Residues that fall outside the favoured and allowed regions point to strained or incorrectly modelled backbone geometry.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
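Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of that calculation in plain Python; the function name `dihedral_deg` and the test coordinates are illustrative assumptions and are not taken from the modelled structure or from the SAVS server.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (range -180..180) for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)           # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not from a real structure): cis, trans and a quarter turn.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

Applying this function to every residue of a model and scattering φ against ψ gives the raw data that PROCHECK then classifies into favoured, allowed and disallowed regions.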
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE    %age
Residues in most favoured regions [A,B,L]                  990    85.6
Residues in additional allowed regions [a,b,l,p]           104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0
Residues in disallowed regions                              51     4.4
                                                          ----   -----
Number of non-glycine and non-proline residues            1156   100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
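The plot statistics can be cross-checked with a few lines of arithmetic. The sketch below assumes the percentage column has one decimal place (85.6, 9.0, 1.0, 4.4), which is how the extracted digits are read here; the variable names are illustrative.

```python
# Residue counts from the PROCHECK plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
end_residues, glycines, prolines = 8, 127, 60

# Residues that are neither Gly nor Pro (end-residues are counted separately).
non_gly_pro = sum(counts.values())
percentages = {
    region: round(100.0 * n / non_gly_pro, 1) for region, n in counts.items()
}
total_residues = non_gly_pro + end_residues + glycines + prolines

# PROCHECK's rule of thumb: a good model has over 90% in the most favoured regions.
meets_threshold = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("total residues:", total_residues)
print("over 90% most favoured:", meets_threshold)
```

The counts are internally consistent (1156 classified residues, 1351 in total), and at 85.6% the model falls somewhat short of the 90% guideline quoted by PROCHECK.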
Docking result (Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair.
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor: 2B8N and ligand: 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
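The search-space figures in this part of the Hex log can be reproduced by simple arithmetic. The sketch below rests on an assumption about punctuation lost in extraction: the `Iterate` term is read as [27, 812, 1] and the `FFT` term as [64, 24, 48], the distance range as 0.00 to 19.50 in steps of 0.75, and the four timings as 53.66, 73.01, 277.87 and 15.45 seconds. Interpreting 27 as the number of radial steps and 812 as the receptor rotational increments is likewise an inference from the surrounding log lines, not something the log states explicitly.

```python
# Values read from the Hex log (punctuation restored by assumption).
r_min, r_max, r_step = 0.00, 19.50, 0.75
receptor_rotations, ligand_rotations = 812, 1
fft_grid = (64, 24, 48)
phase_seconds = [53.66, 73.01, 277.87, 15.45]   # Hex, GF, FFT, Scan

# Number of intermolecular distances sampled: R = 0.00, 0.75, ..., 19.50.
radial_steps = round((r_max - r_min) / r_step) + 1

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = radial_steps * receptor_rotations * ligand_rotations

# Each FFT scores every point of the grid at once.
points_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * points_per_fft

elapsed = sum(phase_seconds)
print("radial steps:", radial_steps)
print("3D FFTs:", n_ffts)
print("orientations:", total_orientations)
print(f"elapsed: {elapsed:.2f} s (about {elapsed / 60:.1f} min)")
```

Under these readings the products match the log exactly (21924 FFTs, 1616412672 orientations) and the phase timings add up to the reported 7 minutes, which supports the restored punctuation.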
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting
to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible in the near future to draw firm conclusions about the true picture of its
evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of the proteins present in the
different species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
The HBeAg (glycerate kinase) protein studied here is one of the secondary
causes of the pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. It includes methods for
collecting and analysing data as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in the future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in the search for a cure for
hepatitis B, and may open new avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites
in them, and to other infectious diseases, so that we can look forward to a better,
disease-free world.
The work presented is just a small part of a big problem, and much work still needs to be done to
establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
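The E-value rule described above can be sketched in a few lines of Python. This is only an illustration: the hit names, scores and the cutoff of 1e-5 are hypothetical values chosen for the example, not figures taken from this report or from any BLAST default.

```python
# Hypothetical BLAST hits as (template id, bit score, E-value).
hits = [
    ("hit_A", 206, 6e-54),
    ("hit_B", 197, 2e-51),
    ("hit_C", 27, 6.0),
]

# Assumed cutoff for a "sufficiently low" E-value (illustrative only).
E_VALUE_CUTOFF = 1e-5


def candidate_templates(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits at or below the E-value cutoff, best (lowest) E-value first."""
    kept = [hit for hit in hits if hit[2] <= cutoff]
    return sorted(kept, key=lambda hit: hit[2])


for name, score, e_value in candidate_templates(hits):
    print(name, score, e_value)
```

Hits that fail the cutoff (here hit_C) would be candidates for fold-recognition servers rather than direct use as templates.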
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the
interfaces of these interactions do not have the ability to alter their conformation in
order to improve binding and ease movement. Conformational changes are limited by steric
constraints, and thus the interfaces are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand complex can have either a
rigid ligand and a flexible receptor or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger than usual ligand.
Normally, when there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement, which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing for easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking study will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A receptor is a site on the surface of the cell that serves as a recognition or binding site
for antigens, antibodies, or other cellular or immunological components. It is a molecule
within a cell or on the cell surface to which a substance (such as a hormone or a drug)
selectively binds, causing a change in the activity of the cell.
Ligand
A ligand is a molecule that binds to a protein molecule (e.g. a receptor). As a ligand binds
through the interaction of many weak noncovalent bonds formed with the binding site of a
protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed
amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The table below does not include protein-protein
or protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. These rotations are described by
the torsion angles φ and ψ respectively, and the plot is drawn between them. Ramachandran
used computer models of small polypeptides to systematically vary φ and ψ with the
objective of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard spheres with
dimensions corresponding to their van der Waals radii. The angles which cause spheres to
collide correspond to sterically disallowed conformations of the polypeptide backbone.
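A torsion angle such as φ or ψ is defined by four consecutive atoms (φ by C of the previous residue, N, Cα, C; ψ by N, Cα, C, N of the next residue). The following is a minimal sketch of how such an angle can be computed from coordinates; the function name and the test points are illustrative assumptions and are not part of any tool used in this report.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four hypothetical points with a quarter turn about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this function to every residue of a structure and plotting φ against ψ gives the points of a Ramachandran plot.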
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as
the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
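The idea of moving atoms down the energy gradient until the net force is small can be illustrated with a toy case. The sketch below applies steepest descent to the distance between two atoms interacting through a Lennard-Jones potential; the potential, the reduced units (epsilon = sigma = 1), the step size and the starting distance are all assumptions made for illustration, and real minimizers work on full force fields with many degrees of freedom.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr; the force on the separation is its negative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.001, tol=1e-8, max_iter=100000):
    """Steepest descent: move against the gradient until it is nearly zero."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


# Start with the two atoms too close together (strong repulsion).
r_start = 0.9
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

The two atoms relax to the nearest minimum of the curve, which mirrors the point made above: minimization finds the best nearby conformation and does not cross energy barriers.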
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711 (41)
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M^-1 cm^-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
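The two extinction coefficients above can be reproduced from the residue counts. The sketch below assumes the commonly used per-residue contributions at 280 nm (Tyr 1490, Trp 5500, and 125 per cystine, in M^-1 cm^-1); these constants and the function names are assumptions of this example rather than values quoted in the report. The counts (5 Tyr, 4 Trp, 5 Cys) and the molecular weight of 55252.6 come from the composition listed above.

```python
# Assumed molar absorption contributions at 280 nm, in M^-1 cm^-1.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125  # per disulfide-bonded pair of Cys residues


def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient from residue counts."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


ext_paired = extinction_coefficient(5, 4, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(5, 4, 5, all_cys_paired=False)
print(ext_paired, round(absorbance_at_1_g_per_l(ext_paired, 55252.6), 3))
print(ext_reduced, round(absorbance_at_1_g_per_l(ext_reduced, 55252.6), 3))
```

With five cysteines only two cystine pairs can form, which is why the two coefficients differ by 250.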
Secondary structure prediction
SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
(42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00% (43)
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
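The percentages in the SOPMA summary follow directly from counting the state letters in the prediction string. A minimal sketch, assuming the four-state alphabet h, e, t, c used in this output; the helper names and the short example string are made up for illustration.

```python
from collections import Counter

# Four prediction states as they appear in the SOPMA output.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def percent(count, total):
    """Percentage rounded to two decimals, as in the SOPMA summary."""
    return round(100.0 * count / total, 2)


def state_percentages(prediction):
    """Percentage of residues assigned to each of the four states."""
    counts = Counter(prediction)
    total = len(prediction)
    return {name: percent(counts.get(s, 0), total) for s, name in STATE_NAMES.items()}


# Short hypothetical prediction string (not taken from the report).
print(state_percentages("hhhhcceett"))

# Reported counts for the 523-residue sequence.
for name, count in [("Alpha helix", 235), ("Extended strand", 80),
                    ("Beta turn", 36), ("Random coil", 172)]:
    print(name, percent(count, 523))
```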
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
(45)
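The pairwise scores in the table are understood here to be percent identities between aligned pairs, with gap positions left out of the comparison; that reading of the ClustalW2 score, and the function below, are assumptions of this sketch rather than a description of the program's internals. The second example uses a 30-column fragment of two HBeAg rows (HBVA4 and HBVA5) from the alignment in this report.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy example: 5 compared columns, 4 identical.
print(percent_identity("ACDE-G", "ACDQWG"))

# Fragment of the HBVA4 and HBVA5 rows, differing at a single position.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(round(percent_identity(hbva4, hbva5), 2))
```

Scores near 97 to 98 among the HBV genotype sequences, against 2 to 3 between glycerate kinase and any HBeAg sequence, reflect this kind of column-by-column comparison.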
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then Magic Fit, then Iterative Fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window will open for loading the pdb file; give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule
Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET  83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA    4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
Total number of residues                                  1351
(59)
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
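The percentages in the plot statistics, and the comparison against the 90% guideline quoted above, can be checked with a few lines of Python. The region counts are those reported for this model; the function and variable names are illustrative.

```python
# Residue counts per Ramachandran region (Gly, Pro and end-residues excluded).
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# PROCHECK guideline for a good quality model (percent in most favoured regions).
GOOD_MODEL_THRESHOLD = 90.0


def region_percentages(counts):
    """Percentage of residues in each region, rounded to one decimal."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


percentages = region_percentages(region_counts)
meets_guideline = percentages["most favoured"] > GOOD_MODEL_THRESHOLD
print(percentages)
print("meets 90% guideline:", meets_guideline)
```

For this model the most favoured fraction is 85.6%, so by the PROCHECK guideline the model falls short of the 90% expected of a good quality structure.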
Docking result by Hex software
Fig Ligand & Receptor (2B8N)
Fig after docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
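The search-space figures reported in the log above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of Hex itself. It assumes that the extraction dropped decimal points and separators, so that the distance range reads 0.00 to 19.50 in steps of 0.75, and the bracketed values read Iterate[27, 812, 1] and FFT[64, 24, 48]. These readings are assumptions, although they reproduce the totals printed in the log.

```python
import math


def radial_steps(start, stop, step):
    """Number of sampled distances from start to stop, both ends included."""
    return int(round((stop - start) / step)) + 1


# Assumed reading of "distance range = 000 to 1950 with steps of 075".
n_distances = radial_steps(0.00, 19.50, 0.75)

# Assumed reading of "Iterate[278121] x FFT[642448]".
iterate = (n_distances, 812, 1)   # distances x receptor rotations x ligand rotations
fft_grid = (64, 24, 48)           # orientations covered by one 3D FFT

n_ffts = math.prod(iterate)
total_orientations = n_ffts * math.prod(fft_grid)

# Net formal charge from the counted charged residues (104 +ve, 114 -ve).
net_charge = 104 - 114

print(n_distances, n_ffts, total_orientations, net_charge)
```

Under these assumptions the number of FFTs and the total number of orientations agree with the "Done 21924 3D FFTs for 1616412672 orientations" line of the log.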
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible in the near future to draw firm conclusions about the true picture of evolution,
and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structures of proteins present in the
different species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
In our study, the HBeAg-binding protein 4 (glycerate kinase), encoded by the
GLYCTK gene, appears to be one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand that could inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report might be a stepping stone towards such discoveries. The
present work may be a small finding within a large problem.
Phylogenetics is the field of biology which deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and
Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher,
P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
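Two of the template-selection factors named above, coverage of the aligned regions and sequence similarity, can be computed directly from a pairwise alignment. The following is a minimal sketch, assuming the alignment is given as two equal-length strings with "-" for gaps. The function name and the example sequences are hypothetical and are not taken from this report.

```python
def alignment_stats(query_aln, template_aln, gap="-"):
    """Return (coverage, identity) for a gapped pairwise alignment.

    coverage = aligned query residues / all query residues
    identity = identical aligned pairs / aligned pairs
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")

    query_len = sum(1 for q in query_aln if q != gap)
    aligned = 0
    identical = 0
    for q, t in zip(query_aln, template_aln):
        # A column counts as aligned only if neither sequence has a gap.
        if q != gap and t != gap:
            aligned += 1
            if q == t:
                identical += 1

    coverage = aligned / query_len if query_len else 0.0
    identity = identical / aligned if aligned else 0.0
    return coverage, identity


# Hypothetical nine-residue query aligned to a template with one gap.
coverage, identity = alignment_stats("ACDEFGHIK", "ACD-FGYIK")
print(f"coverage = {coverage:.3f}, identity = {identity:.3f}")
```

A template with high identity but low coverage leaves part of the query unmodelled, which is why both figures are worth inspecting together.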
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies that aim to find a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of
these interactions do not have the ability to alter their conformation in order to improve
binding and ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key
mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor or a flexible ligand and a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger
than usual ligand. Normally, when there is ligand overlap in the docking interface, energy
penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement which allows small conformational adaptations for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed to the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing easier, more energetically favorable binding between the
two.
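The six degrees of freedom mentioned above (three rotations and three translations) are what a rigid-body docking search varies. The sketch below applies one such pose to a set of atom coordinates. It is an illustration under stated assumptions: the function names, the x-then-y-then-z rotation order and the example coordinates are all hypothetical, and this is not how any particular docking program parameterizes its search.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (radians), i.e. R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Apply a 6-DOF rigid-body pose (rx, ry, rz, tx, ty, tz) to 3D points."""
    rx, ry, rz, tx, ty, tz = pose
    rot = rotation_matrix(rx, ry, rz)
    moved = []
    for x, y, z in coords:
        moved.append((
            rot[0][0] * x + rot[0][1] * y + rot[0][2] * z + tx,
            rot[1][0] * x + rot[1][1] * y + rot[1][2] * z + ty,
            rot[2][0] * x + rot[2][1] * y + rot[2][2] * z + tz,
        ))
    return moved


# Hypothetical two-atom "ligand": rotate 90 degrees about z, then shift.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
posed = apply_pose(ligand, (0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0))
print(posed)
```

Because the transformation is rigid, the distance between any two atoms is unchanged; only the position and orientation of the molecule as a whole are altered.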
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A receptor is a structure on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
A ligand is a molecule which binds to a protein molecule (e.g. a receptor). Because a ligand binds through
many weak noncovalent interactions with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein or enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
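How a lower energy barrier translates into a rate enhancement can be sketched with transition-state theory, in which the rate constant scales as exp(-ΔG‡/RT). The example below is a minimal illustration, assuming the pre-exponential factor is the same for the catalysed and uncatalysed reactions; the function name and the 298.15 K default are assumptions and are not taken from this report.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def rate_enhancement(barrier_reduction_kj_mol, temperature_k=298.15):
    """Ratio k_catalysed / k_uncatalysed for a given lowering of the
    activation free energy, assuming equal pre-exponential factors."""
    barrier_reduction_j_mol = barrier_reduction_kj_mol * 1000.0
    return math.exp(barrier_reduction_j_mol / (GAS_CONSTANT * temperature_k))


# Each reduction of RT*ln(10) in the barrier gives one order of magnitude.
for reduction in (0.0, 5.708, 11.416, 34.0):
    print(f"{reduction:6.3f} kJ/mol -> {rate_enhancement(reduction):.3g}-fold")
```

The exponential dependence is the point of the sketch: modest stabilization of the transition state at the active site corresponds to a large change in rate.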
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active or functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
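The preference values above can be used as a simple lookup table, for example to rank residues or to give a rough score to a candidate functional site. The sketch below assumes that the tabulated values are decimals whose points were lost in extraction (for example, "His 0360" read as 0.360). The function names and the three-residue example are hypothetical, and averaging the values is only an illustrative scoring choice.

```python
# Contact preferences keyed by one-letter code; decimal points restored
# on the assumption stated in the lead-in.
PROPENSITY = {
    "H": 0.360, "Y": -0.040, "D": 0.045, "G": -0.070,
    "W": -0.140, "M": 0.025, "V": -0.060, "N": 0.080,
    "L": -0.180, "F": -0.120, "Q": 0.050, "C": 0.210,
    "I": -0.005, "A": 0.025, "E": 0.050, "R": 0.055,
    "P": -0.200, "K": 0.100, "T": 0.100, "S": 0.130,
}


def rank_residues():
    """Residues sorted from most to least preferred in functional regions."""
    return sorted(PROPENSITY, key=PROPENSITY.get, reverse=True)


def mean_propensity(residues):
    """Average preference of a set of residues (one-letter codes)."""
    if not residues:
        raise ValueError("no residues given")
    return sum(PROPENSITY[r] for r in residues) / len(residues)


print(rank_residues()[:3])        # the three most favoured residues
print(mean_propensity("HCS"))     # hypothetical three-residue site
```

As the text notes, these values reflect contacts with bound non-protein atoms only, so a low score does not rule out a role in protein-protein or protein-peptide interfaces.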
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with
dimensions corresponding to their van der Waals radii. The angles which cause
spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
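Each point on such a plot is a pair of torsion angles, and a torsion angle is defined by four consecutive atoms. The sketch below computes a torsion angle from four 3D points using the standard atan2 formulation. The function name and the test coordinates are hypothetical; for a real residue, phi would use the atoms C(i-1), N, Cα, C and psi the atoms N, Cα, C, N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (range -180 to 180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal to the plane p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical four-atom chains: cis (0 degrees) and trans (180 degrees).
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0))
print(cis, trans)
```

Computing phi and psi for every residue of a model and plotting one against the other gives the scatter that validation tools compare with the allowed regions.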
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
'cleaned up' by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
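The idea of gradient optimization can be illustrated with a minimal sketch. It is not the
procedure used in this work: it assumes a single pair of atoms interacting through a
Lennard-Jones potential in reduced units (eps = sigma = 1), and the function names, step
size and displacement cap are illustrative choices. The pair starts too close together,
where the van der Waals repulsion is large, and is moved along the force until the force is
nearly zero.

```python
def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones energy of one atom pair at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_force(r, eps=1.0, sigma=1.0):
    """Force along r, i.e. minus the derivative of lj_energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def minimize(r0, step=0.01, max_move=0.1, tol=1e-6, max_iter=10000):
    """Steepest descent: move along the force until the force is nearly zero."""
    r = r0
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, step * force))
        r += move
    return r


if __name__ == "__main__":
    start = 0.9  # atoms too close: strong repulsion
    end = minimize(start)
    print("start: r = %.3f, E = %.3f" % (start, lj_energy(start)))
    print("end:   r = %.3f, E = %.3f" % (end, lj_energy(end)))
```

The search stops at the nearest minimum of the potential. On a real protein energy surface
the same procedure stops in whichever local minimum lies downhill from the starting
conformation, which is why it cannot cross high energy barriers.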
Results and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                              Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                       206   6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                       197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                              192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp             27   6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                        27   6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4).
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue   Count  Percent
Ala (A)     74    14.1%
Arg (R)     33     6.3%
Asn (N)     11     2.1%
Asp (D)     21     4.0%
Cys (C)      5     1.0%
Gln (Q)     32     6.1%
Glu (E)     28     5.4%
Gly (G)     51     9.8%
His (H)     16     3.1%
Ile (I)     15     2.9%
Leu (L)     81    15.5%
Lys (K)     10     1.9%
Met (M)     12     2.3%
Phe (F)     10     1.9%
Pro (P)     28     5.4%
Ser (S)     27     5.2%
Thr (T)     22     4.2%
Trp (W)      4     0.8%
Tyr (Y)      5     1.0%
Val (V)     38     7.3%
Pyl (O)      0     0.0%
Sec (U)      0     0.0%
(B)          0     0.0%
(Z)          0     0.0%
(X)          0     0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
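The two extinction coefficients above can be reproduced from the residue counts. The sketch
below assumes the molar absorption values commonly used for this estimate at 280 nm (Trp
5500, Tyr 1490, cystine 125 M-1 cm-1); these constants and the function names are
assumptions brought in for illustration, while the counts and the molecular weight are taken
from the ProtParam output.

```python
# Assumed molar absorption values at 280 nm, in M-1 cm-1 (not stated in the report).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125

# Values taken from the ProtParam output for GLCTK_HUMAN.
N_TRP, N_TYR, N_CYS = 4, 5, 5
MOLECULAR_WEIGHT = 55252.6


def extinction_coefficient(n_trp, n_tyr, n_cys, cys_paired):
    """Estimate the molar extinction coefficient at 280 nm."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_1g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    for paired in (True, False):
        ext = extinction_coefficient(N_TRP, N_TYR, N_CYS, paired)
        a = absorbance_1g_per_l(ext, MOLECULAR_WEIGHT)
        print("Cys paired: %s  ext = %d  Abs 0.1%% = %.3f" % (paired, ext, a))
```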
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters
Window width: 17
Similarity threshold: 8
Number of states: 4
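The state percentages reported by SOPMA are simple tallies over the per-residue prediction
string. A minimal sketch of that tally is given below; the function name is an illustrative
assumption, and the example string is the first prediction line of the output above.

```python
from collections import Counter


def state_percentages(prediction):
    """Percentage of each predicted state (h, e, t, c) in a prediction string."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


if __name__ == "__main__":
    # First 70 residues of the SOPMA prediction for the target sequence.
    first_line = ("hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhh"
                  "cccthhhhhhhhhcttcceeee")
    print(state_percentages(first_line))

    # Whole-sequence counts as reported by SOPMA (523 residues in total).
    reported = {"h": 235, "e": 80, "t": 36, "c": 172}
    total = sum(reported.values())
    for state, n in reported.items():
        print(state, n, "%.2f%%" % (100.0 * n / total))
```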
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
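The scores in the table are pairwise similarity scores between the listed sequences. As a
rough illustration of how such a score can be obtained from two aligned sequences, the
sketch below computes percent identity over the columns in which neither sequence has a gap.
This is an assumption made for illustration; the exact scoring used by ClustalW2 may differ,
and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Short aligned fragments of two HBeAg sequences, taken from the alignment.
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
    ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
    print("%.1f" % percent_identity(hbva4, ashv))
```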
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure).
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET    83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA      4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET          hh sss sssss hhh
2b8nA           hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET          hhhhhhhhh ssssss sssss hhhhh
2b8nA           hhhhhhhhh ssssss sssss hhhhh

TARGET   155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA           hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET          h hhhhhh hhh sss hhhhhh sssssss
2b8nA           h hhhhhh hhh sss hhhhhh sssssss

TARGET   255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET          h hhhhhhhhhh hh hhhh sss sssss
2b8nA           h hhhhhhhhhh hhh hhhh ss

TARGET   305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA           sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET          hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA           hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET          sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA           sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394    ksgallitgp tgtnvndlii gliv-
TARGET          h sss sssss ssss
2b8nA           sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angle φ against ψ for the amino acid residues in a protein structure. It shows which combinations of φ and ψ are possible for a polypeptide, and it displays the φ and ψ backbone conformational angles of each residue in the protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using Deep View and Modeller is given as input for SAVS.
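Each point on a Ramachandran plot is a pair of dihedral angles computed from four
consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i), and ψ from N(i), CA(i), C(i),
N(i+1). The sketch below shows one common way to compute a dihedral angle from four
coordinates. It is an illustration only, not the code used by PROCHECK or SAVS, and the
coordinates in the example are made-up test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: the last atom is rotated a quarter turn about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real structure the four points would be the backbone atom coordinates read from the
PDB file, giving one (φ, ψ) pair per residue.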
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
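The percentages in the plot statistics follow directly from the residue counts, and the
result can be compared with the 90% guideline quoted above. A minimal sketch, using the
counts from the PROCHECK summary (the dictionary keys and function name are illustrative
assumptions):

```python
# Residue counts taken from the PROCHECK plot statistics above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
GOOD_MODEL_THRESHOLD = 90.0  # percent in most favoured regions


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


if __name__ == "__main__":
    percentages = region_percentages(REGION_COUNTS)
    for region, pct in percentages.items():
        print("%-20s %5.1f%%" % (region, pct))
    meets = percentages["most favoured"] > GOOD_MODEL_THRESHOLD
    print("Meets the 90% guideline:", meets)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90%
guideline for a good quality model.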
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
67
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
68
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
69
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
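The search-space sizes reported in the docking log can be cross-checked with a few lines of arithmetic. The sketch below assumes that the garbled log entries `Iterate[278121]` and `FFT[642448]` originally read `[27,812,1]` and `[64,24,48]`, and that the distance range ran from 0.00 to 19.50 in steps of 0.75. These readings are assumptions, chosen because their products reproduce the 21924 FFTs and 1616412672 orientations that the log reports.

```python
# Cross-check of the search-space figures in the docking log.
# The split of "278121" into (27, 812, 1) and of "642448" into (64, 24, 48)
# is an assumed reading of the garbled log, not something the log states.

r_start, r_end, r_step = 0.0, 19.5, 0.75        # assumed distance range and step
n_distances = round((r_end - r_start) / r_step) + 1

receptor_rotations = 812                         # "Receptor 812" in the log
ligand_rotations = 1                             # "Ligand 1" in the log
fft_grid = (64, 24, 48)                          # assumed FFT grid dimensions

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = n_distances * receptor_rotations * ligand_rotations

# Each FFT scores every cell of the grid at once.
cells_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
n_orientations = n_ffts * cells_per_fft

print("distance steps :", n_distances)
print("3D FFTs        :", n_ffts)
print("orientations   :", n_orientations)
```

Under these assumptions the three printed values agree with the 27 distance steps listed in the log, the "Done 21924 3D FFTs" line and the "Total 6D space" figure.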
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting to
take a closer look at the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structures of proteins present in the
different species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that HBeAg-binding protein 4 (glycerate kinase), encoded by the
GLYCTK gene, is a secondary contributor to the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. The
present work is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as interpreting the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in the future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a large problem, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviation
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
binding and ease of movement. Conformational changes are limited by steric constraints, and the
proteins are thus said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described by a lock-and-key
mechanism. There is both high specificity and induced fit within these interfaces, with
specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand
and a flexible receptor, or a flexible ligand and a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. They move with respect to one another in a direction
perpendicular to the interface. This allows a receptor to bind a larger-than-usual
ligand. Normally, when the ligand overlaps the receptor in the docking interface, energy
penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible with
a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained, and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement that allows small conformational adaptations for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed to the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing easier, more energetically favorable binding between the
two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu, and will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies, or other cellular or immunological components. It is a molecule within
or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through
many weak noncovalent bonds formed with the binding site of a protein,
tight binding of a ligand depends on a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His   0.360    Tyr  -0.040    Asp   0.045    Gly  -0.070
Trp  -0.140    Met   0.025    Val  -0.060    Asn   0.080
Leu  -0.180    Phe  -0.120    Gln   0.050    Cys   0.210
Ile  -0.005    Ala   0.025    Glu   0.050    Arg   0.055
Pro  -0.200    Lys   0.100    Thr   0.100    Ser   0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle phi (φ) against psi (ψ) for the amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii. The
angles that cause spheres to collide correspond to sterically disallowed conformations of
the polypeptide backbone.
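The two torsion angles plotted in a Ramachandran diagram can be computed directly from backbone atom coordinates. The following is a minimal sketch, not the method used by any particular validation server: the function names `dihedral` and `phi_psi` are illustrative, and the coordinates are assumed to be plain (x, y, z) tuples taken from a PDB file.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n2 = _cross(b2, b3)
    # atan2 form of the torsion angle: 0 degrees is cis, 180 degrees is trans.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)

# Simple geometric checks with made-up points (not real protein coordinates).
right_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(right_angle, cis, trans)
```

Applying `phi_psi` to every residue of a structure and plotting φ on the horizontal axis against ψ on the vertical axis gives the Ramachandran plot described above.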
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file, which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
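The idea of moving atoms along the net force until the force vanishes can be illustrated on the smallest possible system. The sketch below is a toy example, not the force field or minimizer of any modelling package: it assumes two atoms interacting through a Lennard-Jones potential in reduced units (epsilon = sigma = 1), and the step size, the cap on each move and the tolerance are arbitrary illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along r, i.e. -dE/dr (positive values push the atoms apart)."""
    s6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r

def minimize(r, step=0.01, max_move=0.1, tol=1e-8, max_iter=10000):
    """Steepest descent: move along the force until it is nearly zero."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        # Cap each move so a huge repulsive force cannot throw the atoms apart.
        move = max(-max_move, min(max_move, step * force))
        r += move
    return r

# Start with the two atoms too close together (a "bad contact").
r_start = 0.9
r_min = minimize(r_start)
print("start: r = %.3f, E = %.3f" % (r_start, lj_energy(r_start)))
print("end:   r = %.5f, E = %.5f" % (r_min, lj_energy(r_min)))
```

The minimizer relaxes the bad contact and stops at the nearest minimum of the potential. In a real protein the energy surface has many such minima, which is why minimization alone only finds the best nearby conformation.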
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                     Score  E-value
pdb  1QGT-C  Chain C (HBcAg) Human Hepatitis B Viral Capsid >gi|5            206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina            197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                   192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp  27    6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I             27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
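Several of the ProtParam figures above can be reproduced by hand from the amino acid composition. The sketch below assumes the widely used molar extinction values at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for each cystine (disulfide pair), and reads the molecular weight in the listing as 55252.6; the variable and function names are illustrative.

```python
# Residue counts and atomic composition taken from the ProtParam listing.
N_TRP, N_TYR, N_CYS = 4, 5, 5
N_ASP, N_GLU, N_ARG, N_LYS = 21, 28, 33, 10
ATOMS = {"C": 2435, "H": 3967, "N": 711, "O": 719, "S": 17}
MOLECULAR_WEIGHT = 55252.6   # assumed reading of the molecular weight entry

# Molar extinction at 280 nm (M-1 cm-1) per Trp, Tyr and cystine.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125

def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Sum of the chromophore contributions at 280 nm."""
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

# Five Cys residues can form at most two disulfide pairs.
ext_all_cystine = extinction_coefficient(N_TRP, N_TYR, N_CYS // 2)
ext_no_cystine = extinction_coefficient(N_TRP, N_TYR, 0)

# Absorbance of a 1 g/l solution = molar extinction / molecular weight.
abs_all_cystine = ext_all_cystine / MOLECULAR_WEIGHT
abs_no_cystine = ext_no_cystine / MOLECULAR_WEIGHT

negative_residues = N_ASP + N_GLU
positive_residues = N_ARG + N_LYS
total_atoms = sum(ATOMS.values())

print(ext_all_cystine, round(abs_all_cystine, 3))
print(ext_no_cystine, round(abs_no_cystine, 3))
print(negative_residues, positive_residues, total_atoms)
```

With these inputs the computed extinction coefficients, absorbances, charged-residue totals and atom count match the values reported in the listing.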
Secondary structure prediction
By SOPMA, result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3-10 helix       (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states     :   0 is  0.00%
Other states         :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
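The SOPMA summary percentages are simply the share of residues assigned to each state. A minimal sketch of that calculation follows, using the four non-zero state counts reported above; the helper names are illustrative and the short state string at the end is a made-up example rather than part of the prediction.

```python
from collections import Counter

STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}

def state_percentages(counts):
    """Convert per-state residue counts into percentages of the total."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}

def count_states(prediction):
    """Count the state letters in a SOPMA-style prediction string."""
    return Counter(prediction)

# State counts reported in the SOPMA summary for the 523-residue sequence.
reported_counts = {"h": 235, "e": 80, "t": 36, "c": 172}
percentages = state_percentages(reported_counts)

for state, pct in percentages.items():
    print("%-16s %3d residues  %5.2f%%" % (STATE_NAMES[state], reported_counts[state], pct))

# The same helpers applied to a short, made-up state string.
toy_percentages = state_percentages(count_states("hhhhcceeet"))
```

The four counts sum to the sequence length of 523, so the percentages account for every residue.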
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences 10
2 Alignment score 28565
3 Sequence format Pearson
4 Sequence type aa
5 Output file clustalw2-20080510-09552541.output
6 Alignment file clustalw2-20080510-09552541.aln
7 Guide tree file clustalw2-20080510-09552541.dnd
8 Your input file clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
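The pairwise scores in the table are percent-identity values between two aligned sequences. The sketch below shows one common way to compute such a score: identical positions divided by positions where neither sequence has a gap. That choice of denominator is an assumption about how the scores were produced, the function name `percent_identity` is illustrative, and the two fragments are short pieces copied from the HBVA4 and HBVA5 rows of the multiple alignment, so the result is not expected to equal the full-length score in the table.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue          # gap columns are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# Short aligned fragments from the HBVA4 and HBVA5 rows of the alignment.
fragment_hbva4 = "CPTVQASKLCLGWLWG"
fragment_hbva5 = "CPTFQASKLCLGWLWG"
fragment_score = percent_identity(fragment_hbva4, fragment_hbva5)

# Made-up example with a shared gap column.
toy_score = percent_identity("ACD-F", "ACE-F")

print(fragment_score, toy_score)
```

Scores near 100 among the HBV strains and scores of 2 to 3 against the human glycerate kinase sequence indicate, respectively, nearly identical and essentially unrelated sequences.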
Alignment
CLUSTAL 205 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
50
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
51
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
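The guide tree is a nested, parenthesised tree in Newick notation. As a minimal sketch, the Python below parses such a string and lists each sequence with its summed branch length from the root. The tree string in the code assumes that the colons, commas and decimal points lost from the printed tree are restored as shown; the helper names parse_newick and leaves are illustrative only and are not part of ClustalW.

```python
# Guide tree with separators restored (an assumption about the original output).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length and children."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    text = text[:-1]
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected character at %d" % pos)
        # Optional node label, then optional ":branch_length".
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError("trailing characters after the tree")
    return root


def leaves(node, depth=0.0):
    """Yield (leaf name, summed branch length from the root) pairs."""
    here = depth + (node["length"] or 0.0)
    if not node["children"]:
        yield node["name"], here
    for child in node["children"]:
        yield from leaves(child, here)


if __name__ == "__main__":
    for name, dist in sorted(leaves(parse_newick(GUIDE_TREE)), key=lambda t: t[1]):
        print("%-20s %.5f" % (name, dist))
```

Sorting by root-to-tip distance places the human glycerate kinase sequence far from the HBeAg sequences, which is what one would expect of an unrelated outgroup.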
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose icon - Swiss model - load the raw target sequence.
Choose icon - fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - file - save - layer (pdb).
Choose icon - file - save - project (pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the Email ID for receiving the modeled structure.)
Open the new structure (received by Email) and remove the template by selecting the target.
Choose icon - file - save - layer (pdb).
Open Swiss model and select load raw sequence option to load target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
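The sequence identity reported for the model is a count of matching residues over the aligned positions. The sketch below shows one common way to compute it in Python, applied to two short fragments copied from the target/template alignment. The function name percent_identity is illustrative, and the choice to skip columns containing a gap is an assumption: SWISS-MODEL may use a different denominator, so the numbers are not expected to reproduce the reported 34% exactly.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Fragments taken from the TARGET / 2b8nA alignment (rows starting at 155/101 and 255/201).
    print(percent_identity("ISGGGSALLP", "lsgggsslfe"))
    print(percent_identity("GLRAALPRSV", "giets--esv"))
```

The first fragment covers the conserved SGGGS motif and scores well above the whole-chain average, which illustrates why identity should be reported over the full alignment and not over a hand-picked window.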
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ values of each residue and shows which combinations of these angles are possible for a polypeptide, so it can be used to check whether the backbone geometry of the desired protein is plausible.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
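Each point on a Ramachandran plot is a pair of backbone dihedral angles computed from four consecutive atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The Python sketch below shows the underlying dihedral calculation; it is an illustration only, not the code used by PROCHECK or SAVS, and the coordinates in the example are made-up test points, not atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(prev_c, n, ca, c):
    """Backbone phi: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(prev_c, n, ca, c)


def psi(n, ca, c, next_n):
    """Backbone psi: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, next_n)


if __name__ == "__main__":
    # Hypothetical points: the fourth atom is rotated a quarter turn about the central bond.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing φ and ψ for every residue and plotting ψ against φ gives the scatter that PROCHECK then classifies into favoured, allowed and disallowed regions.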
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
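The percentages in the Procheck summary follow directly from the residue counts, and the arithmetic can be checked in a few lines. The Python below is a small worked example using only the counts reported above; the variable and function names are illustrative.

```python
# Residue counts per Ramachandran region, as reported in the Procheck summary.
region_counts = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}
END_RESIDUES = 8   # end-residues, excluding Gly and Pro
GLYCINES = 127
PROLINES = 60


def region_percentages(counts):
    """Percentage of scored (non-Gly, non-Pro, non-terminal) residues per region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


scored = sum(region_counts.values())
total_residues = scored + END_RESIDUES + GLYCINES + PROLINES
percentages = region_percentages(region_counts)
# Procheck's rule of thumb: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured [A,B,L]"] > 90.0

if __name__ == "__main__":
    print(scored, total_residues)
    for region, pct in percentages.items():
        print("%-34s %5.1f%%" % (region, pct))
    print("over 90% in most favoured regions:", meets_guideline)
```

The counts are internally consistent: the four regions sum to the 1156 scored residues, and adding end-residues, glycines and prolines gives the reported total of 1351.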
Docking result ---- by Hex software
Fig: Ligand & Receptor (2B8N)
Fig after docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000, R = 075, R = 150, R = 225, R = 300, R = 375, R = 450, R = 525, R = 600, R = 675, R = 750, R = 825, R = 900, R = 975, R = 1050, R = 1125, R = 1200, R = 1275, R = 1350, R = 1425, R = 1500, R = 1575, R = 1650, R = 1725, R = 1800, R = 1875, R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000, R = 040, R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
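The size of the search reported in the log can be checked with simple arithmetic. The Python sketch below assumes that the distance range reads 0.00 to 19.50 in steps of 0.75, and that the line "Iterate[278121] x FFT[642448]" reads Iterate[27, 812, 1] x FFT[64, 24, 48] once its separators are restored. Both readings are assumptions, but they reproduce the two totals printed in the log (21924 FFTs and 1616412672 orientations).

```python
def count_steps(start, stop, step):
    """Number of sampled values from start to stop inclusive."""
    return int(round((stop - start) / step)) + 1


# Assumed reading of the log: intermolecular distances 0.00 to 19.50 in steps of 0.75.
distance_steps = count_steps(0.00, 19.50, 0.75)

# Assumed reading of "Iterate[278121]": distance steps, receptor rotations, ligand rotations.
receptor_rotations = 812
ligand_rotations = 1

# Assumed reading of "FFT[642448]": the grid sampled by each 3D FFT.
fft_grid = (64, 24, 48)

ffts = distance_steps * receptor_rotations * ligand_rotations
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = ffts * orientations_per_fft

if __name__ == "__main__":
    print(distance_steps, ffts, orientations_per_fft, total_orientations)
```

The step count also matches the 27 values of R listed in the log for the first search pass.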
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of proteins present in the
different species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by this
gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries. The
present work may be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and more effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
79
BIBLIOGRAPHY
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
80
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B, Sheinerman FB and Honig B 2001 Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801 [PubMed]
Aloy P, Querol E, Aviles FX and Sternberg MJ 2001 Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J Mol Biol 311: 395-408 [PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and
Lipman DJ 1997 Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res 25: 3389-3402 [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher
P, Cerutti L, Corpet F, Croning MD et al 2000 InterPro - An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
will be minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow a larger ligand to be docked than a rigid receptor
would permit.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller than the receptor interface. No docking is completely rigid,
though: there is intrinsic movement which allows small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed to the receptor is even greater. This further offsets any energy penalty between the
receptor and ligand, allowing easier, more energetically favorable binding between the
two.
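The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be illustrated with a short sketch. This is not how any particular docking program works internally; the function names `rotation_matrix` and `apply_pose` and the toy coordinates are assumptions made for illustration only.

```python
import math

def rotation_matrix(rx, ry, rz):
    """3x3 matrix for rotations (radians) about x, then y, then z (R = Rz*Ry*Rx)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(coords, rotation, translation):
    """Apply a 6-DOF rigid-body pose (3 rotation angles, 3 shifts) to atom coordinates."""
    rot = rotation_matrix(*rotation)
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rot[i][0] * x + rot[i][1] * y + rot[i][2] * z + translation[i]
            for i in range(3)
        ))
    return moved

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical three-atom "ligand" (coordinates in angstroms, illustrative only).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
posed = apply_pose(ligand, rotation=(0.3, -0.7, 1.1), translation=(5.0, -2.0, 0.5))
```

A rigid-body move changes where the molecule sits and how it is oriented, but leaves every internal distance unchanged; flexible docking adds further degrees of freedom (for example torsion angles) on top of these six.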
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu, and will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and can therefore aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing
a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through
the interaction of many weak noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
ΔG of activation of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The table below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
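As a rough illustration of how the preference values above could be used, the sketch below sums them over a set of residues lining a candidate pocket. The values are taken from the table with the decimal points restored as 0.xxx, which is an assumption about the original formatting; the function name `site_score` and the example residue list are hypothetical.

```python
# Functional-site preferences from the table above (decimal placement assumed).
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}

def site_score(residues):
    """Sum the preference values of the residues lining a candidate site."""
    return sum(PREFERENCE[name] for name in residues)

def rank_by_preference():
    """Return residue names ordered from most to least favoured."""
    return sorted(PREFERENCE, key=PREFERENCE.get, reverse=True)

# Hypothetical pocket lined by a histidine, a cysteine and a serine.
example_score = site_score(["His", "Cys", "Ser"])
```

A higher total only suggests that the residue mix resembles known ligand-contacting regions; it is not a substitute for structural evidence.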
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide the main chain
N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable conformations. For each
conformation the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with
dimensions corresponding to their van der Waals radii, and the angles which cause
spheres to collide
correspond to sterically disallowed conformations of the polypeptide backbone.
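The torsion angles plotted in a Ramachandran diagram are each defined by four consecutive backbone atoms (for φ: C, N, Cα, C; for ψ: N, Cα, C, N). The following is a minimal sketch of how such a dihedral can be computed from coordinates. The function name `dihedral` and the test points are assumptions for illustration; the sign convention of a given validation program may differ.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond, leaving the two
    # vectors projected onto the plane perpendicular to that bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))

# Illustrative points only: a planar cis arrangement and a planar trans arrangement.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
```

Computing φ and ψ this way for every residue and plotting the pairs gives the scatter of points that is then compared with the allowed regions.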
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein that is
submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinates file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
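The idea of gradient optimization can be sketched on the smallest possible system: two atoms interacting through a Lennard-Jones potential, started too close together so that the van der Waals repulsion dominates. This is only an illustrative toy in reduced units (epsilon = sigma = 1); the step size, the displacement cap and the function names are assumptions, not the settings of any minimizer used in this work.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*[(sigma/r)^12 - (sigma/r)^6] at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, step=0.001, max_move=0.05, tol=1e-8, max_iter=100000):
    """Steepest descent on the separation r; stops when the force is below tol."""
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        move = -step * gradient
        # Cap the displacement so a huge repulsive force cannot throw the atoms apart.
        move = max(-max_move, min(max_move, move))
        r += move
    return r

start = 0.9                      # atoms too close: large repulsion energy
r_min = minimize(start)          # relaxes to the nearby minimum at 2**(1/6) * sigma
```

The minimizer only ever moves downhill, which is why it finds the nearest minimum and cannot cross an energy barrier to reach a deeper one elsewhere.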
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                     Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206  6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27  6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27  6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
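The two extinction coefficients reported above can be reproduced from the composition with the additive formula ProtParam documents, using molar extinction values at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for each cystine (pair of half cystines). The sketch below applies it to the counts in the table; the function names are assumptions, and the molecular weight is the reported 55252.6 with the decimal point restored.

```python
# Molar extinction values at 280 nm in water (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125

def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Additive estimate of the molar extinction coefficient at 280 nm."""
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

def absorbance_1g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution: extinction coefficient / molecular weight."""
    return ext_coefficient / molecular_weight

# Counts taken from the composition table: 4 Trp, 5 Tyr, 5 Cys.
n_trp, n_tyr, n_cys = 4, 5, 5
mol_weight = 55252.6

ext_all_cystine = extinction_coefficient(n_trp, n_tyr, n_cys // 2)  # 5 Cys -> 2 cystines
ext_no_cystine = extinction_coefficient(n_trp, n_tyr, 0)

abs_all = round(absorbance_1g_per_l(ext_all_cystine, mol_weight), 3)
abs_none = round(absorbance_1g_per_l(ext_no_cystine, mol_weight), 3)
```

With five cysteines at most two cystines can form, which accounts for the difference of 250 between the two reported coefficients.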
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
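The pairwise scores in the table are percent-identity values. As a hedged sketch of the idea (not ClustalW's exact procedure, which first builds its own pairwise alignments), the function below counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` is an assumption; the two fragments are the first aligned residues of HBVA4 and ASHV from the alignment in this report.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gap columns are not compared
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# First aligned residues of two sequences from the alignment (fragment only,
# so the value will differ from the full-length score in the table).
hbva4_fragment = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv_fragment = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
fragment_identity = percent_identity(hbva4_fragment, ashv_fragment)
```

Scores near 98 among the HBVA entries and scores of 2 to 3 against GLCTK_HUMAN are what this measure gives for near-identical and essentially unrelated sequences respectively.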
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS  60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M  30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M  30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M  30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M  30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M  30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M  31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M  31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY  38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY  38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM  120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF-------------------  47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF-------------------  47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF-------------------  47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF-------------------  47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF-------------------  47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF-------------------  48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF-------------------  48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD-----------------  57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD-----------------  57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS  180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR  85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR  85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR  85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR  85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR  85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR  86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR  86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW  104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW  104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI  240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY----------------------------  117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY----------------------------  117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY----------------------------  117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY----------------------------  117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY----------------------------  117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH----------------------------  118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH----------------------------  117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI  164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI  164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT  300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP  164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP  164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP  164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP  164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP  164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP  165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP  164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR  224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR  224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT  360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR  195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR  195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR  195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR  195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR  195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR  199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR  198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ  284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ  284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG  420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC-----------------------------------------  214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC-----------------------------------------  214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC-----------------------------------------  214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC-----------------------------------------  214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC-----------------------------------------  214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC-----------------------------------------  217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC-----------------------------------------  217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK---------------------------------------  305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK---------------------------------------  305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI  480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR  523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
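The guide tree is written in Newick format, where each name is followed by its branch length and parentheses group sister branches. The sketch below parses such a string and sums branch lengths along the path between two sequences. It assumes the decimal points in the extracted tree are restored as 0.xxxxx; the parser is a minimal hand-written one for illustration (names `parse_newick`, `leaf_paths` and `tree_distance` are hypothetical), not a replacement for a phylogenetics library.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()

def leaf_paths(node, path=(), trail=(), out=None):
    """Map each leaf name to {edge_id: branch_length} for the edges up to the root."""
    out = {} if out is None else out
    name, length, children = node
    trail = trail + ((path, length),)
    if not children:
        out[name] = dict(trail)
    for index, child in enumerate(children):
        leaf_paths(child, path + (index,), trail, out)
    return out

def tree_distance(paths, leaf_a, leaf_b):
    """Sum of branch lengths on the path joining two leaves."""
    edges_a, edges_b = paths[leaf_a], paths[leaf_b]
    only_a = sum(l for e, l in edges_a.items() if e not in edges_b)
    only_b = sum(l for e, l in edges_b.items() if e not in edges_a)
    return only_a + only_b

paths = leaf_paths(parse_newick(GUIDE_TREE))
```

For example, `tree_distance(paths, "Q8IVS8|GLCTK_HUMAN", "P03154|HBEAG_DHBV1")` adds the three branches separating the human sequence from the first DHBV entry, which is far larger than the distance between any two HBVA entries.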
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence.
2. Choose Fit - fit raw sequence, then magic fit, then iterative fit.
3. Choose File - save - layer (pdb).
4. Choose File - save - project (pdb).
5. Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID for receiving the modeled structure.)
6. Open the new structure (received by email) and remove the template by selecting the target.
7. Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
Model information
modelled residue range 83 to 514
based on template 2b8nA (253 Aring)
Sequence Identity [] 34
Evalue 270e-52
57
click on model bars
Fig structure of template after modeling
Alignment
TARGET 83   LV GFGKAVLGMA AAAEELLGQH
2b8nA  4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
(The column positions of the secondary structure lines were lost in extraction, so they are not aligned with the residues above them.)
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. Residues whose angles fall outside the allowed regions of the plot point to parts of the model that may be built incorrectly.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                            SCORE    %age
Residues in most favoured regions [A,B,L]                    990   85.6%
Residues in additional allowed regions [a,b,l,p]             104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11    1.0%
Residues in disallowed regions                                51    4.4%
                                                            ----  ------
Number of non-glycine and non-proline residues              1156  100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
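The percentages in the plot statistics follow directly from the residue counts, and the 90% criterion quoted from PROCHECK can be checked against them. The short sketch below does this with the counts from the table; the dictionary keys and the function name `region_percentages` are assumptions for illustration.

```python
# Residue counts from the PROCHECK plot statistics above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
GOOD_MODEL_THRESHOLD = 90.0  # percent in most favoured regions

def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each Ramachandran region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}

percentages = region_percentages(REGION_COUNTS)
meets_threshold = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

# Cross-check of the residue totals reported in the table.
non_gly_pro = sum(REGION_COUNTS.values())
total_residues = non_gly_pro + 8 + 127 + 60   # end residues, Gly, Pro
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline, so the residues in disallowed regions are the natural candidates for further refinement.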
Docking result ---- by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different host species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible
in the near future to draw firm conclusions about the true picture of its evolution and to
identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.
To find the unknown structure of a protein present in the different species, we carried out
homology modelling and present a theoretical model built with available online modelling tools.
In our study, the HBeAg-binding protein (glycerate kinase) appears to be one of the secondary
factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is a
small finding on a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists to develop drugs more
efficiently and more effectively in future by analysing the modelled structure of the protein.
Once new drug targets are identified, new vistas will open for further drug development. The
findings of our docking study may be useful in the search for a cure for hepatitis B, and may also
open new avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic
sites in them. A similar method can also be applied to other infectious diseases, so we can look
forward to a better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work still needs to be done
to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
residues on the protein
Active Site
The active site of a protein (enzyme) is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds; these residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The table does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
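As a rough illustration of how such preferences might be used, the sketch below ranks the amino acids and averages the preference over a small set of residues. The decimal placement (for example reading "0360" as 0.360) is an assumption made while repairing the extracted table, and the function names are hypothetical helpers, not part of any published tool.

```python
# Contact preferences from the table above. Decimal placement is an
# assumption made when repairing the extracted text (e.g. "0360" -> 0.360).
PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def rank_by_propensity(table=PROPENSITY):
    """Return residue names sorted from most to least favoured."""
    return sorted(table, key=table.get, reverse=True)


def mean_propensity(residues, table=PROPENSITY):
    """Average preference of a group of residues (hypothetical helper)."""
    return sum(table[r] for r in residues) / len(residues)


if __name__ == "__main__":
    print(rank_by_propensity()[:3])
    print(mean_propensity(["His", "Cys", "Ser"]))
```

A positive mean only says that the chosen residues are, on average, found in contact with bound non-protein atoms more often than chance; it is not a prediction that they form an active site.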
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angle phi (φ) against psi (ψ) for the amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the torsion
angles φ and ψ that describe these rotations. Ramachandran used computer models of small
polypeptides to systematically vary φ and ψ with the objective of finding stable conformations.
For each conformation, the structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der Waals radii, and the
angles that cause spheres to collide correspond to sterically disallowed conformations of the
polypeptide backbone.
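The torsion angles plotted on a Ramachandran diagram can be computed directly from atomic coordinates. The following is a minimal sketch of a standard four-point dihedral calculation in plain Python; the function name and the test coordinates are illustrative assumptions, not taken from the report or from any particular validation program.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    For a residue i: phi = dihedral(C[i-1], N[i], CA[i], C[i])
                     psi = dihedral(N[i], CA[i], C[i], N[i+1])
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four made-up points with a right-angle twist about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `dihedral` to every residue of a structure and plotting the (φ, ψ) pairs gives the scatter of points that a Ramachandran plot displays.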
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures to check their validity and to assess how
correct they are. Depending on how many programs one selects, the server can take several
minutes to run; the run time also depends on the number of residues in the submitted
protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive
analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension
of .new. The new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue listing.
PROCHECK generates a number of output files in the default directory, which have the same name
as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical
properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
The energy of a molecule is a function of its degrees of freedom (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. It is good at releasing local constraints around a residue, but it
cannot pass over high energy barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a
single numerical value for a given conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure because it can be dominated
by a few bad interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad interaction,
such as two atoms too close to each other in space and having a huge van der Waals
repulsion energy. It is often preferable to carry out energy minimization on a conformation
to find the best nearby conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The minimized
structure has small forces on each atom and therefore serves as an excellent starting
point for molecular dynamics simulations.
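The idea of moving atoms along the net force until the force vanishes can be sketched with a toy system. The example below is an illustration only: it minimizes the separation of two atoms interacting through a Lennard-Jones 12-6 potential in reduced units (epsilon = sigma = 1), using capped steepest-descent steps. The potential, the step size and the function names are assumptions chosen for clarity and are not the force field or minimizer used in this work.

```python
def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones 12-6 pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_force(r, eps=1.0, sigma=1.0):
    """Force along r, i.e. minus the derivative of lj_energy."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def minimize(r, rate=0.01, max_step=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent: move along the force until it is nearly zero."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        # Cap the step so a huge repulsive force cannot overshoot.
        step = max(-max_step, min(max_step, rate * force))
        r += step
    return r


if __name__ == "__main__":
    start = 0.9  # atoms too close: large repulsion energy
    end = minimize(start)
    print(lj_energy(start), lj_energy(end), end)
```

Starting from a clashing separation, the minimizer relaxes to the nearby minimum at r = 2^(1/6) sigma. Like any gradient method, it only finds the minimum closest to the starting point.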
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria;
Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences (query sequence included)
Db    AC       Description                                                          Score  E-value
pdb   1QGT-C   Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206    6e-54
pdb   2QIJ-C   Chain C Hepatitis B Capsid Protein With An N-Termina                 197    2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192    6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp      27     60
pdb   1AW9-A   Chain A Structure Of Glutathione S-Transferase Iii I                 27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
    (B)    0    0.0%
    (Z)    0    0.0%
    (X)    0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (= 1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (= 1 g/l) 0.533, assuming NO Cys residues appear as half cystines
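The two extinction coefficients can be checked by hand from the composition above. The sketch below assumes the commonly used per-residue molar extinction values at 280 nm (Tyr 1490, Trp 5500, cystine 125 M⁻¹ cm⁻¹), which is how ProtParam documents its calculation; the function names are illustrative.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1.

    Uses Tyr = 1490, Trp = 5500 and cystine = 125 per residue or pair.
    If all_cys_paired is True, every two Cys form one cystine.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_tyr * 1490 + n_trp * 5500 + n_cystine * 125


def absorbance_0_1_percent(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l solution (Abs 0.1%)."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    # Counts and molecular weight from the ProtParam output above.
    for paired in (True, False):
        ext = extinction_coefficient(5, 4, 5, paired)
        print(ext, round(absorbance_0_1_percent(ext, 55252.6), 3))
```

With 5 Tyr, 4 Trp and 5 Cys this reproduces the reported values of 29700 and 29450, and the corresponding absorbances of 0.538 and 0.533.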
Secondary structure prediction
By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3₁₀ helix        (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states (?) :   0 is  0.00%
Other states         :   0 is  0.00%
Parameters: Window width 17; Similarity threshold 8; Number of states 4
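The per-state counts and percentages in a SOPMA summary follow directly from the prediction string (one lowercase letter per residue). A minimal sketch of that tally is given below; the short prediction string is made up for illustration and the function name is hypothetical.

```python
from collections import Counter

# Four-state alphabet used in the SOPMA output above.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_summary(prediction):
    """Return {state: (count, percent)} for a per-residue prediction string."""
    counts = Counter(prediction)
    total = len(prediction)
    return {
        state: (counts.get(state, 0),
                round(100.0 * counts.get(state, 0) / total, 2))
        for state in STATE_NAMES
    }


if __name__ == "__main__":
    example = "hhhhcceeeett"  # made-up 12-residue prediction
    for state, (count, percent) in state_summary(example).items():
        print(f"{STATE_NAMES[state]:<16} {count:>4} is {percent:6.2f}%")
```

As a consistency check on the reported figures, 235 helical residues out of 523 is 44.93%, which matches the summary.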
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
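The pairwise scores above behave like percent identities: near 98 among the human HBV strains and only 2 to 3 against the unrelated human glycerate kinase. As a sketch of the idea, assuming the score is the number of identical positions divided by the number of aligned (non-gap) positions, the function below computes that figure for two already-aligned sequences. The function name is hypothetical, and the two fragments are just the opening residues of HBEAG_HBVA4 and HBEAG_ASHV from the alignment in this report.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over positions where neither has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    hbva4_fragment = "MQLFHLCLIISCT-CPTVQ"
    ashv_fragment = "MYLFHLCLVFACVSCPTVQ"
    print(round(percent_identity(hbva4_fragment, ashv_fragment), 2))
```

A fragment this short will not reproduce the full-length score of 65 for this pair; it only illustrates how the figure is formed.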
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
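The guide tree above is a Newick-style string. As a small sketch, the Python below pulls each leaf name and its branch length out of such a string with a regular expression. The colons, commas and decimal points in `GUIDE_TREE` are restored here on the assumption that each six-digit run in the extracted text was a branch length of the form 0.xxxxx; the function name `leaf_branch_lengths` is hypothetical.

```python
import re

# Guide tree from this report, with punctuation restored by assumption.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a run of name characters followed by ':' and a decimal number.
# Internal branch lengths follow ')' and therefore have no name to match.
LEAF_PATTERN = re.compile(r"([^(),:;\s]+):(\d+\.\d+)")


def leaf_branch_lengths(newick: str) -> dict:
    """Map each leaf name to the length of the branch leading to it."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


leaves = leaf_branch_lengths(GUIDE_TREE)
longest = max(leaves, key=leaves.get)
print(len(leaves), longest, leaves[longest])
```

The leaf on the longest branch is the human glycerate kinase sequence, which is consistent with its low pairwise scores against the HBeAg sequences.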
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID at which the modelled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under Swiss model to submit it for modelling.
Homology modelling
Optimise Mode: request submission form
Please fill in these fields:
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Click on the model bars.
Fig.: Structure of the template after modelling
Alignment

TARGET   83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA    4     peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET         hh sss sssss hhh
2b8nA          hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET         hhhhhhhhh ssssss sssss hhhhh
2b8nA          hhhhhhhhh ssssss sssss hhhhh

TARGET   155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET         h hhhhhh hhh sss hhhhhh sssssss
2b8nA          h hhhhhh hhh sss hhhhhh sssssss

TARGET   255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET         h hhhhhhhhhh hh hhhh sss sssss
2b8nA          h hhhhhhhhhh hhh hhhh ss

TARGET   305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET         hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA          hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET         sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA          sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394   ksgallitgp tgtnvndlii gliv-
TARGET         h sss sssss ssss
2b8nA          sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modelling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ angles of each residue and shows which combinations of the two are sterically possible for a polypeptide, so residues that fall in disallowed regions point to likely errors in the model.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modelling using DeepView and Modeler is given as input to SAVS.
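To make the φ and ψ angles of the Ramachandran plot concrete, the sketch below computes a dihedral angle from four atom positions. By the usual convention φ is taken over the atoms C(i-1), N(i), CA(i), C(i) and ψ over N(i), CA(i), C(i), N(i+1). The function name `dihedral_degrees` and the test coordinates are hypothetical, and reading real coordinates from a PDB file is deliberately left out; this is not how SAVS or Procheck is implemented, only the geometry behind the plot.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a cis, a right-angle and a trans arrangement.
cis = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
right = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
trans = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(cis, right, trans)
```

Applying such a function to every residue of a model gives one (φ, ψ) point per residue, which is what the plot displays.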
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics

                                                        Score    %-age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%

Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions. At 85.6%, the present model falls short of that figure.
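The percentages in the Procheck summary follow directly from the residue counts. The sketch below recomputes them and compares the most favoured fraction with the 90% guideline quoted above. The counts are the ones reported in this section; the dictionary keys and the function name `region_percentages` are hypothetical labels chosen for the example.

```python
# Residue counts per Ramachandran region, as reported by Procheck above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

GOOD_MODEL_THRESHOLD = 90.0  # per cent in most favoured regions


def region_percentages(counts: dict) -> dict:
    """Percentage of non-Gly, non-Pro residues in each region, to one decimal."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


percentages = region_percentages(REGION_COUNTS)
meets_guideline = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

for region, pct in percentages.items():
    print(f"{region:20s} {REGION_COUNTS[region]:5d} {pct:6.1f}%")
print("over 90% in most favoured regions:", meets_guideline)
```

The four counts add up to the 1156 non-glycine, non-proline residues, and adding the 8 end-residues, 127 glycines and 60 prolines gives the reported total of 1351.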
Docking result (by Hex software)
Fig.: Ligand & Receptor (2B8N)
Fig.: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
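The size of the 6D search reported in the log above can be checked with a little arithmetic. The sketch below assumes that the stripped punctuation in the log originally read a distance range of 0.00 to 19.50 in steps of 0.75, Iterate[27,812,1] and FFT[64,24,48]; that grouping of the digits is an inference, supported only by the fact that it reproduces the 21924 FFTs and 1616412672 orientations printed in the log. All variable and function names are hypothetical.

```python
from math import prod

# Values read from the Hex log; the digit grouping is an assumption.
DISTANCE_START = 0.00
DISTANCE_STOP = 19.50
DISTANCE_STEP = 0.75
RECEPTOR_ROTATIONS = 812
LIGAND_ROTATIONS = 1
FFT_GRID = (64, 24, 48)


def radial_step_count(start: float, stop: float, step: float) -> int:
    """Number of intermolecular distances sampled, both end points included."""
    return round((stop - start) / step) + 1


radial_steps = radial_step_count(DISTANCE_START, DISTANCE_STOP, DISTANCE_STEP)
fft_count = radial_steps * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
orientations = fft_count * prod(FFT_GRID)

print(radial_steps, fft_count, orientations)
```

Under these assumptions the three printed numbers agree with the 27 R values, the FFT count and the total orientation count listed in the log.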
Conclusion
After analysing the protein sequences of Hepatitis B virus we conclude that, although they are all
closely related, they play an important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will become possible to draw firm conclusions about the true picture of evolution in the near
future, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To find the unknown structure of a protein present in the
different species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
As studied here, the HBeAG (Glycerate kinase) protein encoded by the
gene is a secondary cause of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report may serve as a stepping stone for such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study may be useful in the search for a cure for the infectious disease bird flu, and may open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases to identify possible therapeutic sites in them, so that we can look forward to a better, disease-free world.
The work presented here is just a small part of a big problem, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
angles phi (φ) and psi (ψ). Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. Angles that cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
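As an illustration of the backbone angles discussed above, the following minimal sketch computes a signed torsion (dihedral) angle from four atomic positions. Applied to the atoms C(i-1), N(i), CA(i), C(i) it gives φ, and applied to N(i), CA(i), C(i), N(i+1) it gives ψ. The function name and the example coordinates are hypothetical and chosen only for illustration; they are not taken from the modelled structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(x / length for x in b1)  # unit vector along the central bond
    x = _dot(n1, n2)
    y = _dot(_cross(axis, n1), n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Evaluating this for every residue of a structure and plotting ψ against φ would yield the points of a Ramachandran plot.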
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run. The run time also depends on the number of residues in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file has the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a set of plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.
The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints for a residue, but it will not cross high energy barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
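The idea of gradient optimization can be sketched on the simplest possible case: two atoms interacting through a Lennard-Jones potential, started too close together so that the repulsion term dominates. This is only a toy illustration under stated assumptions (reduced units with epsilon = sigma = 1, a fixed step size, and hypothetical function names); it is not the force field or minimizer used by any of the programs in this report.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r, step=0.001, tol=1e-8, max_iter=100000):
    """Steepest descent: move against the gradient until the force is small."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


r_start = 0.9  # atoms overlap: large positive (repulsive) energy
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

The search ends at the nearest minimum (r = 2^(1/6) sigma for this potential). In a real molecule with many minima the same procedure would stop in whichever local minimum lies downhill from the starting conformation.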
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)
BLAST result
List of potentially matching sequences
Db    AC       Description                                                        Score   E-value
pdb   1QGT-C   Chain C, (HBcAg) Human Hepatitis B Viral Capsid >gi|5              206     6e-54
pdb   2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina              197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27      6.0
pdb   1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I              27      6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N    711
Oxygen     O    719
Sulfur     S     17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (= 1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (= 1 g/l) 0.533, assuming NO Cys residues appear as half cystines
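The two extinction coefficients reported above can be reproduced from the amino acid composition. The sketch below assumes the standard molar absorptivities at 280 nm for Trp (5500), Tyr (1490) and cystine (125), that five Cys residues can form at most two cystines, and that the molecular weight reads 55252.6 with its decimal point restored. The function and variable names are hypothetical and for illustration only.

```python
# Molar absorptivities at 280 nm in M^-1 cm^-1 (assumed standard values).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, cys_as_cystine=True):
    """Estimate the molar extinction coefficient of a protein at 280 nm."""
    n_cystine = n_cys // 2 if cys_as_cystine else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


# Composition of GLCTK_HUMAN from the report: 4 Trp, 5 Tyr, 5 Cys.
molecular_weight = 55252.6
ext_all_cystine = extinction_coefficient(4, 5, 5, cys_as_cystine=True)
ext_no_cystine = extinction_coefficient(4, 5, 5, cys_as_cystine=False)

# Abs 0.1% is the absorbance of a 1 g/l solution.
abs_all_cystine = round(ext_all_cystine / molecular_weight, 3)
abs_no_cystine = round(ext_no_cystine / molecular_weight, 3)
print(ext_all_cystine, abs_all_cystine)
print(ext_no_cystine, abs_no_cystine)
```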
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states: 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width 17; Similarity threshold 8; Number of states 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
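The pairwise scores in the table behave like percent identities between aligned sequences. As a rough illustration of that idea (a sketch only, not ClustalW2's exact scoring procedure), the function below counts identical residues over the columns where neither sequence has a gap. The two fragments are the first aligned residues of P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5 from the alignment in this report; the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(round(percent_identity(hbva4, hbva5), 2))
```

Over this short fragment the two strains differ at a single position (V versus F), which is consistent with the high scores reported between the HBV strains.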
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model, then load the raw target sequence.
Choose Fit, then fit raw sequence, then magic fit, then iterative fit.
Choose File, then save, then layer (pdb).
Choose File, then save, then project (pdb).
Choose Swiss model, then submit modeling request. (A new browser window opens to upload the pdb file; give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File, then save, then layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homologous modeling
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Alignment
TARGET   83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA     4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. The plot therefore characterizes the local backbone geometry around each alpha carbon atom of the protein.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                        No. of residues   Percentage
Residues in most favoured regions [A,B,L]                     990            85.6%
Residues in additional allowed regions [a,b,l,p]              104             9.0%
Residues in generously allowed regions [~a,~b,~l,~p]           11             1.0%
Residues in disallowed regions                                 51             4.4%
Number of non-glycine and non-proline residues               1156           100.0%
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127
Number of proline residues                                     60
Total number of residues                                     1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
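The percentages in the plot statistics follow directly from the residue counts, and the PROCHECK rule of thumb quoted above can be checked in a few lines. The sketch below uses the counts reported for this model; the variable names are hypothetical and the 90% threshold is the one stated in the PROCHECK summary.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK for this model.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from these counts.
total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Rule of thumb: a good quality model has over 90% in the most favoured regions.
meets_threshold = percentages["most favoured"] > 90.0

for region, value in percentages.items():
    print(f"{region}: {value}%")
print("over 90% in most favoured regions:", meets_threshold)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, which suggests that further refinement of the model would be worthwhile.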
Docking result (by Hex software)
Fig: Ligand & Receptor (2B8N)
Fig: after docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, each plays an important role in survival in different host species. It would be
interesting to look more closely at the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structures of proteins present in the
different species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that HBeAg-binding protein 4 (glycerate kinase), the protein encoded by
this gene, is one of the secondary contributors to the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its activity, as a basis for future
drug development.
Future prospects
The work presented in this report may serve as a stepping stone for such discoveries. It is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking may be useful in the search for a cure for the infectious disease
bird flu; they may also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites
in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviation
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
The output comprises the plots together with a detailed residue-by-residue listing. It
generates a number of output files in the default directory, which have the same name as the
original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. It is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of the various interactions, is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
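
The idea of gradient optimization can be illustrated with a minimal Python sketch. It is not the force field or minimizer used in this work: it applies steepest descent to just two atoms interacting through a Lennard-Jones potential, and the parameters epsilon and sigma, the step size, the displacement cap and the starting distance are all illustrative assumptions.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr; the force along r is the negative of this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r0, step=0.01, tol=1e-8, max_iter=100000):
    """Steepest descent: move downhill until the net force is very small."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot
        # throw the atoms far apart in a single step.
        r += max(-0.1, min(0.1, -step * g))
    return r


r_start = 0.9  # atoms too close: large van der Waals repulsion
r_min = minimize(r_start)
print("start: r = %.3f  E = %.3f" % (r_start, lj_energy(r_start)))
print("final: r = %.5f  E = %.5f" % (r_min, lj_energy(r_min)))
```

Starting from a clashing distance, the search slides down to the nearest minimum and stops there; it never crosses an energy barrier, which is why the result is only the best nearby conformation.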
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
  Entry name: GLCTK_HUMAN
  Primary accession number: Q8IVS8
Name and origin of the protein
  Protein name: Glycerate kinase
  Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
  Gene name: GLYCTK
  Gene synonyms: HBEBP4
  ORF names: LP5910
  From: Homo sapiens (Human) [TaxID 9606]
  Taxonomy: Eukaryota, Metazoa, Chordata, Craniata, Vertebrata, Euteleostomi, Mammalia, Eutheria, Euarchontoglires, Primates, Haplorrhini, Catarrhini, Hominidae, Homo
  Protein existence: 2 (Evidence at transcript level)
Blat result-
List of potentially matching sequences
Db   AC                   Description                                           Score  E-value
pdb  1QGT-C               Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5    206   6e-54
pdb  2QIJ-C               Chain C Hepatitis B Capsid Protein With An N-Termina    197   2e-51
pdb  2G33-C CAPSD_HBVD1   Chain C Human T4 Capsid Strain Ad                       192   6e-50
pdb  1TA3-B XIP1_WHEAT    Chain B Crystal Structure Of Xylanase (Gh10) In Comp     27   6.0
pdb  1AW9-A               Chain A Structure Of Glutathione S-Transferase Iii I     27   6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue    Count  Per cent
Ala (A)      74     14.1
Arg (R)      33      6.3
Asn (N)      11      2.1
Asp (D)      21      4.0
Cys (C)       5      1.0
Gln (Q)      32      6.1
Glu (E)      28      5.4
Gly (G)      51      9.8
His (H)      16      3.1
Ile (I)      15      2.9
Leu (L)      81     15.5
Lys (K)      10      1.9
Met (M)      12      2.3
Phe (F)      10      1.9
Pro (P)      28      5.4
Ser (S)      27      5.2
Thr (T)      22      4.2
Trp (W)       4      0.8
Tyr (Y)       5      1.0
Val (V)      38      7.3
Pyl (O)       0      0.0
Sec (U)       0      0.0
(B)           0      0.0
(Z)           0      0.0
(X)           0      0.0
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
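
The two extinction coefficients can be checked by hand. The Python sketch below is only an illustration: it assumes the standard per-residue molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1), takes the residue counts from the composition above (Trp 4, Tyr 5, Cys 5), and reads the reported molecular weight as 55252.6.

```python
# Molar extinction values at 280 nm in M^-1 cm^-1 (assumed ProtParam constants)
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Sum the contributions of Trp, Tyr and (optionally) cystine pairs."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


# Counts taken from the amino acid composition of GLCTK_HUMAN
n_trp, n_tyr, n_cys = 4, 5, 5
mol_weight = 55252.6  # molecular weight with its decimal point restored

ext_paired = extinction_coefficient(n_trp, n_tyr, n_cys, True)
ext_reduced = extinction_coefficient(n_trp, n_tyr, n_cys, False)

# Absorbance of a 1 g/l solution is the coefficient divided by the mass
abs_paired = ext_paired / mol_weight
abs_reduced = ext_reduced / mol_weight

print(ext_paired, round(abs_paired, 3))
print(ext_reduced, round(abs_reduced, 3))
```

With five cysteines only two cystine pairs can form, so the two coefficients differ by 250.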
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
  Alpha helix (Hh)      235 is 44.93%
  310 helix (Gg)          0 is  0.00%
  Pi helix (Ii)           0 is  0.00%
  Beta bridge (Bb)        0 is  0.00%
  Extended strand (Ee)   80 is 15.30%
  Beta turn (Tt)         36 is  6.88%
  Bend region (Ss)        0 is  0.00%
  Random coil (Cc)      172 is 32.89%
  Ambiguous states        0 is  0.00%
  Other states            0 is  0.00%
Parameters: Window width 17; Similarity threshold 8; Number of states 4
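
The percentages in the SOPMA summary are simply the share of residues assigned to each state. As a minimal sketch (the function name and the state dictionary are hypothetical, not part of SOPMA), the same tally can be reproduced in Python from a prediction string; the example uses the state string reported for the first 70 residues.

```python
from collections import Counter

# One-letter SOPMA state codes and their names (the four states used here)
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_summary(prediction):
    """Return {state name: (count, per cent of residues)} for a state string."""
    counts = Counter(prediction)
    total = len(prediction)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATE_NAMES.items()
    }


# Predicted states for residues 1-70 of GLCTK_HUMAN
first_block = (
    "hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee"
)
summary = state_summary(first_block)
for name, (count, percent) in summary.items():
    print("%-16s %3d is %6.2f%%" % (name, count, percent))
```

Applied to the whole 523-residue prediction, the same calculation gives the reported figures, for example 235 helical residues out of 523 is 44.93 per cent.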
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
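
Pairwise scores of this kind are the usual starting point for distance-based tree building. The Python sketch below is a hedged illustration rather than the procedure ClustalW2 ran for this job: it takes six scores from the table above and converts each to a distance with 1 - score/100, which is an assumed, commonly used conversion. The function name distance_matrix is hypothetical.

```python
# A few pairwise scores copied from the table above (per cent identity)
scores = {
    ("HBEAG_ASHV", "HBEAG_GSHV"): 91,
    ("HBEAG_ASHV", "HBEAG_HBVA2"): 66,
    ("HBEAG_GSHV", "HBEAG_HBVA2"): 70,
    ("HBEAG_ASHV", "HBEAG_DHBV1"): 21,
    ("HBEAG_DHBV1", "HBEAG_GSHV"): 24,
    ("HBEAG_DHBV1", "HBEAG_HBVA2"): 26,
}


def distance_matrix(pair_scores):
    """Build a symmetric distance matrix using distance = 1 - score/100."""
    names = sorted({name for pair in pair_scores for name in pair})
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    matrix = [[0.0] * size for _ in range(size)]
    for (a, b), score in pair_scores.items():
        d = 1.0 - score / 100.0
        matrix[index[a]][index[b]] = d
        matrix[index[b]][index[a]] = d
    return names, matrix


names, matrix = distance_matrix(scores)

# The pair with the highest score is the pair with the smallest distance
closest = max(scores, key=scores.get)

for name, row in zip(names, matrix):
    print("%-12s" % name, " ".join("%.2f" % d for d in row))
print("closest pair:", closest)
```

In this subset the two squirrel hepatitis virus sequences (ASHV and GSHV) are the closest pair, while the duck virus sequence is distant from all the others, which matches the pattern visible in the full table.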
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
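
The pairwise relationships visible in the alignment above can be quantified as percent identity. The following is a minimal Python sketch, not the calculation the alignment program itself performs. The function name `percent_identity` is illustrative, and it assumes identity is counted only over columns where neither sequence has a gap (other definitions divide by the full alignment length or by the shorter sequence). The input strings are the residue columns of the last full alignment block for three of the entries.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over alignment columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Residues from the C-terminal alignment block (trailing gap columns omitted).
hbva4 = "RRRSQSPRRRRSQSRESQC"  # P17099|HBEAG_HBVA4
hbva3 = "RRRSPSPRRRRSQSRESQC"  # P0C625|HBEAG_HBVA3
ashv = "RRRSQSPRRR-PQSPASNC"   # Q64896|HBEAG_ASHV

print(f"HBVA4 vs HBVA3: {percent_identity(hbva4, hbva3):.1f}%")
print(f"HBVA4 vs ASHV:  {percent_identity(hbva4, ashv):.1f}%")
```

For this short block the two human-virus entries differ at one of 19 positions (94.7%), while the human and squirrel entries match at 14 of the 18 ungapped positions (77.8%). A whole-sequence figure would need all blocks concatenated per entry.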
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
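
The guide tree is a Newick string, so the branch lengths can be read programmatically. The sketch below is a small hand-written parser, not a replacement for a phylogenetics library. It assumes that the six-digit numbers in the tree are decimal branch lengths (for example 0.59519) separated from the names by colons and commas, and the function names `parse_newick` and `leaf_distances` are illustrative.

```python
import re

# Guide tree from the alignment output, assuming decimal branch lengths.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

PUNCT = ("(", ")", ",", ";")


def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    tokens = re.findall(r"[(),;]|:[^(),;:]+|[^(),;:]+", text.strip())
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        name, length = "", 0.0
        if pos < len(tokens) and tokens[pos] not in PUNCT and not tokens[pos].startswith(":"):
            name = tokens[pos]
            pos += 1
        if pos < len(tokens) and tokens[pos].startswith(":"):
            length = float(tokens[pos][1:])
            pos += 1
        return (name, length, children)

    return node()


def leaf_distances(tree, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_distances(child, depth))
    return result


distances = leaf_distances(parse_newick(GUIDE_TREE))
for leaf, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{leaf:22s} {dist:.5f}")
```

Sorting by root distance places the human HBV entries close to the root and the human glycerate kinase sequence furthest away, which matches the grouping shown in the tree.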
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed about 85.6% sequence identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (PDB).
Choose File - save - project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (PDB).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and then iterative fit, provided under Fit, to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig: Structure of the template after modeling
Alignment
TARGET 83  LV GFGKAVLGMA AAAEELLGQH
2b8nA  4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure and shows which combinations of φ and ψ are possible for a polypeptide. Residues whose angles fall outside the allowed regions of the plot point to parts of the model that may be strained or incorrectly built.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
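
The φ and ψ values shown in a Ramachandran plot are ordinary dihedral (torsion) angles computed from four consecutive backbone atoms. The following Python sketch shows one common way to compute such an angle from Cartesian coordinates. It is illustrative only and is not the code used by SAVS or PROCHECK; the helper names and the test coordinates are assumptions chosen so that the geometry is easy to check by hand.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi(c_prev, n, ca, c):
    """phi: C(i-1) - N(i) - CA(i) - C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi: N(i) - CA(i) - C(i) - N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Hypothetical coordinates: the first and last points lie on opposite sides
# of the central bond, giving a trans (extended) arrangement.
trans = phi((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(f"trans arrangement: {abs(trans):.1f} degrees")
```

In practice the four points would be the atom coordinates read from the model's PDB file, one (φ, ψ) pair per residue.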
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                        No. of residues   %age
Residues in most favoured regions [A,B,L]                     990         85.6%
Residues in additional allowed regions [a,b,l,p]              104          9.0%
Residues in generously allowed regions [~a,~b,~l,~p]           11          1.0%
Residues in disallowed regions                                 51          4.4%
                                                             ----        ------
Number of non-glycine and non-proline residues               1156        100.0%
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127
Number of proline residues                                     60
                                                             ----
Total number of residues                                     1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
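
The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% guideline can be checked the same way. The short Python sketch below recomputes them; the dictionary keys and the function name `ramachandran_summary` are illustrative, and the counts are those reported in the summary above.

```python
# Residue counts per Ramachandran region, taken from the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def ramachandran_summary(region_counts):
    """Return (total, percentages) for non-Gly, non-Pro residues."""
    total = sum(region_counts.values())
    percentages = {
        region: round(100.0 * n / total, 1)
        for region, n in region_counts.items()
    }
    return total, percentages


total, pct = ramachandran_summary(counts)
meets_guideline = pct["most favoured"] > 90.0

print(f"non-Gly/non-Pro residues: {total}")
for region, value in pct.items():
    print(f"{region:20s} {value:5.1f}%")
print("over 90% in most favoured regions:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so the residues in the disallowed regions would be the first candidates for further refinement.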
Docking result ---- by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
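
The size of the search reported in the docking log can be cross-checked with a little arithmetic. The Python sketch below is only a consistency check on the logged numbers, not part of Hex. It assumes that the distance range is 0.00 to 19.50 in steps of 0.75, and that the log entry "Iterate[278121] x FFT[642448]" stands for 27 distance steps, 812 receptor rotations and 1 ligand rotation, multiplied by an FFT grid of 64 x 24 x 48. That reading is an inference, supported by the fact that it reproduces the two totals printed in the log.

```python
import math


def distance_steps(start, stop, step):
    """Intermolecular distances sampled from start to stop inclusive."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


steps = distance_steps(0.0, 19.5, 0.75)   # distance range from the log
receptor_rotations = 812                  # "Receptor 812" in the log
ligand_rotations = 1                      # "Ligand 1" in the log
fft_grid = (64, 24, 48)                   # assumed reading of FFT[642448]

n_ffts = len(steps) * receptor_rotations * ligand_rotations
n_orientations = n_ffts * math.prod(fft_grid)
elapsed_seconds = 7 * 60                  # "7 min 0 sec", rounded in the log

print("distance steps:     ", len(steps))
print("3D FFTs:            ", n_ffts)
print("trial orientations: ", n_orientations)
print("orientations/second:", round(n_orientations / elapsed_seconds))
```

Both derived totals agree with the log: 21924 3D FFTs and 1616412672 trial orientations. The computed rate differs slightly from the logged one because the elapsed time is only given to the nearest second.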
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
With the completion of the ongoing gene sequencing projects on HBV, we hope that it will soon
be possible to draw firm conclusions about the true picture of its evolution, and that the genes
responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To determine the unknown structure of a protein present in the different species, we used
homology modelling and present a theoretical model built with the available online modelling
tools.
Our study indicates that HBeAg-binding protein 4 (glycerate kinase) is one of the secondary
factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries. It is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking study may be useful in the search for a cure for hepatitis B, and may
also open new avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. The same method can also be applied to other infectious diseases, and
so we can look forward to a healthier, more disease-free world.
The work presented here is just a small part of a large problem, and a great deal of work still
needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for
hepatitis B. We hope that these findings will go a long way and will prove fruitful to anyone
working in a similar area.
79
BIBLIOGRAPHY
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
80
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6] - Al-Lazikani B, Sheinerman FB, and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796–14801. [PubMed]
Aloy P, Querol E, Aviles FX, and Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408. [PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402. [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics
[7] - Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8] - Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Results and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue   Count   Percent
Ala (A)   74      14.1%
Arg (R)   33      6.3%
Asn (N)   11      2.1%
Asp (D)   21      4.0%
Cys (C)   5       1.0%
Gln (Q)   32      6.1%
Glu (E)   28      5.4%
Gly (G)   51      9.8%
His (H)   16      3.1%
Ile (I)   15      2.9%
Leu (L)   81      15.5%
Lys (K)   10      1.9%
Met (M)   12      2.3%
Phe (F)   10      1.9%
Pro (P)   28      5.4%
Ser (S)   27      5.2%
Thr (T)   22      4.2%
Trp (W)   4       0.8%
Tyr (Y)   5       1.0%
Val (V)   38      7.3%
Pyl (O)   0       0.0%
Sec (U)   0       0.0%
(B)       0       0.0%
(Z)       0       0.0%
(X)       0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (= 1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (= 1 g/l) 0.533, assuming NO Cys residues appear as half cystines
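The ProtParam figures above can be cross-checked from the residue counts alone. The following is a minimal sketch, not ProtParam's own code: it assumes the per-residue absorbance values that ProtParam documents for 280 nm (5500 for Trp, 1490 for Tyr, 125 per cystine), and the names `COUNTS`, `percent_composition` and `extinction_coefficient` are illustrative only.

```python
# Residue counts copied from the ProtParam composition table above.
COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}
MOLECULAR_WEIGHT = 55252.6  # Da, as reported by ProtParam above


def percent_composition(counts):
    """Return the percentage of each residue, rounded to one decimal."""
    total = sum(counts.values())
    return {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}


def extinction_coefficient(counts, cystines=True):
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1.

    Assumed coefficients: Trp 5500, Tyr 1490, cystine (Cys pair) 125.
    """
    eps = 5500 * counts["W"] + 1490 * counts["Y"]
    if cystines:
        eps += 125 * (counts["C"] // 2)  # only complete Cys pairs count
    return eps


total_residues = sum(COUNTS.values())
negative = COUNTS["D"] + COUNTS["E"]  # Asp + Glu
positive = COUNTS["R"] + COUNTS["K"]  # Arg + Lys
eps_cystine = extinction_coefficient(COUNTS, cystines=True)
eps_reduced = extinction_coefficient(COUNTS, cystines=False)
abs_cystine = round(eps_cystine / MOLECULAR_WEIGHT, 3)  # Abs 0.1% (1 g/l)
abs_reduced = round(eps_reduced / MOLECULAR_WEIGHT, 3)

print(total_residues, negative, positive)
print(eps_cystine, abs_cystine)
print(eps_reduced, abs_reduced)
```

Under these assumptions the sketch reproduces the reported totals (523 residues, 49 negative, 43 positive) and both extinction coefficients, which also supports the decimal placement used in the table.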
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
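The SOPMA summary is simply a per-state count of the prediction string divided by the sequence length. A minimal sketch of that bookkeeping follows; the function name `state_percentages` and the short demo string are hypothetical, while the four counts are the ones reported above.

```python
from collections import Counter


def state_percentages(prediction):
    """Count each one-letter state and express it as a percentage of the length."""
    counts = Counter(prediction.lower())
    length = len(prediction)
    return {state: (n, round(100.0 * n / length, 2)) for state, n in counts.items()}


# Hypothetical 10-residue prediction, for illustration only.
demo = state_percentages("hhhhcceett")

# State counts reported by SOPMA for the 523-residue sequence.
REPORTED_COUNTS = {"h": 235, "e": 80, "t": 36, "c": 172}
SEQUENCE_LENGTH = 523
reported_percent = {
    state: round(100.0 * n / SEQUENCE_LENGTH, 2)
    for state, n in REPORTED_COUNTS.items()
}

print(demo)
print(reported_percent)
```

The four reported counts add up to the full sequence length, so no residue is left unassigned in the four-state prediction.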
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences 10
2 Alignment score 28565
3 Sequence format Pearson
4 Sequence type aa
5 Output file clustalw2-20080510-09552541.output
6 Alignment file clustalw2-20080510-09552541.aln
7 Guide tree file clustalw2-20080510-09552541.dnd
8 Your input file clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
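The pairwise scores in the table can be loaded into a symmetric lookup to find each sequence's closest partner. This is only a sketch of one way to read the table: the names `UPPER`, `pair_scores` and `best_partner` are illustrative, and ties are broken arbitrarily in favour of the higher sequence number.

```python
# Sequence numbers and entry names as listed in the ClustalW2 scores table.
NAMES = {
    1: "GLCTK_HUMAN", 2: "HBEAG_ASHV", 3: "HBEAG_DHBV1", 4: "HBEAG_DHBV3",
    5: "HBEAG_GSHV", 6: "HBEAG_HBVA2", 7: "HBEAG_HBVA3", 8: "HBEAG_HBVA4",
    9: "HBEAG_HBVA5", 10: "HBEAG_HBVA6",
}

# UPPER[i] holds the scores of sequence i against sequences i+1 .. 10.
UPPER = {
    1: [3, 3, 3, 3, 3, 3, 3, 3, 2],
    2: [21, 21, 91, 66, 65, 65, 65, 65],
    3: [97, 24, 26, 27, 25, 26, 26],
    4: [25, 26, 27, 25, 26, 26],
    5: [70, 69, 69, 69, 69],
    6: [98, 98, 98, 98],
    7: [98, 97, 98],
    8: [97, 98],
    9: [97],
}


def pair_scores(upper):
    """Expand the upper triangle into a symmetric {(i, j): score} mapping."""
    scores = {}
    for i, row in upper.items():
        for offset, score in enumerate(row, start=1):
            j = i + offset
            scores[(i, j)] = score
            scores[(j, i)] = score
    return scores


def best_partner(i, scores):
    """Return (name, score) of the highest-scoring partner of sequence i."""
    score, j = max((s, b) for (a, b), s in scores.items() if a == i)
    return NAMES[j], score


SCORES = pair_scores(UPPER)
for number in sorted(NAMES):
    print(NAMES[number], best_partner(number, SCORES))
```

Read this way, the table shows the human glycerate kinase scoring at most 3 against every HBeAg sequence, while the viral sequences pair up within their own host groups (for example ASHV with GSHV at 91, and DHBV1 with DHBV3 at 97).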
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818)
:0.00110,P0C692|HBEAG_HBVA2:0.00445)
:0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
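The guide tree is a Newick string, so branch lengths can be summed from the root to each sequence. The sketch below is a small hand-written parser rather than ClustalW2's own code; it assumes the separators and decimal points lost in extraction are `:`, `,` and five-decimal branch lengths, and the names `parse_newick` and `tip_distances` are illustrative.

```python
# Guide tree with separators and decimal points restored (an assumption).
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested dicts: name, length, children."""
    s = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            while True:
                children.append(node())
                if s[pos] == ",":
                    pos += 1
                elif s[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected character at %d" % pos)
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def tip_distances(tree, travelled=0.0, out=None):
    """Sum branch lengths from the root down to every leaf."""
    if out is None:
        out = {}
    here = travelled + tree["length"]
    if not tree["children"]:
        out[tree["name"]] = here
    for child in tree["children"]:
        tip_distances(child, here, out)
    return out


DISTANCES = tip_distances(parse_newick(NEWICK))
for name, dist in sorted(DISTANCES.items(), key=lambda item: item[1]):
    print("%-20s %.5f" % (name, dist))
```

With these restored values the human glycerate kinase sits far from every viral HBeAg sequence, which is consistent with the very low pairwise scores in the table.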
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig.: Structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure by plotting one against the other, and so shows which φ/ψ conformations of the polypeptide fall in sterically allowed regions. Here φ is the torsion about the N-Cα bond and ψ the torsion about the Cα-C bond of each residue.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
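Each point on a Ramachandran plot is a pair of backbone torsion angles, and a torsion angle is defined by four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The following is a minimal sketch of that geometric calculation; it is not the code used by PROCHECK or SAVS, and the coordinates in the example are hypothetical test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the first three points are fixed, the fourth is rotated.
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
quarter = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(cis, quarter, trans)
```

Applying the same function to the backbone atoms of every residue in the model would give the φ/ψ pairs that PROCHECK classifies into favoured, allowed and disallowed regions.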
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %age
Residues in most favoured regions [A,B,L]                990     85.6%
Residues in additional allowed regions [a,b,l,p]         104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]     11      1.0%
Residues in disallowed regions                           51      4.4%
                                                         ----    ------
Number of non-glycine and non-proline residues           1156    100.0%
Number of end-residues (excl. Gly and Pro)               8
Number of glycine residues (shown as triangles)          127
Number of proline residues                               60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
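The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% guideline can be checked the same way. A minimal sketch, using only the counts reported above (the variable names are illustrative):

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
END_RESIDUES = 8
GLYCINES = 127
PROLINES = 60

# Glycine, proline and end residues are excluded from the percentage base.
plotted = sum(REGION_COUNTS.values())
region_percent = {
    region: round(100.0 * n / plotted, 1) for region, n in REGION_COUNTS.items()
}
total_residues = plotted + END_RESIDUES + GLYCINES + PROLINES

# PROCHECK guideline: a good-quality model has over 90% in the most favoured regions.
meets_guideline = region_percent["most favoured"] > 90.0

print(plotted, total_residues)
print(region_percent)
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls below the 90% guideline, and 4.4% of residues lie in disallowed regions.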
Docking result by Hex software
Fig.: Ligand & Receptor (2B8N)
Fig.: After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude that, although the
strains are closely related, the proteins play an important role in survival in different host species.
It would be worthwhile to examine this further at the gene level, where a phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing HBV gene sequencing project is complete, we hope it will become possible
to draw firm conclusions about the true picture of evolution and to identify the genes
responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of a protein present in the
different species, we used homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that glycerate kinase (HBeAg-binding protein 4), the protein encoded by
the GLYCTK gene, is one of the secondary causes of HBV pathogenicity. We therefore tried to dock
this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs
could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries. It is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. It includes methods for
collecting and analysing data as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs
more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in finding a cure for the infectious disease
bird flu, and may also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites
in them. The same method can also be applied to other infectious diseases, so
we can look forward to a healthier, disease-free world.
The work presented is just a small part of a big problem, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope, however,
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C, (Hbcag)Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
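The composition percentages and the two extinction coefficients in the ProtParam output can be re-derived from residue counts alone. The sketch below is only an illustration and not ProtParam's own code: the function names are hypothetical, and the per-residue contributions at 280 nm (5500 for Trp, 1490 for Tyr, 125 per cystine, in M-1 cm-1) are assumed values that are not stated in this report.

```python
from collections import Counter

# Assumed molar contributions at 280 nm (M^-1 cm^-1); not taken from the report.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def composition(seq):
    """Return {residue: (count, percent)} for a one-letter protein sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: (n, round(100.0 * n / total, 1)) for aa, n in sorted(counts.items())}


def extinction_coefficient(seq, all_cys_paired=True):
    """Estimate the 280 nm extinction coefficient from Trp, Tyr and Cys counts."""
    counts = Counter(seq.upper())
    # Two Cys residues form one cystine; an unpaired Cys contributes nothing.
    cystines = counts["C"] // 2 if all_cys_paired else 0
    return counts["W"] * EXT_TRP + counts["Y"] * EXT_TYR + cystines * EXT_CYSTINE


# Toy sequence with the same Trp, Tyr and Cys counts as the table (4, 5 and 5).
toy = "W" * 4 + "Y" * 5 + "C" * 5
print(extinction_coefficient(toy, all_cys_paired=True))   # 29700
print(extinction_coefficient(toy, all_cys_paired=False))  # 29450
```

With the counts from the table, both reported coefficients are reproduced, which suggests the values in the output are internally consistent.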
Secondary structure prediction
By SOPMA: result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
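The SOPMA summary is a tally of the per-residue state letters in the prediction string. As a minimal sketch, the snippet below counts the four states (h, e, t, c) and converts them to percentages. The function name and the short demo string are hypothetical; only the totals 235, 80, 36 and 172 out of 523 come from the output above.

```python
from collections import Counter

# State letters used in the four-state SOPMA prediction string.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_percentages(prediction):
    """Return {state name: (count, percent)} for a string of state letters."""
    counts = Counter(prediction.lower())
    total = sum(counts.values())
    return {
        STATE_NAMES.get(state, state): (n, round(100.0 * n / total, 2))
        for state, n in counts.items()
    }


# Hypothetical 10-residue prediction, used only to show the call.
demo = "hhhhcceett"
print(state_percentages(demo))

# Percentages recomputed from the reported totals for the 523-residue protein.
reported = {"h": 235, "e": 80, "t": 36, "c": 172}
recomputed = {s: round(100.0 * n / 523, 2) for s, n in reported.items()}
print(recomputed)
```

Passing the full 523-letter prediction string to `state_percentages` should give the same figures as the summary, provided the string is copied without gaps.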
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences 10
2 Alignment score 28565
3 Sequence format Pearson
4 Sequence type aa
5 Output file clustalw2-20080510-09552541output
6 Alignment file clustalw2-20080510-09552541aln
7 Guide tree file clustalw2-20080510-09552541dnd
8 Your input file clustalw2-20080510-09552541input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
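To make the similarity between two aligned rows concrete, the sketch below computes percent identity over the columns where neither row has a gap. This is a simplified illustration with a hypothetical function name. It is not the ClustalW2 scoring procedure, so its values need not match the pairwise scores table exactly. The two fragments are copied from the HBVA4 and ASHV rows of the alignment.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Aligned fragments taken from the second alignment block.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva4, ashv), 1))
```

Applying the same function to full aligned rows gives a quick check on which sequences group together before the guide tree is read.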
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out
Steps involved in SPDBV
Open the template structure from file (pdb file); choose icon - Swiss model - load the raw target sequence
Choose icon - fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - file - save - layer (pdb)
Choose icon - file - save - project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon - file - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit P000044 TitleQ8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and so shows which φ/ψ conformations of the polypeptide are possible. Residues that fall outside the favoured and allowed regions of the plot point to stereochemical problems in the model.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
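Each point on a Ramachandran plot is a pair of backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). As a minimal sketch of how such an angle is obtained from coordinates, the function below computes the signed dihedral for any four points. The function and helper names are hypothetical, and the coordinates in the example are made-up test points rather than atoms from the modelled structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (0 = cis, 180 = trans)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the fourth point is rotated a quarter turn about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For φ and ψ, the four points would be the backbone atom coordinates of consecutive residues read from the PDB file of the model.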
SAVES results for proj_gunjanpdb
Procheck summary
RAMACHANDRAN PLOT
Result
-----
Plot statistics                                          Score   %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
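The Procheck percentages follow directly from the residue counts, and the quality criterion is a simple threshold on the most favoured fraction. The sketch below recomputes both. The function name and dictionary keys are hypothetical; the counts and the 90 per cent threshold are the ones quoted in the summary.

```python
def ramachandran_summary(most_favoured, additional, generous, disallowed, threshold=90.0):
    """Return region percentages and whether the most favoured share exceeds the threshold."""
    total = most_favoured + additional + generous + disallowed

    def pct(n):
        return round(100.0 * n / total, 1)

    summary = {
        "most_favoured": pct(most_favoured),
        "additional_allowed": pct(additional),
        "generously_allowed": pct(generous),
        "disallowed": pct(disallowed),
    }
    summary["good_quality"] = summary["most_favoured"] > threshold
    return summary


# Counts reported for the model (non-glycine, non-proline residues).
result = ramachandran_summary(990, 104, 11, 51)
print(result)
```

With 85.6 per cent of residues in the most favoured regions, the model falls below the 90 per cent figure that Procheck associates with a good quality structure.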
Docking result by Hex software
Fig. Ligand and receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
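The size of the search reported in the Hex log can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions: the decimal points lost in the log are restored (distance range 0.00 to 19.50 in steps of 0.75), and the fused bracketed values are read as Iterate[27, 812, 1] and FFT[64, 24, 48]. That reading is consistent with the "Receptor 812" and "Ligand 1" rotational increments and with the "21924 3D FFTs" line, but it is an interpretation of the log, not documented Hex behaviour. All names are hypothetical.

```python
def distance_steps(start, stop, step):
    """Evenly spaced intermolecular separations from start to stop, inclusive."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]

# Assumed reading of the log: range 0.00 to 19.50 with steps of 0.75.
steps = distance_steps(0.0, 19.5, 0.75)

RECEPTOR_ROTATIONS = 812     # "Receptor 812" in the log
LIGAND_ROTATIONS = 1         # "Ligand 1" in the log
FFT_GRID = (64, 24, 48)      # assumed reading of "FFT[642448]"

# One 3D FFT per combination of distance step and rotational increment.
fft_count = len(steps) * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
orientations_per_fft = FFT_GRID[0] * FFT_GRID[1] * FFT_GRID[2]
total_orientations = fft_count * orientations_per_fft

print(len(steps), fft_count, total_orientations)
```

Under these assumptions the product reproduces the 27 "R =" lines, the 21924 FFTs and the 1616412672 orientations printed in the log.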
Conclusion
After analysing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of the protein present in the different species, we carried out homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the gene-encoded HBeAg-binding protein (glycerate kinase) is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. It is a small finding on a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. A similar method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.
The work presented is just a small part of a large problem, and much work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
pdb 1TA3-B XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In
Comp 27 60
pdb 1AW9-A Chain A Structure Of Glutathione S-Transferase Iii I 27 60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
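The two extinction coefficients can be reproduced from the amino acid composition. The sketch below assumes the molar extinction values at 280 nm commonly used for this calculation (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1) and the molecular weight read as 55252.6 with the decimal point restored. The function names are hypothetical and are not part of ProtParam.

```python
# Molar extinction coefficients at 280 nm (M-1 cm-1), assumed values.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125

def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient of a protein at 280 nm."""
    # Two cysteines form one cystine; an unpaired cysteine contributes nothing.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE

def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight

# Composition from the table above: Tyr 5, Trp 4, Cys 5.
MOLECULAR_WEIGHT = 55252.6
ext_paired = extinction_coefficient(5, 4, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(5, 4, 5, all_cys_paired=False)

print(ext_paired, round(absorbance_at_1_g_per_l(ext_paired, MOLECULAR_WEIGHT), 3))
print(ext_reduced, round(absorbance_at_1_g_per_l(ext_reduced, MOLECULAR_WEIGHT), 3))
```

With five cysteines only two cystines can form, which accounts for the difference of 250 between the two reported coefficients.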
Secondary structure prediction
By SOPMA
SOPMA result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix (Hh)       235 is 44.93%
310 helix (Gg)           0 is  0.00%
Pi helix (Ii)            0 is  0.00%
Beta bridge (Bb)         0 is  0.00%
Extended strand (Ee)    80 is 15.30%
Beta turn (Tt)          36 is  6.88%
Bend region (Ss)         0 is  0.00%
Random coil (Cc)       172 is 32.89%
Ambiguous states (?)     0 is  0.00%
Other states             0 is  0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
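The SOPMA summary is a tally of the one-letter state codes in the prediction line, expressed as a percentage of the sequence length. A minimal sketch of that tally follows. The four-state alphabet (h, e, t, c) matches the "Number of states 4" parameter above; the short state string is a hypothetical example and not part of the actual prediction, and the function name is an assumption.

```python
from collections import Counter

# One-letter codes of the four predicted states.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}

def state_summary(states):
    """Return {state name: (count, percent)} for a string of state codes."""
    counts = Counter(states)
    total = len(states)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATE_NAMES.items()
    }

# Hypothetical ten-residue prediction, for illustration only.
toy_summary = state_summary("hhhhcceett")
print(toy_summary)

# The reported counts for the 523-residue sequence give the reported percentages.
reported = {"Alpha helix": 235, "Extended strand": 80, "Beta turn": 36, "Random coil": 172}
reported_percent = {name: round(100.0 * n / 523, 2) for name, n in reported.items()}
print(reported_percent)
```

The four reported counts sum to 523, the sequence length, so every residue is assigned to exactly one state.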
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences 10
2 Alignment score 28565
3 Sequence format Pearson
4 Sequence type aa
5 Output file clustalw2-20080510-09552541.output
6 Alignment file clustalw2-20080510-09552541.aln
7 Guide tree file clustalw2-20080510-09552541.dnd
8 Your input file clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
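The pairwise scores make the grouping of the sequences easy to read off: the human HBV entries score 97 to 98 against each other, the two duck entries score 97, the two squirrel entries score 91, and the human glycerate kinase scores 2 to 3 against every HBeAg sequence. The sketch below shows one way to pull such closely related pairs out of the table. It uses five rows copied from the scores table; the function names and the cutoff of 90 are assumptions made for illustration and are not part of ClustalW2.

```python
# Five rows copied from the scores table (SeqA, name, length, SeqB, name, length, score).
ROWS = """
2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91
3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97
1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3
6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21
"""

def parse_scores(text):
    """Parse whitespace-separated score rows into (name_a, name_b, score) tuples."""
    pairs = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip headers, rulers and blank lines
        pairs.append((fields[1], fields[4], int(fields[6])))
    return pairs

def closely_related(pairs, cutoff=90):
    """Return the pairs whose score is at least `cutoff`, highest score first."""
    return sorted((p for p in pairs if p[2] >= cutoff), key=lambda p: -p[2])

pairs = parse_scores(ROWS)
for name_a, name_b, score in closely_related(pairs):
    print(name_a, name_b, score)
```

Running the same two functions over the full table would list every pair above the chosen cutoff.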
Alignment
CLUSTAL 205 multiple sequence alignment
P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
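The guide tree is a nested, parenthesised description of how the sequences cluster, with a branch length attached to each node. As a minimal sketch of how such a tree can be read programmatically, the code below parses a string in standard Newick notation (`name:length`, commas between siblings, a closing semicolon) into nested tuples and sums branch lengths from the root to each leaf. The Newick punctuation and the decimal points in the example branch lengths are assumptions, since the extracted text does not show them, and the function names `parse_newick` and `leaf_depths` are hypothetical.

```python
def parse_newick(text: str):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())  # drop spaces and line breaks
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                if text[pos] == ")":
                    pos += 1  # end of this group
                    break
                raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        # Optional node label, read up to the next delimiter
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after a colon
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return parse_node()


def leaf_depths(node, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = node
    depth += length or 0.0
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth))
    return depths


# Example: the duck hepatitis B virus pair from the guide tree
subtree = "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054;"
for leaf, depth in leaf_depths(parse_newick(subtree)).items():
    print(f"{leaf}\t{depth:.5f}")
```

Nested tuples keep the sketch free of third-party dependencies; a phylogenetics library would normally be used for real analyses.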
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out
Steps involved in SPDBV
Open the template structure from File (PDB file), then choose icon: Swiss model - load the raw target sequence
Choose icon: Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon: File - Save - Layer (PDB)
Choose icon: File - Save - Project (PDB)
Choose icon: Swiss model - submit modeling request (a new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon: File - Save - Layer (PDB)
Open Swiss model and select load raw sequence option to load target molecule
Perform magic fit and then iterative fit, both provided under Fit, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure by plotting one against the other, and it shows which combinations of φ and ψ are possible for a polypeptide. The plot can therefore be used to check whether the backbone conformation of each residue in the protein of interest falls within an allowed region.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input for SAVS.
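Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i), and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a signed torsion angle from four 3D points. It is an illustration only, not the code used by SAVS or Procheck, and the function name `dihedral` and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: four points with a quarter turn about the z axis
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real structure the four points would be the atom coordinates read from the PDB file for each residue.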
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score      %
Residues in most favoured regions [A,B,L]                  990   85.6
Residues in additional allowed regions [a,b,l,p]           104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0
Residues in disallowed regions                              51    4.4
                                                          ----  -----
Number of non-glycine and non-proline residues            1156  100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
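The percentages in the Procheck summary follow directly from the residue counts, taken over the non-glycine, non-proline residues. The short sketch below recomputes them from the four counts reported above (990, 104, 11 and 51) and compares the most favoured fraction with the 90% guideline. The function names are hypothetical and the check is only a convenience, not part of Procheck.

```python
# Residue counts per Ramachandran region, as reported in the Procheck summary
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(region_counts.values())
    return {region: 100.0 * n / total for region, n in region_counts.items()}


def meets_guideline(region_counts, threshold=90.0):
    """True if the most favoured fraction exceeds the guideline threshold."""
    return region_percentages(region_counts)["most favoured"] > threshold


for region, pct in region_percentages(counts).items():
    print(f"{region:20s} {pct:5.1f}%")
print("meets 90% guideline:", meets_guideline(counts))
```

With these counts the most favoured fraction is 990 of 1156 residues, which is below the 90% guideline, so the model would benefit from further refinement before it is relied on.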
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig after docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting
to look more closely at the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we
hope it will be possible to draw firmer conclusions about the true picture of evolution in the near
future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
In order to find out unknown structure of protein present in the
different species we do homology modelling We forward step to present a theoretical model
using available online modelling tools
As we study that HBeAG (Glycerate kinase ) protein that is coded by
gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with
appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be
developed
Future prospects
The work presented in this report may be a stepping stone towards such discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and we can therefore look forward to a world with less disease.
The work presented is just a small part of a big issue, and much work remains to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N    711
Oxygen    O    719
Sulfur    S     17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
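The composition percentages and the two extinction coefficients reported above can be re-derived from residue counts. The following is a minimal sketch, not the tool's own code: the function names are hypothetical, and the per-residue contributions at 280 nm (5500 for Trp, 1490 for Tyr, 125 per cystine) are the commonly used values, assumed here because the report does not state them.

```python
from collections import Counter

# Assumed molar extinction contributions at 280 nm (M^-1 cm^-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def composition(seq):
    """Return {residue: (count, percent)} for a protein sequence."""
    counts = Counter(seq.upper())
    total = len(seq)
    return {aa: (n, round(100.0 * n / total, 1)) for aa, n in sorted(counts.items())}


def extinction_coefficient(n_trp, n_tyr, n_cys, cystines=True):
    """Estimate the extinction coefficient from Trp, Tyr and Cys counts.

    With cystines=True every pair of Cys residues is counted as one cystine;
    with cystines=False the Cys residues contribute nothing.
    """
    pairs = n_cys // 2 if cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + pairs * EXT_CYSTINE


if __name__ == "__main__":
    # Counts taken from the composition above: Trp 4, Tyr 5, Cys 5.
    print(extinction_coefficient(4, 5, 5, cystines=True))
    print(extinction_coefficient(4, 5, 5, cystines=False))
```

With the counts from the composition (Trp 4, Tyr 5, Cys 5) the sketch gives the same two coefficients as the report, which is a useful sanity check on the flattened output.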
Secondary structure prediction
SOPMA result for UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width 17, Similarity threshold 8, Number of states 4
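The SOPMA summary is simply a count of each predicted state divided by the sequence length. A minimal sketch of that tally is given below; the function name and the four-state mapping (h, e, t, c) are assumptions chosen to match the four-state prediction reported here, not part of SOPMA itself.

```python
from collections import Counter

# Four-state code used in the prediction string (assumed mapping).
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_summary(states):
    """Return {state name: (count, percent)} for a prediction string."""
    counts = Counter(states.lower())
    total = len(states)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATE_NAMES.items()
    }


if __name__ == "__main__":
    # A short illustrative string, not the real prediction.
    for name, (count, percent) in state_summary("hhhheecctt").items():
        print(f"{name}: {count} is {percent}%")
```

Feeding the full 523-character prediction string to `state_summary` should reproduce the helix, strand, turn and coil percentages listed in the SOPMA output.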
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
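The pairwise scores in this table are percent identities between pairs of sequences. As a rough illustration of how such a score can be computed from two rows of an alignment, the sketch below counts identical residues over the columns where neither sequence has a gap. The function name and the choice of denominator are assumptions; the exact definition used by ClustalW2 may differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either row
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Two short illustrative rows, not taken from the alignment itself.
    print(percent_identity("ACDE-", "ACDF-"))
```

Applied to the full aligned rows of two closely related HBV strains, a function of this kind would be expected to return a value close to the 97 to 98 reported in the table.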
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
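A guide tree of this kind is normally stored in Newick format in the .dnd file, with each sequence written as `name:branch_length`. The sketch below, which assumes that format and uses a hypothetical function name, pulls out the terminal branch length of every sequence with a regular expression. It is only a quick inspection aid and does not build the tree.

```python
import re

# A leaf is a run of name characters followed by ':' and a number.
# Internal nodes have no name before the ':', so they are not matched.
LEAF_PATTERN = re.compile(r"([^(),:;\s]+):([0-9.]+)")


def leaf_branch_lengths(newick):
    """Return {leaf name: terminal branch length} for a Newick string."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


if __name__ == "__main__":
    # A small illustrative tree, not the guide tree from the report.
    example = "((A:0.1,B:0.2):0.3,C:0.4);"
    for name, length in leaf_branch_lengths(example).items():
        print(name, length)
```

Short terminal branches indicate sequences that differ very little from their nearest neighbours in the tree, while a long branch marks a sequence that is distant from all the others.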
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request (a new browser window opens to load the pdb file; give the email ID at which the modeled structure is to be received).
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homologous modeling
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment TARGET 83 LV GFGKAVLGMA AAAEELLGQH2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr TARGET hh sss sssss hhh2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh
TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt TARGET hhhhhhhhh ssssss sssss hhhhh2b8nA hhhhhhhhh ssssss sssss hhhhh
TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias TARGET h hhhhhh hhh sss hhhhhh sssssss 2b8nA h hhhhhh hhh sss hhhhhh sssssss
TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv TARGET h hhhhhhhhhh hh hhhh sss sssss2b8nA h hhhhhhhhhh hhh hhhh ss
TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv TARGET hhh ssssssssss s hhhhhhhhh hhh ss2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss
TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh
TARGET 455 DGPTEAAGAW VTPELASQAA AEGL 2b8nA 394 ksgallitgp tgtnvndlii gliv- TARGET h sss sssss ssss 2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure and shows which combinations of φ and ψ are possible for a polypeptide. Residues whose angles fall outside the allowed regions point to strained or incorrectly modelled backbone geometry.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
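Each point on a Ramachandran plot is a pair of backbone dihedral angles: φ is defined by the atoms C(i-1), N, CA, C and ψ by N, CA, C, N(i+1). The sketch below shows one standard way to compute a dihedral angle from four atomic coordinates. The function names are hypothetical and the code is only an illustration of the geometry; it is not how PROCHECK or SAVS is implemented.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (x, y, z)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond so projections onto it are simple dot products.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four points in one plane with the outer atoms on the same side (cis).
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```

Calling `dihedral` with the coordinates of C(i-1), N, CA, C for a residue would give its φ angle, and with N, CA, C, N(i+1) its ψ angle.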
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
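The percentages in the plot statistics follow directly from the residue counts, and the quality criterion is a simple threshold on the most favoured fraction. A minimal sketch of that check is shown below; the function name and the returned structure are hypothetical, and the 90% threshold is the one quoted in the PROCHECK note.

```python
def ramachandran_summary(favoured, allowed, generous, disallowed, threshold=90.0):
    """Return (percentages by region, whether the favoured share exceeds threshold).

    The counts cover non-glycine, non-proline residues only.
    """
    counts = {
        "most favoured": favoured,
        "additional allowed": allowed,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    total = sum(counts.values())
    percentages = {name: round(100.0 * n / total, 1) for name, n in counts.items()}
    return percentages, percentages["most favoured"] > threshold


if __name__ == "__main__":
    # Counts taken from the Procheck summary in this report.
    percentages, good = ramachandran_summary(990, 104, 11, 51)
    print(percentages, good)
```

With the counts reported here the most favoured share is below the 90% guideline, so by this criterion the model would benefit from further refinement.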
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1    1    000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
1    1    000000  00      00      00      00      00      00     -1  -100
Conclusion
After analysing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope that it will be possible in the near future to draw firm conclusions about the true picture of its evolution, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of a protein present in the different species, we carried out homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries. It is a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
Nitrogen   N   711
Oxygen     O   719
Sulfur     S   17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
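The total atom count follows directly from the molecular formula. As a minimal sketch (the helper name `atom_counts` is hypothetical and only handles simple formulas without brackets), the formula string can be parsed and summed like this:

```python
import re

def atom_counts(formula):
    """Return {element: count} for a simple formula such as 'C2435H3967N711O719S17'."""
    counts = {}
    # Each match is an element symbol followed by an optional integer count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

counts = atom_counts("C2435H3967N711O719S17")
total_atoms = sum(counts.values())
print(counts)
print(total_atoms)
```

The sum of the five element counts reproduces the reported total of 7849 atoms.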
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
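The two extinction coefficients can be checked by hand. The sketch below assumes the standard per-residue values at 280 nm (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and residue counts of 4 Trp, 5 Tyr and 5 Cys, which were tallied from the 523-residue target sequence for this illustration and are not stated in the tool output. The function name is hypothetical.

```python
# Per-residue molar extinction values at 280 nm, in M^-1 cm^-1 (assumed standard values).
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125

def extinction_280(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the extinction coefficient from residue counts."""
    # Two cysteines form one cystine; an unpaired cysteine contributes nothing.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

# Residue counts are an assumption of this sketch (counted from the target sequence).
ext_paired = extinction_280(4, 5, 5, all_cys_paired=True)
ext_reduced = extinction_280(4, 5, 5, all_cys_paired=False)
print(ext_paired, ext_reduced)
```

With these counts the two cases differ by two cystines (250 M-1 cm-1), which matches the difference between the two reported values.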
Secondary structure prediction
SOPMA result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width 17, Similarity threshold 8, Number of states 4
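The percentages in the SOPMA summary are simply each state's residue count divided by the sequence length. A small sketch of that arithmetic, using the four non-zero counts from the summary (the variable names are illustrative only):

```python
# Residue counts per predicted state, taken from the SOPMA summary.
state_counts = {
    "Alpha helix (Hh)": 235,
    "Extended strand (Ee)": 80,
    "Beta turn (Tt)": 36,
    "Random coil (Cc)": 172,
}

# The four states together should account for every residue.
sequence_length = sum(state_counts.values())

state_percent = {
    state: round(100 * n / sequence_length, 2) for state, n in state_counts.items()
}

for state, pct in state_percent.items():
    print(f"{state}: {state_counts[state]} is {pct:.2f}%")
```

The counts add up to the full sequence length of 523, so every residue is assigned to exactly one of the four states.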
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences 10
2 Alignment score 28565
3 Sequence format Pearson
4 Sequence type aa
5 Output file clustalw2-20080510-09552541output
6 Alignment file clustalw2-20080510-09552541aln
7 Guide tree file clustalw2-20080510-09552541dnd
8 Your input file clustalw2-20080510-09552541input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
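The scores table lists every unordered pair of the 10 input sequences once, which gives 10 x 9 / 2 = 45 rows. As an illustrative sketch (variable names are hypothetical), the pair enumeration and the average score among the five HBV genotype A sequences (numbers 6 to 10) can be reproduced as follows:

```python
from itertools import combinations

n_sequences = 10
# Every unordered pair (SeqA, SeqB) with SeqA < SeqB, in the order the table lists them.
pairs = list(combinations(range(1, n_sequences + 1), 2))

# Pairwise scores among sequences 6-10 (HBVA2 to HBVA6), copied from the table.
hbva_scores = {
    (6, 7): 98, (6, 8): 98, (6, 9): 98, (6, 10): 98,
    (7, 8): 98, (7, 9): 97, (7, 10): 98,
    (8, 9): 97, (8, 10): 98,
    (9, 10): 97,
}
mean_hbva_score = round(sum(hbva_scores.values()) / len(hbva_scores), 1)

print(len(pairs), mean_hbva_score)
```

The HBV sequences score 97 to 98 against each other, whereas every comparison with the human glycerate kinase sequence (number 1) scores only 2 or 3.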
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
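Percent identity between two aligned rows can be computed column by column. The sketch below is an illustration, not ClustalW's own scoring code: it skips any column where either row has a gap (one common convention, assumed here) and is applied to the C-terminal fragments of HBEAG_HBVA4 and HBEAG_HBVA3 from the alignment, so its result is not expected to equal the full-length pairwise score.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over the columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Residues 196-214 of the two sequences, copied from the alignment.
hbva4_tail = "RRRSQSPRRRRSQSRESQC"
hbva3_tail = "RRRSPSPRRRRSQSRESQC"
identity = percent_identity(hbva4_tail, hbva3_tail)
print(round(identity, 1))
```

The two fragments differ at a single position (Q versus P), so 18 of the 19 compared columns are identical.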
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
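The guide tree is a Newick-style string. As a rough sketch, assuming the colons, commas and decimal points lost in the text are restored in the usual ClustalW `.dnd` layout (that restoration is an assumption of this example), the leaf names and their terminal branch lengths can be pulled out with a regular expression:

```python
import re

# Guide tree with delimiters restored (assumed Newick layout of a ClustalW .dnd file).
newick = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a name (no Newick punctuation) directly followed by ':' and a length.
# Internal branch lengths follow ')' and therefore have no name to match.
leaves = {
    name: float(length)
    for name, length in re.findall(r"([^(),:;]+):([0-9.]+)", newick)
}
longest_leaf = max(leaves, key=leaves.get)

print(len(leaves), longest_leaf, leaves[longest_leaf])
```

The human glycerate kinase sequence sits on by far the longest terminal branch, in line with its very low pairwise scores against the HBeAg sequences.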
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (PDB).
Choose File - Save - Project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID for receiving the modelled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - Save - Layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: Request submission form
Please fill these fields
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (Will be added to the results header)
Your SWISS-MODEL project file can be found in
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and so shows which combinations of φ and ψ are conformationally possible for a polypeptide. Residues whose (φ, ψ) values fall outside the allowed regions point to likely errors in the backbone geometry of the model.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling with DeepView and MODELLER is given as input to SAVS.
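Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows the underlying calculation in plain Python; it is an illustration, not the code PROCHECK or SAVS uses, and the coordinates in the example are hypothetical.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle (degrees) defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)          # first bond vector
    b1 = sub(p2, p1)          # central bond vector (rotation axis)
    b2 = sub(p3, p2)          # last bond vector
    n1 = cross(b0, b1)        # normal of the plane through p0, p1, p2
    n2 = cross(b1, b2)        # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(dot(b1, b1))
    x = dot(n1, n2)
    y = b1_len * dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates (angstroms) for C(i-1), N(i), CA(i), C(i).
c_prev = (1.0, 0.0, 0.0)
n_atom = (0.0, 0.0, 0.0)
ca_atom = (0.0, 1.0, 0.0)
c_atom = (-1.0, 1.0, 0.0)
phi = dihedral(c_prev, n_atom, ca_atom, c_atom)
print(f"phi = {phi:.1f} degrees")
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is what distinguishes the left-handed and right-handed regions of the plot.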
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Residues      %
Residues in most favoured regions [A,B,L]                     990   85.6
Residues in additional allowed regions [a,b,l,p]              104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]           11    1.0
Residues in disallowed regions                                 51    4.4
                                                             ----  -----
Number of non-glycine and non-proline residues               1156  100.0
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127
Number of proline residues                                     60
                                                             ----
Total number of residues                                     1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions. With 85.6% of residues in the most favoured regions, the present model falls short of that benchmark.
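The region percentages can be recomputed from the residue counts reported above (990, 104, 11 and 51 out of 1156 non-glycine, non-proline residues). The short sketch below does that and compares the result with the 90% guideline; the dictionary keys and function name are illustrative only.

```python
# Residue counts taken from the PROCHECK plot statistics above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Convert per-region residue counts to percentages (one decimal place)."""
    total = sum(region_counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in region_counts.items()}


percentages = region_percentages(counts)
for name, pct in percentages.items():
    print(f"{name:>20}: {pct:5.1f}%")

# PROCHECK guideline: a good quality model has over 90% in most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
print("meets 90% guideline:", meets_guideline)
```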
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
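The search-space figures in the Hex log above can be cross-checked with simple arithmetic. This sketch assumes the punctuation-stripped values read as a distance range of 0.00 to 19.50 in steps of 0.75, Iterate[27, 812, 1] and FFT[64, 24, 48]. That reading is an inference, chosen because it reproduces the totals the log itself prints (21924 3D FFTs and 1616412672 orientations); it is not taken from Hex documentation.

```python
from math import prod

# Assumed reading of "distance range = 000 to 1950 with steps of 075".
r_start, r_stop, r_step = 0.0, 19.5, 0.75
n_distances = int(round((r_stop - r_start) / r_step)) + 1  # both end points included

# Assumed reading of "Iterate[278121] x FFT[642448]".
iterate = (n_distances, 812, 1)
fft_grid = (64, 24, 48)

n_ffts = prod(iterate)                  # number of 3D FFTs
n_orientations = n_ffts * prod(fft_grid)  # total 6D orientations scanned

# Net formal charge from the counted charged residues (104 +ve, 114 -ve).
net_charge = 104 - 114

print("distance steps:", n_distances)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
print("net formal charge:", net_charge)
```

The agreement of all three totals with the log is the only evidence for this reading of the garbled numbers.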
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different host species. It would be
interesting to examine the matter more closely at the gene level, and a phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw
firm conclusions about the true picture of its evolution in the near future, and to identify the
genes responsible for pathogenesis.
A complete inference can be drawn only from a comprehensive list of the gene products and their
functions.
To determine the unknown structure of the protein present in the different species, we performed
homology modelling and took a step toward presenting a theoretical model using available online
modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may be just a stepping stone toward such discoveries; it is a
small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as interpreting the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking study will be useful in finding a cure for the infectious disease bird flu,
and will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug
discovery process.
Finally, a similar process can be applied to other pathogens and infectious diseases to identify
possible therapeutic sites in them, so that we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
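The SOPMA composition summary is a count of each state letter in the predicted string, divided by the sequence length. A minimal sketch of that tally follows, assuming the four-state alphabet used above (h, e, t, c); the names `STATE_NAMES` and `ss_composition` are hypothetical and do not come from SOPMA.

```python
from collections import Counter

# Four-state alphabet as it appears in the prediction above.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def ss_composition(ss):
    """Return {state name: (count, percent)} for a secondary-structure string."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss.lower())
    total = len(ss)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATE_NAMES.items()
    }


# Last line of the prediction above (final 33 residues only).
last_line = "hhhhhhhttcheeeecccccccchheeeeecct"
for name, (n, pct) in ss_composition(last_line).items():
    print(f"{name}: {n} is {pct:.2f}%")
```

Run on the full 523-residue string, the same tally should reproduce the reported figures, for example 235 helix residues out of 523, or 44.93%.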
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
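The pairwise scores in the table above can be read as percent identities, so they convert directly into a simple distance (1 minus score/100) of the kind used for grouping sequences. The sketch below uses a handful of rows from the table; the `distance` and `closest_partner` helpers are hypothetical illustrations and are not how ClustalW builds its guide tree.

```python
# A few pairwise scores copied from the ClustalW2 scores table above.
scores = {
    ("GLCTK_HUMAN", "HBEAG_ASHV"): 3,
    ("HBEAG_ASHV", "HBEAG_DHBV1"): 21,
    ("HBEAG_ASHV", "HBEAG_GSHV"): 91,
    ("HBEAG_DHBV1", "HBEAG_DHBV3"): 97,
    ("HBEAG_GSHV", "HBEAG_HBVA2"): 70,
    ("HBEAG_HBVA2", "HBEAG_HBVA3"): 98,
}


def distance(score):
    """Convert a percent-identity score into a distance between 0 and 1."""
    return 1.0 - score / 100.0


def closest_partner(name, pair_scores):
    """Return (partner, score) for the highest-scoring pair containing `name`."""
    best = None
    for (a, b), s in pair_scores.items():
        if name in (a, b):
            other = b if a == name else a
            if best is None or s > best[1]:
                best = (other, s)
    return best


for seq in ("HBEAG_ASHV", "HBEAG_DHBV1", "GLCTK_HUMAN"):
    partner, s = closest_partner(seq, scores)
    print(f"{seq}: closest is {partner} (score {s}, distance {distance(s):.2f})")
```

The very low scores between GLCTK_HUMAN and every HBEAG entry (2 to 3) stand out against the 65 to 98 range seen among the viral sequences.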
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
51
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
52
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
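How similar two rows of the alignment are can be summarised as a percent identity. The following is a minimal sketch, not the method ClustalW itself uses for its scores: the function name `percent_identity` is hypothetical, and the choice to skip every column that has a gap in either row is an assumption (other definitions divide by the full alignment length or by the shorter sequence). The two example rows are the HBVA4 and HBVA5 rows of the second alignment block, with the leading gap run left out.

```python
def percent_identity(row_a, row_b):
    """Percent identity over alignment columns where both rows have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Rows from the second alignment block (HBVA4 and HBVA5), leading gaps omitted.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
print(f"{percent_identity(hbva4, hbva5):.1f}% identity over this block")
```

The two rows differ at a single position (V versus F), so the value for this block is high; applying the same function to a viral row and the human GLCTK row would show how little of that sequence aligns.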
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
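The guide tree is written in Newick notation, so the terminal branch length of each sequence can be read out with a few lines of code. This is only a sketch: the helper `leaf_branch_lengths` and the regular expression are hypothetical, and the branch lengths are read as decimals such as 0.01176, which is an assumption about the ClustalW output format. The example uses a sub-tree of the guide tree rather than the whole tree.

```python
import re

# A leaf is a name that directly follows "(" or "," and is followed by ":length".
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s]+):([0-9.]+)")


def leaf_branch_lengths(newick):
    """Return {leaf name: terminal branch length} for a Newick string."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


# Sub-tree of the guide tree, with decimal points assumed as described above.
subtree = (
    "((P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054,"
    "Q8IVS8|GLCTK_HUMAN:0.59519);"
)
lengths = leaf_branch_lengths(subtree)
for name, length in sorted(lengths.items(), key=lambda item: item[1]):
    print(f"{name}\t{length:.5f}")
```

Internal branch lengths (those that follow a closing parenthesis) are deliberately ignored; a full Newick parser would be needed to recover the tree topology.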
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template. It showed about 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (PDB).
Choose File - Save - Project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID at which the modelled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (PDB).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Alignment
TARGET    83                                  LV GFGKAVLGMA AAAEELLGQH
2b8nA      4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET          hh sss sssss hhh
2b8nA           hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105    LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET          hhhhhhhhh ssssss sssss hhhhh
2b8nA           hhhhhhhhh ssssss sssss hhhhh

TARGET   155    ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101    trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA           hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205    RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151    sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET          h hhhhhh hhh sss hhhhhh sssssss
2b8nA           h hhhhhh hhh sss hhhhhh sssssss

TARGET   255    GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201    gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET          h hhhhhhhhhh hh hhhh sss sssss
2b8nA           h hhhhhhhhhh hhh hhhh ss

TARGET   305    LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244    eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA           sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355    ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294    vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET          hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA           hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405    LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344    ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET          sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA           sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455    DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394    ksgallitgp tgtnvndlii gliv-
TARGET          h sss sssss ssss
2b8nA           sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ and ψ of the amino acid residues in a protein structure, plotting one angle against the other for each residue. It shows which combinations of φ and ψ are sterically possible for a polypeptide, so residues that fall outside the allowed regions point to local errors in the model.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
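The φ and ψ values shown in a Ramachandran plot are ordinary dihedral angles computed from four consecutive backbone atoms (C, N, Cα, C for φ and N, Cα, C, N for ψ). The sketch below shows that calculation for four arbitrary points. It is not the code used by PROCHECK or SAVS; the function name `dihedral` and the test coordinates are hypothetical, and the sign convention is the usual one in which a clockwise rotation of the far bond, seen along the central bond, is positive.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points p0-p1-p2-p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / b2_len
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates only (not taken from the modelled structure).
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(f"dihedral = {angle:.1f} degrees")
```

Applied to the backbone atoms of every residue in a PDB file, the same function gives the (φ, ψ) pairs that the plot places into favoured, allowed, and disallowed regions.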
SAVES results for proj_gunjan.pdb
PROCHECK summary
RAMACHANDRAN PLOT
Result
Plot statistics                                        No. of residues      %
Residues in most favoured regions [A,B,L]                    990          85.6
Residues in additional allowed regions [a,b,l,p]             104           9.0
Residues in generously allowed regions [~a,~b,~l,~p]          11           1.0
Residues in disallowed regions                                51           4.4
                                                            ----         -----
Number of non-glycine and non-proline residues              1156         100.0
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
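The percentages in the PROCHECK summary follow directly from the residue counts, and the model can be checked against the quality criterion quoted above. The sketch below assumes the counts are 990, 104, 11 and 51 residues and that the criterion is "over 90% in the most favoured regions"; the variable names are hypothetical.

```python
# Residue counts per Ramachandran region, as read from the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total = sum(counts.values())  # non-glycine, non-proline residues
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# PROCHECK criterion quoted in the report: over 90% in the most favoured regions.
meets_criterion = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("meets the 90% criterion:", meets_criterion)
```

With these counts the most favoured fraction stays below the 90% mark, so by this criterion the model would call for further refinement before it is used for docking.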
Docking result (Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
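The size of the search reported in the log can be reproduced with simple arithmetic. The sketch below reads the distance range as 0.00 to 19.50 in steps of 0.75 and the search-space line as Iterate[27,812,1] x FFT[64,24,48]; both readings are assumptions about punctuation that is not shown in the log text, although they agree with the 27 listed R values, the 812 receptor rotational increments, the 21924 FFTs, and the 1616412672 orientations that the log reports. The variable names are hypothetical.

```python
# Radial search range, read as 0.00 to 19.50 with steps of 0.75 (assumption).
r_min, r_max, r_step = 0.00, 19.50, 0.75
n_radial = int(round((r_max - r_min) / r_step)) + 1
radial_steps = [r_min + i * r_step for i in range(n_radial)]

# Rotational increments reported in the log for receptor and ligand.
receptor_rotations, ligand_rotations = 812, 1

# One 3D FFT is done per (radial step, receptor rotation, ligand rotation).
n_ffts = n_radial * receptor_rotations * ligand_rotations

# FFT grid dimensions, read as 64 x 24 x 48 (assumption).
fft_grid = 64 * 24 * 48
total_orientations = n_ffts * fft_grid

print(n_radial, n_ffts, total_orientations)
```

Note that the run docks 2B8N against itself and every reported energy in the summary is zero, so these figures describe the size of the search rather than the quality of any docked pose.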
Conclusion
After analyzing the protein sequences of hepatitis B virus (HBV), we conclude that, although they are
all closely related, they play an important role in survival in different host species. It would be
worthwhile to look at the matter more closely at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing HBV gene sequencing project is finished, we hope it will be possible
to draw firmer conclusions about the true picture of evolution, and to identify the genes
responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products
and their functions.
To determine the unknown structure of a protein present in the different species, we used
homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes
of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand
in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries. It is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs
more efficiently and effectively in future by analysing the modelled structure of the
protein.
As the new drugs target would be identified it will open new vistas for further drug
development The finding of our docking will be useful in finding a cure for the infectious disease
bird flu also it will open new avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence possible therapeutic sites
can be identified in them Similar method can also be applied to other infectious diseases and
hence we can look forward to a better disease free world
The work presented is just a small part of big issue and lots of work still needs to be done to
establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping
that these findings will go long way and will prove fruitful to any going in a similar area
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Parameters: window width 17; similarity threshold 8; number of states 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
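The pairwise scores lend themselves to a quick grouping of the sequences. The sketch below is only an illustration: it uses a hand-copied subset of the scores table, shortened labels, and an arbitrary threshold of 60, none of which come from ClustalW2 itself. The helper names `best_partner` and `group_by_threshold` are hypothetical.

```python
# A subset of the pairwise scores from the table above (illustrative only;
# the full table has 45 pairs). Labels are shortened entry names.
scores = {
    ("GLCTK_HUMAN", "HBEAG_ASHV"): 3,
    ("HBEAG_ASHV", "HBEAG_GSHV"): 91,
    ("HBEAG_ASHV", "HBEAG_HBVA2"): 66,
    ("HBEAG_DHBV1", "HBEAG_DHBV3"): 97,
    ("HBEAG_DHBV1", "HBEAG_HBVA2"): 26,
    ("HBEAG_GSHV", "HBEAG_HBVA2"): 70,
    ("HBEAG_HBVA2", "HBEAG_HBVA3"): 98,
}


def best_partner(name, table):
    """Return (partner, score) for the highest-scoring pair containing name."""
    best = None
    for (a, b), score in table.items():
        if name in (a, b):
            partner = b if a == name else a
            if best is None or score > best[1]:
                best = (partner, score)
    return best


def group_by_threshold(table, threshold):
    """Single-linkage grouping: merge sequences whose score is >= threshold."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in table.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for name in parent:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


closest_to_ashv = best_partner("HBEAG_ASHV", scores)
groups_at_60 = group_by_threshold(scores, 60)
```

With this subset and threshold, the mammalian hepadnavirus sequences fall into one group, the two duck sequences into another, and the human glycerate kinase stands alone, which agrees with the very low scores in the first nine rows of the table.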
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
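A guide tree in Newick format can be read with a few lines of code. The following is a minimal sketch of a recursive-descent parser, not the parser used by ClustalW2 or PHYLIP. It assumes that each branch length in the guide tree originally carried a decimal point and a colon (for example `:0.01176`), that names contain none of the characters `:,();`, and that the string ends with `;`. The example string is a small two-level tree assembled from three branch lengths of the guide tree for illustration. The function names are hypothetical.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())  # drop any whitespace or line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


# Illustrative subtree: the two duck HBV entries joined to the human protein.
example = ("((P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054,"
           "Q8IVS8|GLCTK_HUMAN:0.59519);")
depths = leaf_depths(parse_newick(example))
```

Root-to-leaf distances computed this way give a rough sense of how far each sequence sits from the rest of the tree. For real analyses an established library would normally be preferable to a hand-written parser.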
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template. It showed about 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens. Load the PDB file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Your email address: Gunjan300gmailcom
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044, Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig. Structure of the template after modeling
Alignment
TARGET    83                                 LV GFGKAVLGMA AAAEELLGQH
2b8nA      4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET         hh sss sssss hhh
2b8nA          hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET         hhhhhhhhh ssssss sssss hhhhh
2b8nA          hhhhhhhhh ssssss sssss hhhhh

TARGET   155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA          hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET         h hhhhhh hhh sss hhhhhh sssssss
2b8nA          h hhhhhh hhh sss hhhhhh sssssss

TARGET   255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET         h hhhhhhhhhh hh hhhh sss sssss
2b8nA          h hhhhhhhhhh hhh hhhh ss

TARGET   305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA          sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET         hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA          hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET         sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA          sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394   ksgallitgp tgtnvndlii gliv-
TARGET         h sss sssss ssss
2b8nA          sss sssss ssss
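The sequence identity reported for a target and template pair can be checked by counting matching columns in the alignment. The sketch below assumes one common definition (matches divided by the number of columns in which neither sequence has a gap). SWISS-MODEL may use a different denominator, so the result is not expected to reproduce its reported value exactly. The function name `percent_identity` is hypothetical, and the two strings are the TARGET 205 and 2b8nA 151 rows of the alignment with the spaces removed.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    # Compare case-insensitively, since the template row is in lower case.
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# One 50-column block of the target/template alignment, spaces removed.
target_block = "RGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIAS"
template_block = "sgasieeintvrkhlsqvkggrfaervfpakvvalvlsdvlgdrldvias"
block_identity = percent_identity(target_block, template_block)
```

A single well-conserved block such as this one will usually score higher than the alignment as a whole, so the figure is only meaningful when computed over every aligned column.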
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure by plotting φ against ψ. It shows which combinations of φ and ψ are possible for a polypeptide. Because the peptide bond is nearly planar and the bond lengths and bond angles between successive alpha-carbon atoms are almost fixed, these two angles largely determine the conformation of the backbone chain.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
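The φ and ψ angles plotted in a Ramachandran plot are ordinary dihedral angles, so they can be computed directly from backbone atom coordinates. The sketch below is not how PROCHECK or SAVS computes them internally. It is a standard vector formulation, and the function names `dihedral` and `phi_psi` are hypothetical. Coordinates are plain (x, y, z) tuples in any consistent length unit.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    The angle is positive when p3 is rotated clockwise from p0
    as seen looking from p1 toward p2.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Remove the components along the central bond, leaving the parts of
    # b0 and b2 that lie in the plane perpendicular to it.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles of one residue from five backbone atom positions.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Each residue with both neighbours present contributes one (φ, ψ) point to the plot. The first and last residues of a chain lack one of the two angles, which is why they are counted separately as end-residues in the statistics.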
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                        Number  Percentage
Residues in most favoured regions [A,B,L]                  990      85.6%
Residues in additional allowed regions [a,b,l,p]           104       9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11       1.0%
Residues in disallowed regions                              51       4.4%
                                                          ----     ------
Number of non-glycine and non-proline residues            1156     100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
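The percentages in the plot statistics follow from the residue counts, and the comparison with the 90% guideline can be made explicit. This is a small arithmetic sketch, assuming the four counts are 990, 104, 11 and 51 as read from the Procheck summary. The variable names are illustrative.

```python
# Residue counts per Ramachandran region, as read from the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end-residues are excluded from this total.
total = sum(counts.values())

# Percentage of residues in each region, rounded to one decimal place.
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Guideline quoted in the summary: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
```

With these counts the model falls a few points short of the 90% guideline, so the residues in the disallowed regions are the natural place to look when refining it.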
Docking result by Hex software
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
68
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
69
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
70
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds [716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds [716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50
R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal    Eshape    Eforce    Eair      RMS              Bumps
----  --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1    1    000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
1    1    000000  00      00      00      00      00      00     -1  -100
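The size of the search reported in the Hex log can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the bracketed values on the "Total 6D space" line read as 27, 812, 1 and 64, 24, 48, and that the distance range runs from 0.00 to 19.50 in steps of 0.75. This reading is an assumption about how the log was printed, although the products it gives agree with the totals stated in the log (21924 FFTs and 1616412672 orientations). The variable names are hypothetical and are not part of Hex.

```python
import math

# Assumed reading of the log: distance range 0.00 to 19.50 in steps of 0.75.
r_start, r_stop, r_step = 0.0, 19.5, 0.75
n_distance_steps = int(round((r_stop - r_start) / r_step)) + 1

# Assumed reading of "Iterate[...] x FFT[...]" from the log.
iterate = (27, 812, 1)   # distance steps, receptor rotations, ligand rotations
fft_grid = (64, 24, 48)  # samples handled by each 3D FFT

n_ffts = math.prod(iterate)                  # number of 3D FFTs performed
n_orientations = n_ffts * math.prod(fft_grid)  # total 6D orientations scanned

print("distance steps:", n_distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
```

If the interpretation is right, the first number equals the count of "R =" lines printed during the scan, and the other two equal the figures on the "Done ... 3D FFTs for ... orientations" line.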
Conclusion
After analysing the HBeAg protein sequences of hepatitis B virus (HBV) and related viruses, we conclude that, although the sequences are closely related, the protein plays an important role in the survival of the virus in different host species. It would be interesting to examine this more closely at the gene level, where a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all the strains studied, which suggests that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw firmer conclusions about the true course of its evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To predict the unknown structure of the protein present in the different species, we carried out homology modelling and present a theoretical model built with available online modelling tools.
Our study indicated that the HBeAg (glycerate kinase) protein is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries, although it is only a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study will be useful in the search for a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.
The work presented here is only a small part of a large problem, and much work remains to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We nevertheless hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
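The pairwise scores in the table are percent identities. As a rough illustration of how such a figure is obtained, the sketch below counts identical residues between two aligned sequences and ignores any column that contains a gap. It is a simplified stand-in, not ClustalW's own routine, and the function name `percent_identity` is hypothetical. The two example strings are only the first 30 alignment columns of the HBVA4 and ASHV sequences, so the value printed is for that fragment and is not expected to reproduce the full-length score in the table.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# First 30 aligned columns of two HBeAg sequences (fragment only).
hbva4_fragment = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv_fragment = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"

print(round(percent_identity(hbva4_fragment, ashv_fragment), 2))
```

Excluding gapped columns is one common convention; other definitions divide by the alignment length or by the shorter sequence length, and they give different numbers for the same alignment.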
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
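A guide tree of this kind is normally stored in Newick notation, where each name is followed by a colon and its branch length. The sketch below reads the leaf branch lengths out of such a string with a regular expression. The tree string is a reconstruction of the guide tree above, made on the assumption that colons, commas and decimal points belong where the Newick format would require them; the names `GUIDE_TREE` and `leaf_branch_lengths` are hypothetical, and this is not a general Newick parser.

```python
import re

# Reconstructed guide tree (assumed punctuation, see the note above).
GUIDE_TREE = (
    "(((((("
    "Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054)"
    ":0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844)"
    ":0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818)"
    ":0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf looks like ACCESSION|ENTRY_NAME:length in this tree.
LEAF_PATTERN = re.compile(r"([A-Z0-9]+\|[A-Z0-9_]+):([0-9.]+)")


def leaf_branch_lengths(newick: str) -> dict:
    """Map each leaf name to its terminal branch length."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
for name, length in sorted(lengths.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {length:.5f}")
```

Sorting by branch length puts the human glycerate kinase sequence first, well separated from the viral HBeAg sequences, which is consistent with the very low pairwise scores it receives in the scores table.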
Phylogram
Tertiary structure prediction
The PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (PDB).
Choose File - save - project (PDB).
Choose Swiss model - submit modelling request. (A new browser window opens for uploading the PDB file; give the email ID at which the modelled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under Swiss model to submit it for modelling.
Homology modelling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit P000044 TitleQ8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Click on model bars
Fig. Structure of template after modelling
Alignment
TARGET  83                                    LV GFGKAVLGMA AAAEELLGQH
2b8nA    4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modelling process. A Ramachandran plot visualises the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure by plotting ψ against φ, and shows which combinations of the two angles are sterically possible for a polypeptide. Residues whose φ and ψ values fall outside the allowed regions point to parts of the backbone that may have been modelled incorrectly.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modelling using DeepView and Modeller is given as input to SAVS.
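The φ and ψ values shown in a Ramachandran plot are dihedral (torsion) angles, each defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for φ, and N(i), CA(i), C(i), N(i+1) for ψ. The sketch below shows one standard way to compute such an angle from four sets of Cartesian coordinates. It is an illustration only and is not taken from SAVS or PROCHECK; the function names and the example coordinates are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    u1 = _sub(p1, p0)
    u2 = _sub(p2, p1)
    u3 = _sub(p3, p2)
    n1 = _cross(u1, u2)  # normal of the plane through p0, p1, p2
    n2 = _cross(u2, u3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not from any structure): a quarter-turn about the x axis.
angle = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(round(angle, 1))
```

Using atan2 rather than an arccosine keeps the sign of the angle, which matters here because the Ramachandran plot distinguishes, for example, φ = -60° from φ = +60°.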
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                        Score   %age
Residues in most favoured regions [A,B,L]                 990   85.6%
Residues in additional allowed regions [a,b,l,p]          104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                             51    4.4%
                                                         ----  ------
Number of non-glycine and non-proline residues           1156  100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
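The percentages in the plot statistics follow directly from the residue counts, and the 90% rule of thumb quoted above can be checked in the same step. The sketch below recomputes them from the four counts reported for this model (990, 104, 11 and 51 residues); the variable names and the dictionary layout are hypothetical and are not part of PROCHECK.

```python
# Residue counts from the Ramachandran plot statistics of the model.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(region_counts.values())

percentages = {
    region: round(100.0 * count / total, 1)
    for region, count in region_counts.items()
}

# Rule of thumb quoted in the PROCHECK summary: over 90% in most favoured regions.
meets_quality_criterion = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {region_counts[region]:5d} {pct:5.1f}%")
print("total:", total)
print("over 90% in most favoured regions:", meets_quality_criterion)
```

With these counts the most favoured fraction comes out below the 90% mark, so by this criterion alone the model would not be rated as good quality, and the residues in the disallowed regions are the natural place to start when refining it.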
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000  R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00    00      00      -1   -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00    00      00      -1   -100
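The size of the search reported in the log above can be checked with simple arithmetic. The sketch below is only a sanity check, not part of Hex. It assumes that separators were lost in extraction, so that the distance range reads 0.00 to 19.50 in steps of 0.75, Iterate[278121] reads 27, 812, 1 and FFT[642448] reads 64, 24, 48. It also assumes one 3D FFT per distance step and rotation pair. These readings are assumptions, chosen because they reproduce the totals printed in the log.

```python
# Assumed readings of the log values (decimal points and commas restored).
r_start, r_stop, r_step = 0.00, 19.50, 0.75   # intermolecular distance range
receptor_rotations = 812                       # "Receptor 812" in the log
ligand_rotations = 1                           # "Ligand 1" in the log
fft_grid = (64, 24, 48)                        # assumed split of "FFT[642448]"

# Number of distance steps, counting both end points.
n_distances = round((r_stop - r_start) / r_step) + 1

# Assumption: one 3D FFT per (distance, receptor rotation, ligand rotation).
n_ffts = n_distances * receptor_rotations * ligand_rotations

# Each FFT covers every point of the grid at once.
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

print(n_distances, n_ffts, total_orientations)
```

Under these assumptions the three printed numbers can be compared directly with the count of R values, the "Done ... 3D FFTs" line, and the "Total 6D space" line in the log.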
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of the protein present in the different species, we performed homology modelling and put forward a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases to identify possible therapeutic sites in them, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
6  P0C692|HBEAG_HBVA2  214  10  Q91C37|HBEAG_HBVA6  214  98
7  P0C625|HBEAG_HBVA3  214   8  P17099|HBEAG_HBVA4  214  98
7  P0C625|HBEAG_HBVA3  214   9  Q81105|HBEAG_HBVA5  214  97
7  P0C625|HBEAG_HBVA3  214  10  Q91C37|HBEAG_HBVA6  214  98
8  P17099|HBEAG_HBVA4  214   9  Q81105|HBEAG_HBVA5  214  97
8  P17099|HBEAG_HBVA4  214  10  Q91C37|HBEAG_HBVA6  214  98
9  Q81105|HBEAG_HBVA5  214  10  Q91C37|HBEAG_HBVA6  214  97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
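
The closeness of the sequences in the alignment above can be put into numbers with a percent identity calculation. The sketch below is a minimal illustration, not the exact scoring routine of the alignment program. It assumes the common definition in which identical residues are counted over the columns where neither sequence has a gap. The function name and the three short fragments (taken from columns 23 to 52 of the second alignment block) are chosen here for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Fragments of three aligned rows (30 columns each), copied from the alignment.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"

print(round(percent_identity(hbva4, hbva5), 1))
print(round(percent_identity(hbva4, ashv), 1))
```

On these fragments the two human HBV rows differ at a single position, while the ASHV row differs at several, which mirrors the grouping seen in the full alignment.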
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
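A guide tree of this kind is written in Newick notation, in which parentheses group sequences and each name or group is followed by a branch length. The sketch below shows one way to read such a tree and sum the branch lengths from the root to each sequence. It is a minimal parser written for illustration, not a full Newick implementation. The tree string assumes that the colons, commas and decimal points lost in the text above are restored in the usual Newick positions, and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())  # drop spaces and line breaks
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                if text[pos] == ")":
                    pos += 1
                    break
                raise ValueError(f"unexpected character at position {pos}")
        # Optional node name, read up to ":" or a structural character.
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return parse_node()


def leaf_depths(node, depth=0.0, out=None):
    """Return {leaf name: summed branch length from the root}."""
    if out is None:
        out = {}
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        leaf_depths(child, depth, out)
    return out


# Guide tree with punctuation restored (an assumption, see the text above).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

depths = leaf_depths(parse_newick(GUIDE_TREE))
for name, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{name:20s} {depth:.5f}")
```

Sorting the sequences by root-to-leaf distance gives a quick view of which sequences sit close to the root and which lie on long branches of the guide tree.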
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence
choose Fit - fit raw sequence, then magic fit, then iterative fit
choose File - save - layer (pdb)
choose File - save - project (pdb)
choose Swiss model - submit modeling request (a new browser window opens for loading the pdb file; give the e-mail ID at which the modeled structure is to be received)
open the new structure (received by e-mail) and remove the template by selecting the target
choose File - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit P000044 TitleQ8IVS8
SWISS-MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
E-value: 2.70e-52
click on model bars
Fig: Structure of the template after modeling
Alignment
TARGET   83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA     4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. From the plot one can judge whether the backbone conformation of each residue in the protein of interest falls in a sterically allowed region.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
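The φ and ψ values shown in a Ramachandran plot are dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The sketch below shows how such an angle can be computed from atomic coordinates. It is a generic geometric calculation written for illustration, not the code used by SAVS or Procheck, and the coordinates in the example are made-up points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points in 3D space."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)  # central bond, the rotation axis
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates, chosen only to give an easy-to-check geometry.
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(dihedral(p0, p1, p2, (1.0, 0.0, 1.0)))   # cis arrangement
print(dihedral(p0, p1, p2, (0.0, 1.0, 1.0)))   # quarter turn
```

Applied to every residue of a structure, with the four backbone atoms listed above as input, this yields the (φ, ψ) pairs that are plotted against each other.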
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                         Residues       %
Residues in most favoured regions [A,B,L]                     990    85.6
Residues in additional allowed regions [a,b,l,p]              104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]           11     1.0
Residues in disallowed regions                                 51     4.4
                                                             ----   -----
Number of non-glycine and non-proline residues               1156   100.0
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127
Number of proline residues                                     60
                                                             ----
Total number of residues                                     1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
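The percentages in the Procheck summary follow directly from the residue counts, and the same arithmetic shows how far the model is from the 90% guideline. The sketch below recomputes them from the four counts reported above (990, 104, 11 and 51); the variable and function names are illustrative only.

```python
# Residue counts per Ramachandran region, taken from the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Percentage of non-Gly, non-Pro residues in each region, to one decimal."""
    total = sum(region_counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in region_counts.items()}


total_residues = sum(counts.values())
percentages = region_percentages(counts)
meets_guideline = percentages["most favoured"] > 90.0

print(total_residues, percentages, meets_guideline)
```

With 85.6% of residues in the most favoured regions the model falls short of the 90% guideline, and the 51 residues in disallowed regions are the natural candidates for further refinement.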
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40 Charge = -0.52
  ASP CA  Radius = 1.50 Charge = 0.25
  ASP C   Radius = 1.40 Charge = 0.53
  ASP O   Radius = 1.50 Charge = -0.50
  ASP CB  Radius = 1.70 Charge = -0.21
  ASP CG  Radius = 1.40 Charge = 0.62
  ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40 Charge = -0.52
  ASP CA  Radius = 1.50 Charge = 0.25
  ASP C   Radius = 1.40 Charge = 0.53
  ASP O   Radius = 1.50 Charge = -0.50
  ASP CB  Radius = 1.70 Charge = -0.21
  ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40 Charge = -0.52
  THR CA  Radius = 1.50 Charge = 0.27
  THR C   Radius = 1.40 Charge = 0.53
  THR O   Radius = 1.50 Charge = -0.50
  THR CB  Radius = 1.50 Charge = 0.21
  THR H   Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory: setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor: 2B8N and ligand: 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor: 1 (1Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal    Eshape    Eforce    Eair      RMS               Bumps
----   --------- --------- --------- --------- ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
---------------------------------------------------------------------------
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
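The size of the search reported in the docking log can be checked with a few lines of arithmetic. The sketch below assumes that the distance range is 0.00 to 19.50 in steps of 0.75, and that the bracketed factors in the "Total 6D space" line read 27 x 812 x 1 and 64 x 24 x 48. The separators inside those brackets are an assumption about how the log was originally printed, but the reading is supported by the 27 R values, the 812 receptor rotations and the 21924 3D FFTs that the log reports separately. All names in the code are illustrative.

```python
from math import prod


def distance_steps(start, stop, step):
    """Intermolecular distances sampled from start to stop inclusive."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


# Assumed reading of the log: distance range 0.00 to 19.50, step 0.75.
steps = distance_steps(0.0, 19.5, 0.75)

# Assumed reading of "Iterate[...] x FFT[...]" in the log.
iterate = (len(steps), 812, 1)   # distance steps, receptor rotations, ligand rotations
fft_grid = (64, 24, 48)          # size of each 3D FFT

n_ffts = prod(iterate)
n_orientations = n_ffts * prod(fft_grid)

print(len(steps), n_ffts, n_orientations)
```

Under these assumptions the product reproduces the 1616412672 orientations stated in the log. The same log reports 0 orientations within the energy threshold and an empty correlation summary, so this run docked 2B8N against itself without producing a usable solution.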
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different host species. It would be interesting to take a closer look at the matter at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we carried out homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for drug development.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and can thereby aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
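How similar two rows of such an alignment are can be summarised as a percent identity over the columns where both rows carry a residue. The sketch below applies this to one short fragment of the HBVA4 and ASHV rows of the alignment. The gap run between the two residue stretches is shortened here for readability, and the function name is illustrative. Other definitions of identity exist (for example dividing by the full alignment length), so the value is only indicative.

```python
def percent_identity(row_a, row_b):
    """Percent identity over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Fragment of the alignment (gap run shortened for the example).
hbva4 = "DIDP----YKEFGATVELLSF"
ashv = "DIDP----YKEFGSSYQLLNF"

identity = percent_identity(hbva4, ashv)
print(round(identity, 1))
```

For this fragment 12 of the 17 compared positions are identical, which is consistent with the close grouping of the HBV and squirrel hepatitis virus sequences in the guide tree.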
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
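The guide tree is written in Newick notation, where each leaf is followed by a colon and its branch length. The sketch below pulls the leaf names and terminal branch lengths out of the tree with a regular expression. It assumes that the separators lost in extraction are the usual Newick colons and commas and that each length has five decimal places; the variable names are illustrative.

```python
import re

# Guide tree with Newick punctuation restored (an assumption, see above).
newick = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is "ACCESSION|ENTRY_NAME:length"; internal branch lengths follow a
# closing parenthesis and are therefore not matched by this pattern.
leaf_pattern = re.compile(r"([A-Z0-9]+\|[A-Z0-9_]+):([0-9.]+)")
leaves = {name: float(length) for name, length in leaf_pattern.findall(newick)}

balanced = newick.count("(") == newick.count(")")
longest = max(leaves, key=leaves.get)

print(len(leaves), balanced, longest)
```

The leaf with the longest terminal branch is the human glycerate kinase sequence (Q8IVS8), which is by far the most distant from the viral HBeAg sequences in this tree.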
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (pdb).
54
Open Swiss model and select load raw sequence option to load target molecule
Perform magic fit iterative fit provided under FIT in order to fit the two sequences
55
Save the file as the project
Select ldquosubmit modeling requestrdquo under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Gunjan300gmailcom (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Click on model bars
Fig. Structure of template after modeling
Alignment
TARGET   83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA     4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET        hh sss sssss hhh
2b8nA         hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET        hhhhhhhhh ssssss sssss hhhhh
2b8nA         hhhhhhhhh ssssss sssss hhhhh

TARGET  155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA         hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET        h hhhhhh hhh sss hhhhhh sssssss
2b8nA         h hhhhhh hhh sss hhhhhh sssssss

TARGET  255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET        h hhhhhhhhhh hh hhhh sss sssss
2b8nA         h hhhhhhhhhh hhh hhhh ss

TARGET  305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA         sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET        hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA         hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET        sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA         sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394   ksgallitgp tgtnvndlii gliv-
TARGET        h sss sssss ssss
2b8nA         sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles of a protein structure by plotting φ against ψ for each amino acid residue, and it shows which combinations of φ and ψ are sterically possible for a polypeptide. Because the peptide bond is planar and the distance between successive alpha-carbon atoms is therefore nearly fixed, these two angles largely determine the conformation of the backbone chain, so the plot gives a compact check on the backbone geometry of the modelled protein.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
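The φ and ψ values shown in a Ramachandran plot are ordinary dihedral angles between four consecutive backbone atoms: φ is defined by C(i-1), N, CA, C and ψ by N, CA, C, N(i+1). The following is a minimal sketch of that calculation using only the standard library. The function name `dihedral` and the test coordinates are illustrative assumptions, not taken from SAVS or PROCHECK.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates only (not from the modelled structure).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real model, the four points would be the coordinates of the backbone atoms read from the PDB file for each residue.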
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                        Score   %-age
Residues in most favoured regions [A,B,L]                 990    85.6
Residues in additional allowed regions [a,b,l,p]          104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0
Residues in disallowed regions                             51     4.4
                                                         ----   -----
Number of non-glycine and non-proline residues           1156   100.0
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
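The percentages in the Procheck summary follow directly from the residue counts, so they can be rechecked with a few lines of arithmetic. The sketch below assumes the percentage column is each count divided by the number of non-glycine, non-proline residues, rounded to one decimal place; the variable names are illustrative.

```python
# Residue counts taken from the Procheck plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
end_residues, glycines, prolines = 8, 127, 60

non_gly_pro = sum(counts.values())  # residues that count toward the percentages
percentages = {
    region: round(100 * n / non_gly_pro, 1) for region, n in counts.items()
}
total_residues = non_gly_pro + end_residues + glycines + prolines

# Procheck's guideline: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("total residues:", total_residues)
print("over 90% in most favoured regions:", meets_guideline)
```

With these counts the most favoured regions hold 85.6% of the residues, which is below the 90% guideline quoted above.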
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
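The size of the search reported in the docking log can be reproduced by hand, which is a useful check on how the log is read. The sketch below assumes that the bracketed figures were originally comma-separated, as `Iterate[27,812,1]` and `FFT[64,24,48]`, and that the three `Iterate` values are the number of distance steps, receptor rotations and ligand rotations. That interpretation is an assumption made here, not something stated in the log, and the variable names are illustrative.

```python
import math

# Distance range from the log: 0.00 to 19.50 in steps of 0.75.
r_start, r_stop, r_step = 0.00, 19.50, 0.75
distance_steps = round((r_stop - r_start) / r_step) + 1

receptor_rotations, ligand_rotations = 812, 1  # initial rotational increments
fft_grid = (64, 24, 48)                        # assumed FFT grid dimensions

n_ffts = distance_steps * receptor_rotations * ligand_rotations
orientations = n_ffts * math.prod(fft_grid)

elapsed_s = 7 * 60  # "7 min 0 sec" from the log
rate = orientations / elapsed_s

print("distance steps:", distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", orientations)
print(f"approximate rate: {rate:,.0f} orientations/s")
```

Under these assumptions the product matches both the 21924 FFTs and the 1616412672 orientations quoted in the log, and the rate comes out close to the logged 3848574 per second.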
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be
interesting to look more closely at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved
together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in
the near future to draw firm conclusions about the true picture of evolution and to identify the
genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.
To determine the unknown structure of a protein present in the different species, we carried out
homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is a secondary
contributor to the pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. The present
work is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. It includes methods for
collecting and analysing data as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas for further drug development will open. The
findings of our docking will be useful in finding a cure for the infectious disease bird flu, and
they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery
process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done
to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that
these findings will go a long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
84
85
86
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
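The guide tree is a Newick string, so the distance from the root to each sequence can be read off by summing branch lengths. The following is a minimal sketch, not the tool the authors used: the parser and the helper names (parse_newick, leaf_depths) are illustrative, and the tree string assumes that the colons, commas and decimal points lost in extraction are restored as "name:0.xxxxx".

```python
# Guide tree with separators restored (an assumption about the original output).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = text.strip()
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1                      # consume ":"
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0, out=None):
    """Return {leaf name: summed branch length from the root}."""
    if out is None:
        out = {}
    name, length, children = tree
    depth += length or 0.0
    if not children:
        out[name] = depth
    for child in children:
        leaf_depths(child, depth, out)
    return out


depths = leaf_depths(parse_newick(GUIDE_TREE))
for leaf, d in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{leaf:20s} {d:.5f}")
```

Read this way, the human glycerate kinase sequence sits much further from the root than any of the HBeAg sequences, which is consistent with it being the outlier in the alignment.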
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (PDB).
Choose File - Save - Project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Please fill in these fields:
Your email address: Gunjan300@gmail.com (MUST be correct)
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Alignment

TARGET    83                            LV GFGKAVLGMA AAAEELLGQH
2b8nA      4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET   105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA     53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET   155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA    101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET   205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA    151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET   255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA    201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET   305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA    244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET   355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA    294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET   405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA    344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET   455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA    394 ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
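Sequence identity between target and template can be checked directly from the aligned rows. The sketch below is illustrative only: percent_identity is a hypothetical helper, and it uses one common definition of identity (identical residues divided by aligned, gap-free columns); SWISS-MODEL may use a different denominator, so the value for a single block need not match the overall figure reported by the server.

```python
def percent_identity(row_a, row_b):
    """Count identical residues over gap-free columns of two aligned rows.

    Spaces (used only to group residues in tens) are ignored, and the
    comparison is case-insensitive because the template row is lower case.
    Returns (matches, compared_columns, percent).
    """
    a = row_a.replace(" ", "").upper()
    b = row_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(1 for x, y in columns if x == y)
    return matches, len(columns), 100.0 * matches / len(columns)


# Alignment block starting at TARGET residue 155 / 2b8nA residue 101.
target_row = "ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA"
template_row = "trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk"

matches, compared, pct = percent_identity(target_row, template_row)
print(f"{matches}/{compared} identical columns ({pct:.1f}%)")
```

Applying the same function to every block and pooling the counts gives an identity figure for the whole modelled range.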
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ and ψ of the amino acid residues in a protein structure: each residue is plotted at its (φ, ψ) coordinates, and the plot shows which combinations of the two angles are possible for a polypeptide. The plot therefore describes how the backbone bonds on either side of each alpha carbon are rotated in the protein of interest; it does not measure distances between successive alpha carbon atoms. Residues that fall in disallowed regions point to local errors in the model.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
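The φ and ψ angles plotted in a Ramachandran plot are ordinary four-atom torsion angles: φ is defined by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of that calculation in plain Python; the function names are illustrative, the coordinates in the usage lines are made-up test points rather than atoms from the model, and this is not the code PROCHECK itself runs.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = tuple(b0[i] - _dot(b0, b1) * b1[i] for i in range(3))
    w = tuple(b2[i] - _dot(b2, b1) * b1[i] for i in range(3))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Return (phi, psi) for one residue from five backbone atom positions."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Made-up points: a quarter turn about the z axis gives a 90 degree torsion.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(quarter_turn, 1))
```

Running phi_psi over every residue of a structure and scatter-plotting the pairs reproduces the layout of a Ramachandran plot.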
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: Plot statistics
                                                        Residues   Percentage
Residues in most favoured regions [A,B,L]                    990      85.6%
Residues in additional allowed regions [a,b,l,p]             104       9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11       1.0%
Residues in disallowed regions                                51       4.4%
                                                            ----     ------
Number of non-glycine and non-proline residues              1156     100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
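The percentages in the Procheck summary follow directly from the residue counts, and the 90% rule of thumb can be checked the same way. This is a small sketch using only the counts reported above; the variable names are illustrative.

```python
# Residue counts per Ramachandran region, taken from the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Percentages are relative to non-glycine, non-proline residues only.
total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Rule of thumb quoted by Procheck: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, and the 51 residues in disallowed regions are the natural places to inspect first.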
Docking result (by Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N: Radius = 1.40 Charge = -0.52
  ASP CA: Radius = 1.50 Charge = 0.25
  ASP C: Radius = 1.40 Charge = 0.53
  ASP O: Radius = 1.50 Charge = -0.50
  ASP CB: Radius = 1.70 Charge = -0.21
  ASP CG: Radius = 1.40 Charge = 0.62
  ASP H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS CE: Radius = 1.70 Charge = 0.22
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS CE: Radius = 1.70 Charge = 0.22
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS CE: Radius = 1.70 Charge = 0.22
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N: Radius = 1.40 Charge = -0.52
  ASP CA: Radius = 1.50 Charge = 0.25
  ASP C: Radius = 1.40 Charge = 0.53
  ASP O: Radius = 1.50 Charge = -0.50
  ASP CB: Radius = 1.70 Charge = -0.21
  ASP H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS CE: Radius = 1.70 Charge = 0.22
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N: Radius = 1.40 Charge = -0.52
  THR CA: Radius = 1.50 Charge = 0.27
  THR C: Radius = 1.40 Charge = 0.53
  THR O: Radius = 1.50 Charge = -0.50
  THR CB: Radius = 1.50 Charge = 0.21
  THR H: Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N: Radius = 1.40 Charge = -0.52
  ASP CA: Radius = 1.50 Charge = 0.25
  ASP C: Radius = 1.40 Charge = 0.53
  ASP O: Radius = 1.50 Charge = -0.50
  ASP CB: Radius = 1.70 Charge = -0.21
  ASP CG: Radius = 1.40 Charge = 0.62
  ASP H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS CE: Radius = 1.70 Charge = 0.22
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS CE: Radius = 1.70 Charge = 0.22
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS CE: Radius = 1.70 Charge = 0.22
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N: Radius = 1.40 Charge = -0.52
  ASP CA: Radius = 1.50 Charge = 0.25
  ASP C: Radius = 1.40 Charge = 0.53
  ASP O: Radius = 1.50 Charge = -0.50
  ASP CB: Radius = 1.70 Charge = -0.21
  ASP H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N: Radius = 1.40 Charge = -0.52
  LYS CA: Radius = 1.50 Charge = 0.23
  LYS C: Radius = 1.40 Charge = 0.53
  LYS O: Radius = 1.50 Charge = -0.50
  LYS CB: Radius = 1.70 Charge = 0.04
  LYS CG: Radius = 1.70 Charge = 0.05
  LYS CD: Radius = 1.70 Charge = 0.05
  LYS CE: Radius = 1.70 Charge = 0.22
  LYS H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N: Radius = 1.40 Charge = -0.52
  MSE CA: Radius = 1.50 Charge = 0.14
  MSE C: Radius = 1.40 Charge = 0.53
  MSE O: Radius = 1.50 Charge = -0.50
  MSE CB: Radius = 1.70 Charge = 0.04
  MSE CG: Radius = 1.70 Charge = 0.09
  MSE SE: Radius = 1.90 Charge = 0.32
  MSE CE: Radius = 1.90 Charge = 0.01
  MSE H: Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N: Radius = 1.40 Charge = -0.52
  THR CA: Radius = 1.50 Charge = 0.27
  THR C: Radius = 1.40 Charge = 0.53
  THR O: Radius = 1.50 Charge = -0.50
  THR CB: Radius = 1.50 Charge = 0.21
  THR H: Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
----   ------   ------   ------   ----   ---   -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

Clst  Soln  Models   Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  -------  ------  ------  ------  ----  ------  ------  ---  -----
   1     1  000/000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
---------------------------------------------------------------------------
   1     1  000/000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
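The size of the search reported in the Hex log can be reproduced from the parameters the log prints. The following is a worked-arithmetic sketch, assuming the decimal points and separators lost in extraction are read as 0.00 to 19.50 in steps of 0.75, Iterate[27,812,1] and FFT[64,24,48]; the variable names are illustrative and nothing here calls Hex itself.

```python
# Parameters as read from the log (punctuation restored, so treat as assumptions).
r_start, r_stop, r_step = 0.00, 19.50, 0.75     # intermolecular distance range
receptor_rotations, ligand_rotations = 812, 1   # initial rotational increments
fft_grid = (64, 24, 48)                         # orientations scored per 3D FFT

# One 3D FFT is run per (distance step, receptor rotation, ligand rotation).
n_distance_steps = round((r_stop - r_start) / r_step) + 1
n_ffts = n_distance_steps * receptor_rotations * ligand_rotations
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

# Per-phase timings from the "Hex / GF / FFT / Scan" line, in seconds.
phase_seconds = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
minutes, seconds = divmod(round(sum(phase_seconds.values())), 60)

print(n_distance_steps, n_ffts, total_orientations)
print(f"{minutes} min {seconds} sec")
```

These figures agree with the log's own totals (21924 FFTs, 1616412672 orientations, 7 min 0 sec), which supports the way the stripped punctuation has been read. Note also that receptor and ligand are the same structure (2B8N) and that the run reports zero minima within threshold, so the listed solution reflects the starting orientation rather than a docked pose.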
Conclusion
After analyzing the protein sequences of Hepatitis B virus strains, we conclude that although they are all closely related, the proteins play an important role in the survival of the virus in different host species. It would be interesting to look at the matter more closely by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of a protein present in the different species, we carried out homology modelling, and we present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and more effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can thereby aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, and so we can look forward to a better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We nevertheless hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
84
85
86
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV -------------------------------------------
P03153|HBEAG_GSHV -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
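The guide tree above is written in Newick notation, so it can be read programmatically. The following is a minimal sketch, not the method used in this report. It assumes the colons, commas and decimal points lost from the printed tree are restored as `name:0.xxxxx`. The names `parse_newick`, `leaves` and `GUIDE_TREE` are illustrative only.

```python
# Minimal Newick reader (illustrative sketch; assumes a well-formed string ending in ";").
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Return the tree as nested (name, branch_length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]            # empty for internal nodes
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaves(tree):
    """Yield (name, branch_length) for every leaf of the parsed tree."""
    name, length, children = tree
    if not children:
        yield name, length
    for child in children:
        yield from leaves(child)


if __name__ == "__main__":
    # List the sequences from the longest terminal branch to the shortest.
    for name, length in sorted(leaves(parse_newick(GUIDE_TREE)), key=lambda x: -x[1]):
        print(f"{name:22s} {length:.5f}")
```

Sorting the leaves by terminal branch length gives a quick view of which sequence sits furthest from its nearest neighbours in the guide tree.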
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (PDB file), then choose Swiss model and load the raw target sequence.
2. Choose Fit: fit raw sequence, then magic fit, then iterative fit.
3. Choose File, Save, Layer (pdb).
4. Choose File, Save, Project (pdb).
5. Choose Swiss model and submit the modeling request. (A new browser window opens; load the pdb file and give the e-mail ID at which the modeled structure is to be received.)
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File, Save, Layer (pdb).
Fig: Open Swiss model and select the load raw sequence option to load the target molecule.
Fig: Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Fig: Save the file as the project.
Fig: Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Please fill in these fields:
Your e-mail address: Gunjan300@gmail.com (MUST be correct)
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
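The sequence identity reported above can be understood as the share of aligned positions at which target and template carry the same residue. The sketch below is a hedged illustration under one common convention (gap columns ignored, case ignored). It is not necessarily how SWISS-MODEL computes its figure, and `percent_identity` is a hypothetical helper name. The example strings are the residue 155 to 204 block of the TARGET / 2b8nA alignment in this report, with spaces removed.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned, non-gap columns of two equal-length rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither row has a gap character.
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no aligned residues to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    target = "ALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAA"
    template = "trrvlelvdqlnendtvlfllsgggsslfelplegvsleeiqkltsallk"
    print(f"identity over this block: {percent_identity(target, template):.1f}%")
```

A single block can differ noticeably from the whole-alignment value, so this is only meant to show the calculation.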
Click on the model bars.
Fig: Structure of the template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible for a polypeptide. Because φ and ψ are the rotations about the two backbone bonds at each alpha carbon (N-Cα and Cα-C), the plot summarizes the local backbone conformation around successive alpha carbon atoms.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure
The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
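The φ and ψ angles plotted in a Ramachandran diagram are ordinary dihedral (torsion) angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of that calculation with plain coordinate tuples. It illustrates the geometry only and is not the code used by SAVS or PROCHECK; the function name `dihedral` is illustrative.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = [a - b for a, b in zip(p0, p1)]          # p0 - p1
    b1 = [b - a for a, b in zip(p1, p2)]          # p2 - p1 (central bond)
    b2 = [b - a for a, b in zip(p2, p3)]          # p3 - p2

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    # Unit vector along the central bond.
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]

    # Components of the outer bonds perpendicular to the central bond.
    v = [x - dot(b0, b1) * y for x, y in zip(b0, b1)]
    w = [x - dot(b2, b1) * y for x, y in zip(b2, b1)]

    cross = [
        b1[1] * v[2] - b1[2] * v[1],
        b1[2] * v[0] - b1[0] * v[2],
        b1[0] * v[1] - b1[1] * v[0],
    ]
    return math.degrees(math.atan2(dot(cross, w), dot(v, w)))


if __name__ == "__main__":
    # phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
    # psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # eclipsed arrangement
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # anti arrangement
```

Each residue then contributes one (φ, ψ) point to the plot, which is why the first and last residues of a chain, lacking one neighbour, are reported separately as end-residues.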
SAVES results for proj_gunjan.pdb
PROCHECK summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                       No. of residues   Percentage
Residues in most favoured regions [A,B,L]                    990            85.6%
Residues in additional allowed regions [a,b,l,p]             104             9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11             1.0%
Residues in disallowed regions                                51             4.4%
Number of non-glycine and non-proline residues              1156           100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
Total number of residues                                    1351
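Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions. With 85.6% of its residues in the most favoured regions, the present model falls below this threshold.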
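The percentages in the PROCHECK plot statistics follow directly from the residue counts, taken relative to the number of non-glycine, non-proline residues. The short sketch below recomputes them from the counts reported above and compares the most favoured share with the 90% guideline. The variable names are illustrative, and the decimal placement of the percentages is an assumption checked by this arithmetic.

```python
# Residue counts per Ramachandran region, as reported in the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end-residues are excluded from this denominator.
total = sum(counts.values())

percentages = {
    region: round(100.0 * n / total, 1) for region, n in counts.items()
}

# PROCHECK guideline: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

if __name__ == "__main__":
    for region, pct in percentages.items():
        print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
    print("total residues counted:", total)
    print("meets 90% guideline:", meets_guideline)
```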
Docking result (by Hex software)
Fig: Ligand and receptor (2B8N)
Fig: After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP CG Radius = 1.40 Charge = 0.62
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52
THR CA Radius = 1.50 Charge = 0.27
THR C Radius = 1.40 Charge = 0.53
THR O Radius = 1.50 Charge = -0.50
THR CB Radius = 1.50 Charge = 0.21
THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP CG Radius = 1.40 Charge = 0.62
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52
THR CA Radius = 1.50 Charge = 0.27
THR C Radius = 1.40 Charge = 0.53
THR O Radius = 1.50 Charge = -0.50
THR CB Radius = 1.50 Charge = 0.21
THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
Clst  Soln  Models   Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  -------  ------  ------  ------  ----  ------  ------  ---  -----
1     1     000/000  0.0     0.0     0.0     0.0   0.0     0.0     -1   -1.00
----  ----  -------  ------  ------  ------  ----  ------  ------  ---  -----
1     1     000/000  0.0     0.0     0.0     0.0   0.0     0.0     -1   -1.00
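The size of the search space quoted in the Hex log can be checked with simple arithmetic. The sketch below assumes that the bracketed figures in the log read as Iterate[27,812,1] and FFT[64,24,48], and that the distance range runs from 0.00 to 19.50 in steps of 0.75. These readings are reconstructions of the printed log, and the variable names are illustrative.

```python
import math

# Assumed reading of the log line "Total 6D space: Iterate[...] x FFT[...]".
iterate = (27, 812, 1)   # distance steps x receptor rotations x ligand rotations
fft_grid = (64, 24, 48)  # dimensions of each 3D FFT

n_ffts = math.prod(iterate)                   # number of 3D FFTs performed
total_orientations = n_ffts * math.prod(fft_grid)

# Intermolecular distances scanned: 0.00 to 19.50 in steps of 0.75.
step = 0.75
distances = [i * step for i in range(iterate[0])]

if __name__ == "__main__":
    print("3D FFTs:", n_ffts)
    print("orientations:", total_orientations)
    print("first/last distance:", distances[0], distances[-1])
```

Under these assumptions the product matches the total reported in the log, and the number of distance steps matches the 27 values of R listed during the scan.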
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look at the matter more closely by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope that it will become possible in the near future to draw firm conclusions about the true picture of evolution, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we used homology modelling. We went a step further and presented a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may serve as a stepping stone for such discoveries; it is a small finding on a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking may be useful in finding a cure for the infectious disease bird flu, and may also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and can hence aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases, so that possible therapeutic sites can be identified in them and we can look forward to a better, disease-free world.
The work presented is just a small part of a large problem, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviation
83
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
84
85
86
000022P17099|HBEAG_HBVA4000942Q91C37|HBEAG_HBVA6000927)
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load raw sequence, and load the target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Gunjan300@gmail.com (MUST be correct)
Your Name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Click on the model bars.
Fig. Structure of template after modeling
Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
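The sequence identity reported for a target-template alignment can be recomputed from the aligned rows. The sketch below is only an illustration: the function name percent_identity is hypothetical, and it assumes one common definition of identity (matching residues divided by the number of columns in which neither sequence has a gap). SWISS-MODEL may use a different denominator, so the value for a single block need not equal the overall figure reported by the server.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Spaces (used to group residues in blocks of ten) are ignored and the
    comparison is case-insensitive. This definition is an assumption.
    """
    a = aligned_a.replace(" ", "").upper()
    b = aligned_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# One 50-column block copied from the alignment (TARGET 155 / 2b8nA 101).
target_block = "ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA"
template_block = "trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk"
print(f"{percent_identity(target_block, template_block):.1f}% identity in this block")
```

Applying the same function to every block and pooling the counts would give an identity for the whole modelled range under this definition.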
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ angles of each residue and shows which combinations of these angles are sterically possible for a polypeptide, so residues with unfavourable backbone geometry can be identified from the plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
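The φ and ψ angles plotted in a Ramachandran plot are ordinary dihedral angles between four consecutive backbone atoms: φ is defined by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of that calculation in plain Python. The function names and the toy coordinates are illustrative assumptions and are not taken from PROCHECK or SAVS.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (x, y, z)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Return (phi, psi) in degrees for one residue from backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy coordinates (hypothetical, not from the modelled structure).
phi, psi = phi_psi(
    (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5),
    (0.0, 1.4, 2.0), (1.2, 2.0, 2.4),
)
print(f"phi = {phi:.1f}, psi = {psi:.1f}")
```

Each residue then becomes one point (φ, ψ) in the plot, and validation programs classify the point by the region it falls in.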
SAVES results for proj_gunjan.pdb
PROCHECK summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          No. of residues   %-tage
Residues in most favoured regions [A,B,L]                      990          85.6
Residues in additional allowed regions [a,b,l,p]               104           9.0
Residues in generously allowed regions [~a,~b,~l,~p]            11           1.0
Residues in disallowed regions                                  51           4.4
Number of non-glycine and non-proline residues                1156         100.0
Number of end-residues (excl. Gly and Pro)                       8
Number of glycine residues (shown as triangles)                127
Number of proline residues                                      60
Total number of residues                                      1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions. With 85.6% of its residues in the most favoured regions, the present model falls below this threshold.
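The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% criterion can be checked in the same step. The short sketch below recomputes them. The counts are those listed in the summary, while the variable names are illustrative.

```python
# Residue counts per Ramachandran region, as listed in the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from these statistics.
total = sum(counts.values())
percentages = {region: 100.0 * n / total for region, n in counts.items()}

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")

# PROCHECK's rule of thumb: a good model has over 90% in the most favoured regions.
meets_threshold = percentages["most favoured"] > 90.0
print("over 90% in most favoured regions:", meets_threshold)
```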
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
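Some of the figures in the Hex log can be cross-checked against each other. The sketch below assumes that decimal points were lost from the log text, so that the distance range "000 to 1950 with steps of 075" is read as 0.00 to 19.50 in steps of 0.75. That reading is an assumption, but it is consistent with the 27 "R =" values listed and with the reported count of 21924 3D FFTs for 812 receptor rotations. The variable names are illustrative.

```python
# Assumed reading of the log values (decimal points restored by assumption).
r_start, r_end, r_step = 0.00, 19.50, 0.75

# Number of intermolecular distance steps scanned, both endpoints included.
n_distance_steps = round((r_end - r_start) / r_step) + 1

# The log reports 812 initial receptor rotations; one 3D FFT per rotation per step.
receptor_rotations = 812
n_ffts = n_distance_steps * receptor_rotations

# The log counts 104 positively and 114 negatively charged residues.
net_formal_charge = 104 - 114

print(n_distance_steps, n_ffts, net_formal_charge)
```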
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of a protein present in the different species, we carried out homology modelling and put forward a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report might be a stepping stone for such discoveries, although it is only a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases to identify possible therapeutic sites in them, so we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
Open Swiss model and select load raw sequence option to load target molecule
Perform magic fit iterative fit provided under FIT in order to fit the two sequences
55
Save the file as the project
Select ldquosubmit modeling requestrdquo under Swiss model to submit it for modeling
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
56
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit P000044 TitleQ8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range 83 to 514
based on template 2b8nA (253 Aring)
Sequence Identity [] 34
Evalue 270e-52
57
click on model bars
Fig structure of template after modeling
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
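The sequence identity reported for the model can be illustrated with a small calculation over aligned residues. The sketch below is only an illustration: the function name percent_identity is hypothetical, and it assumes identity is counted over columns where neither sequence has a gap, which may differ from the convention SWISS-MODEL uses for its reported value.

```python
def percent_identity(target: str, template: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    # Drop the spaces that separate 10-residue groups and ignore letter case.
    target = target.replace(" ", "").upper()
    template = template.replace(" ", "").upper()
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without a gap character in either sequence.
    columns = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# One block of the alignment above (TARGET 205 / 2b8nA 151).
target_block = "RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS"
template_block = "sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias"
print(f"{percent_identity(target_block, template_block):.1f}% identity in this block")
```

A single block gives only a local value; the figure reported by the server refers to the whole aligned region.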
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure and shows which combinations of φ and ψ are possible for a polypeptide. Residues whose angles fall outside the favoured and allowed regions point to stereochemical problems in the model.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
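The φ and ψ values shown in a Ramachandran plot are dihedral (torsion) angles defined by four consecutive backbone atoms. The following is a minimal sketch of that calculation in plain Python; the function name dihedral and the test coordinates are illustrative assumptions, not part of SAVS or PROCHECK.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees for four points (range -180 to 180)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# phi of residue i uses the atoms C(i-1), N(i), CA(i), C(i);
# psi of residue i uses the atoms N(i), CA(i), C(i), N(i+1).
# The points below are simple test geometry, not real protein coordinates.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

In practice the four points would be the atomic coordinates read from the ATOM records of the model's PDB file.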
SAVES results for proj_gunjan.pdb
PROCHECK summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %-tage
Residues in most favoured regions [A,B,L]                  990     85.6
Residues in additional allowed regions [a,b,l,p]           104      9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11      1.0
Residues in disallowed regions                              51      4.4
Number of non-glycine and non-proline residues            1156    100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
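The percentages in the plot statistics follow directly from the residue counts. As a small worked check, the sketch below recomputes them from the counts reported above and compares the most favoured fraction with the 90% guideline; the variable names are illustrative only.

```python
# Residue counts taken from the PROCHECK plot statistics above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")

# Guideline quoted above: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
print("Meets the 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls below the 90% guideline quoted by PROCHECK.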
Docking result by Hex software
Fig.: Ligand & Receptor (2B8N)
Fig.: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor: 1 (1Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal    Eshape    Eforce    Eair      RMS              Bumps
----  --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
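The size of the search reported in the Hex log can be checked with simple arithmetic. The sketch below assumes that the bracketed values in the "Total 6D space" line read as 27 distance steps, 812 receptor rotations and 1 ligand rotation, multiplied by a 64 x 24 x 48 FFT grid. This grouping is an assumption, because the separators were lost in the extracted log, but it reproduces both totals that the log prints.

```python
from math import prod

# Distance scan: 0.00 to 19.50 in steps of 0.75, as stated in the log.
distance_start, distance_stop, distance_step = 0.0, 19.5, 0.75
n_distance_steps = int(round((distance_stop - distance_start) / distance_step)) + 1

# Rotational increments reported for the initial N=16 scan.
receptor_rotations = 812
ligand_rotations = 1

# Assumed reading of FFT[642448]: a 64 x 24 x 48 grid.
fft_grid = (64, 24, 48)

# One 3D FFT is run per (distance, receptor rotation, ligand rotation) combination.
n_ffts = n_distance_steps * receptor_rotations * ligand_rotations

# Each FFT evaluates every point of the grid at once.
n_orientations = n_ffts * prod(fft_grid)

print(n_distance_steps)  # number of R values listed in the log
print(n_ffts)            # log: "Done 21924 3D FFTs"
print(n_orientations)    # log: "for 1616412672 orientations"
```

The agreement with the logged totals only confirms the arithmetic; it says nothing about the quality of the docking itself.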
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude that although they are all closely related, they play an important role in survival in different species. It is worth taking a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we performed homology modelling and present a theoretical model built with available online modelling tools.
Our study indicated that the HBeAg (glycerate kinase) protein is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone for any such discoveries. The present work might be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
82
Abbreviation
83
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
84
85
86
Save the file as the project
Select ldquosubmit modeling requestrdquo under Swiss model to submit it for modeling
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
56
Your Email address Gunjan300gmailcom (MUST be correct)
Your Name Gunjan
Request title Gunjan project Will be added to the results header
Your SWISS-MODEL project file can be found in
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit P000044 TitleQ8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range 83 to 514
based on template 2b8nA (253 Aring)
Sequence Identity [] 34
Evalue 270e-52
57
click on model bars
Fig structure of template after modeling
Alignment TARGET 83 LV GFGKAVLGMA AAAEELLGQH2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr TARGET hh sss sssss hhh2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh
TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt TARGET hhhhhhhhh ssssss sssss hhhhh2b8nA hhhhhhhhh ssssss sssss hhhhh
TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias TARGET h hhhhhh hhh sss hhhhhh sssssss 2b8nA h hhhhhh hhh sss hhhhhh sssssss
58
TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv TARGET h hhhhhhhhhh hh hhhh sss sssss2b8nA h hhhhhhhhhh hhh hhhh ss
TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv TARGET hhh ssssssssss s hhhhhhhhh hhh ss2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss
TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh
TARGET 455 DGPTEAAGAW VTPELASQAA AEGL 2b8nA 394 ksgallitgp tgtnvndlii gliv- TARGET h sss sssss ssss 2b8nA sss sssss ssss
Model Validation
INTRODUCTION
Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process Ramachandran plot is a way to visualize dihedral angles φ against ψ of amino acid residues in protein structure It shows the possible conformations of φ and ψ angles for a polypeptide The Ramachandran plot displays the psi and phi backbone conformational angles for each residue in a protein The distance between two succession alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in desired protein can be determined using this plot Software
59
SAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                       No. of residues   %-tage
Residues in most favoured regions [A,B,L]                    990          85.6%
Residues in additional allowed regions [a,b,l,p]             104           9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11           1.0%
Residues in disallowed regions                                51           4.4%
                                                            ----         ------
Number of non-glycine and non-proline residues              1156         100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
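The percentages in the Procheck summary follow directly from the residue counts, and the model can be compared with the "over 90% in the most favoured regions" guideline. The short sketch below recomputes them from the counts reported above; the variable names are illustrative only.

```python
# Residue counts taken from the Procheck plot statistics above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Percentages are relative to the non-glycine, non-proline residues.
total = sum(counts.values())
percentages = {region: round(100 * n / total, 1) for region, n in counts.items()}

# Guideline quoted by Procheck: a good quality model has over 90%
# of residues in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("Meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, which suggests that it would benefit from further refinement.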
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059.
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50, with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds [71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds [71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
----   ------   ------   ------   ----   ---   -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000:000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000:000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
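The size of the search reported in the Hex log can be checked by hand. The log gives a distance range of 0.00 to 19.50 in steps of 0.75, 812 receptor rotational increments and 1 ligand increment, 21924 3D FFTs, and a total 6D space of 1616412672 orientations. The sketch below reproduces these totals. Reading the bracketed figures in the log as 27 x 812 x 1 iterations and a 64 x 24 x 48 FFT grid is an assumption made here because it reproduces the reported totals; it is not taken from the Hex documentation.

```python
# Distance sampling as stated in the log: 0.00 to 19.50 in steps of 0.75.
r_min, r_max, r_step = 0.0, 19.5, 0.75
n_distance_steps = int(round((r_max - r_min) / r_step)) + 1

# Rotational increments reported in the log.
receptor_rotations = 812
ligand_rotations = 1

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = n_distance_steps * receptor_rotations * ligand_rotations

# Assumed FFT grid dimensions (an interpretation of the garbled log entry).
fft_grid = (64, 24, 48)
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]

total_orientations = n_ffts * orientations_per_fft

print("Distance steps:      ", n_distance_steps)
print("3D FFTs:             ", n_ffts)
print("Orientations per FFT:", orientations_per_fft)
print("Total 6D orientations:", total_orientations)
```

Both the FFT count and the total number of orientations agree with the figures printed in the log, which supports this reading of the search dimensions.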
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It is worth taking a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope that it will soon be possible to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of the protein present in the different species, we performed homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the gene-encoded HBeAg (Glycerate kinase) protein is one of the secondary causes of the pathogenicity of HBV. We therefore attempted to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work remains to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv TARGET hhh ssssssssss s hhhhhhhhh hhh ss2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss
TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh
TARGET 455 DGPTEAAGAW VTPELASQAA AEGL 2b8nA 394 ksgallitgp tgtnvndlii gliv- TARGET h sss sssss ssss 2b8nA sss sssss ssss
Model Validation
INTRODUCTION
Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process Ramachandran plot is a way to visualize dihedral angles φ against ψ of amino acid residues in protein structure It shows the possible conformations of φ and ψ angles for a polypeptide The Ramachandran plot displays the psi and phi backbone conformational angles for each residue in a protein The distance between two succession alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in desired protein can be determined using this plot Software
59
SAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS
SAVES results for proj_gunjanpdb
Procheck summary
RAMCHANDRAN POLT
60
Result ----- Plot statistics SCORE ageResidues in most favoured regions [ABL] 990 856Residues in additional allowed regions [ablp] 104 90Residues in generously allowed regions [~a~b~l~p] 11 10Residues in disallowed regions 51 44 ---- ---- ------------------Number of non-glycine and non-proline residues 1156 1000Number of end-residues (excl Gly and Pro) 8Number of glycine residues (shown as triangles) 127 (59)
Number of proline residues 60 ----Total number of residues 1351
61
Based on an analysis of 118 structures oand R-factor no greater than 20 a good quality model would be expectedto have over 90 in the most favoured regions
Docking result ---- by Hex software
FigLigand amp Receptor (2B8N)
62
Fig after docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
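The search-space figures reported in the Hex log can be cross-checked with simple arithmetic. The sketch below is not part of Hex. It assumes that the Iterate entry in the log denotes 27 distance steps, 812 receptor rotational increments and 1 ligand increment, that the FFT entry denotes a 64 x 24 x 48 grid, and that the phase timings carry two decimal places. These readings are assumptions, although the resulting products and sums agree with the totals printed in the log.

```python
from math import prod

# Values read from the Hex log; digit grouping and decimal placement are assumptions.
distance_start, distance_stop, distance_step = 0.00, 19.50, 0.75
iterate = (27, 812, 1)   # distance steps, receptor rotations, ligand rotations
fft_grid = (64, 24, 48)  # dimensions of each 3D FFT

# Number of intermolecular distances scanned (both end points included).
n_steps = round((distance_stop - distance_start) / distance_step) + 1

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = prod(iterate)

# Each FFT scores one orientation per grid point.
n_orientations = n_ffts * prod(fft_grid)

# Phase timings in seconds (Hex, GF, FFT, Scan) as reported in the log.
phase_seconds = (53.66, 73.01, 277.87, 15.45)
total_seconds = sum(phase_seconds)

print(f"distance steps: {n_steps}")
print(f"3D FFTs:        {n_ffts}")
print(f"orientations:   {n_orientations}")
print(f"search time:    {total_seconds:.2f} s")
```

Under these assumptions the figures reproduce the 21924 FFTs, the 1616412672 orientations and the roughly 7-minute search time stated in the log.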
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we
hope it will be possible to draw firm conclusions about the true picture of evolution in the near
future, and the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of a protein present in the
different species, we performed homology modelling. We took a step forward by presenting a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAG (Glycerate kinase) protein coded by
the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The
present work is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as interpreting the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs
more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
click on model bars
Fig: Structure of template after modeling
Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
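A common way to summarise a target-template alignment such as the one above is percent sequence identity over the aligned, ungapped columns. The following is a minimal sketch of that calculation, not the method used by the modelling server. The function name `percent_identity` is illustrative, and the two strings are one 50-column block (TARGET 155 / 2b8nA 101) copied from the alignment with the spaces removed.

```python
def percent_identity(target: str, template: str) -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(target.upper(), template.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# One block of the alignment (case differs between rows, so it is normalised above).
target_block = "ALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAA"
template_block = "trrvlelvdqlnendtvlfllsgggsslfelplegvsleeiqkltsallk"

print(f"{percent_identity(target_block, template_block):.1f}% identity over this block")
```

Applying the same function to the full concatenated alignment rows would give the overall identity between target and template.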
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible for a polypeptide. The angles between the bonds at successive alpha carbon atoms of the backbone chain in the protein of interest can be read from this plot.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input for SAVS.
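As background to the Ramachandran analysis, each φ or ψ value is a dihedral angle defined by four consecutive backbone atoms: φ by C of residue i-1, then N, Cα and C of residue i, and ψ by N, Cα and C of residue i, then N of residue i+1. The following is a minimal sketch, not part of SAVS or Procheck, of how such an angle can be computed from atomic coordinates. The function name `dihedral` and the example coordinates are illustrative assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Four illustrative points whose end bonds are perpendicular when viewed
# along the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real model, the four points would be the coordinates of the backbone atoms listed above, read from the PDB file of the modelled structure.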
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result

Plot statistics                                         Score    %age
Residues in most favoured regions [A,B,L]                 990   85.6%
Residues in additional allowed regions [a,b,l,p]          104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                             51    4.4%
                                                         ----  ------
Number of non-glycine and non-proline residues           1156  100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
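The percentages in the Procheck summary follow directly from the residue counts, and the same counts can be compared against the 90% guideline quoted in the Procheck note. The short sketch below recomputes them. The variable names are illustrative, and the counts are those listed in the plot statistics above.

```python
# Residue counts per Ramachandran region, taken from the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from the percentage base.
non_gly_pro = sum(counts.values())
percentages = {region: 100.0 * n / non_gly_pro for region, n in counts.items()}

# Remaining residue classes reported in the summary.
end_residues, glycines, prolines = 8, 127, 60
total_residues = non_gly_pro + end_residues + glycines + prolines

# Guideline from the Procheck note: over 90% in the most favoured regions.
GOOD_MODEL_THRESHOLD = 90.0
meets_guideline = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print(f"total residues: {total_residues}")
print(f"meets 90% guideline: {meets_guideline}")
```

With these counts the most favoured fraction is 85.6%, so the model falls short of the 90% guideline even though only 4.4% of residues lie in disallowed regions.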
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
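The size of the search reported in this part of the log can be checked by hand. The sketch below is only an illustration, not part of Hex: it assumes the distance range is 0.00 to 19.50 in steps of 0.75, and that the bracketed factors in the "Total 6D space" line read as 27, 812, 1 and 64, 24, 48 (punctuation in the log was lost, so this reading is an assumption). Under that reading the numbers agree with the 21924 FFTs and 1616412672 orientations quoted in the log.

```python
# Values taken from the Hex log; how the bracketed factors split is an assumption.
r_min, r_max, r_step = 0.00, 19.50, 0.75

# Number of intermolecular distance samples, counting both end points.
n_distance_steps = round((r_max - r_min) / r_step) + 1

receptor_rotations = 812   # "Receptor 812" initial rotational increments
ligand_rotations = 1       # "Ligand 1"
fft_grid = (64, 24, 48)    # assumed reading of FFT[...]

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = n_distance_steps * receptor_rotations * ligand_rotations

# Each FFT scores one orientation per grid point.
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

print("distance steps:", n_distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", total_orientations)
```

The same kind of check applies to the timing line: the four stage times add up to roughly the 7 minutes reported for the search.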
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 0.0 0.0 0.0 0.0 0.0 0.0 -1 -1.00
---------------------------------------------------------------------------
 1 1 000000 0.0 0.0 0.0 0.0 0.0 0.0 -1 -1.00
Conclusion
After analysing the protein sequences of Hepatitis B virus (HBV), we conclude that although they are all
closely related, they play an important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible in the near future to draw firm conclusions about the true picture of evolution,
and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of a protein present in the
different species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
In our study, the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. The
present work is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for the infectious disease
bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope
that these findings will go a long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
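A target/template alignment like the one above is usually summarised by its percent sequence identity. The following is a minimal sketch rather than the procedure used in this report: the function name `percent_identity` is hypothetical, only the first alignment block (target residues 255 to 304) is used as input, and identity is defined here as matches divided by columns in which neither sequence has a gap, which is one of several common conventions.

```python
def percent_identity(target: str, template: str):
    """Return (matches, aligned_columns, percent) for two aligned sequences.

    Spaces are ignored, case is ignored, and any column containing a gap
    character '-' in either sequence is skipped.
    """
    t = target.replace(" ", "").upper()
    m = template.replace(" ", "").upper()
    if len(t) != len(m):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    aligned = 0
    for a, b in zip(t, m):
        if a == "-" or b == "-":
            continue  # gap column: not counted as aligned
        aligned += 1
        if a == b:
            matches += 1

    percent = 100.0 * matches / aligned if aligned else 0.0
    return matches, aligned, percent


# First alignment block only (illustrative input, copied from the alignment).
target_block = "GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV"
template_block = "gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv"

matches, aligned, percent = percent_identity(target_block, template_block)
print(f"{matches} identical of {aligned} aligned columns ({percent:.1f}%)")
```

Running the function over the concatenation of all five blocks would give the identity for the whole modelled region.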
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modelling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays these two backbone conformational angles for each residue in the protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the bonds of such atoms in the protein of interest can be examined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modelling using DeepView and MODELLER is given as input to SAVS.
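The φ and ψ angles that the validation server plots are ordinary four-atom dihedral angles: φ is defined by the atoms C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). As a minimal sketch of how such an angle can be computed from atomic coordinates (this is not the server's own code, and the function name `dihedral` and the test coordinates are made up for illustration):

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Components of the outer bonds perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: cis (0 degrees), trans (180 degrees), perpendicular.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
perp = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, perp)
```

Applying `dihedral` to the four backbone atoms listed above for every residue gives the (φ, ψ) pairs that make up a Ramachandran plot.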
SAVES results for proj_gunjan.pdb
Procheck summary
Ramachandran plot
Result
Plot statistics
                                                         Score   %age
Residues in most favoured regions [A,B,L]                  990   85.6
Residues in additional allowed regions [a,b,l,p]           104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0
Residues in disallowed regions                              51    4.4
                                                          ----  -----
Number of non-glycine and non-proline residues            1156  100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
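The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. A minimal sketch, taking the four region counts reported above (990, 104, 11 and 51) as input; the variable names are illustrative only:

```python
# Residue counts per Ramachandran region, as reported in the plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

# Percentage of non-Gly, non-Pro residues in each region.
percentages = {region: 100.0 * n / total for region, n in counts.items()}

# PROCHECK guideline: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:<20s} {counts[region]:5d} {pct:6.1f}%")
print("total", total, "meets 90% guideline:", meets_guideline)
```

With these counts the most favoured fraction comes to about 85.6%, so the model falls short of the 90% guideline by this measure.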
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
70
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
71
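The fractional-charge warnings in the log can be cross-checked by summing the per-atom charges listed under each residue. The sketch below does this for residue B 81 ASP. It assumes the charges are two-decimal values (for example -0.52 where the extracted text may show "-052"); the helper names are illustrative and are not part of Hex.

```python
# Per-atom partial charges listed in the log for residue B 81 ASP.
# Decimal points are an assumption about the original Hex output.
asp_b81 = {"N": -0.52, "CA": 0.25, "C": 0.53, "O": -0.50, "CB": -0.21, "H": 0.25}

def net_charge(atom_charges):
    """Sum the per-atom partial charges of one residue."""
    return round(sum(atom_charges.values()), 2)

def is_fractional(charge, tol=0.05):
    """True when the residue total is not close to a whole number."""
    return abs(charge - round(charge)) > tol

total = net_charge(asp_b81)
print(total, is_fractional(total))
```

The sum agrees with the reported value of -0.21 to within rounding of the individual charges. Only six atoms are listed for this residue, so the non-integer total is consistent with side-chain atoms missing from the structure file.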
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
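The skin volumes reported in the log follow from the grid-cell counts and the grid spacing. The sketch below reproduces them, assuming the spacing is 0.60 A and the volumes are two-decimal values (the extracted text may have lost its decimal points); the variable names are illustrative.

```python
# Values read from the Hex log; the decimal point in the grid spacing
# is an assumption about the original output.
GRID_SPACING = 0.60        # cell edge length in angstroms
EXTERIOR_CELLS = 201019    # exterior skin grid cells
INTERIOR_CELLS = 216848    # interior skin grid cells

cell_volume = GRID_SPACING ** 3                  # volume of one cubic cell
total_cells = EXTERIOR_CELLS + INTERIOR_CELLS    # cells used in the integration
exterior_volume = round(EXTERIOR_CELLS * cell_volume, 2)
interior_volume = round(INTERIOR_CELLS * cell_volume, 2)

print(total_cells, exterior_volume, interior_volume)
```

The cell total matches the 417867 cells over which the skin integration is applied, and the two volumes match the exterior and interior skin volumes in the log under this reading.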
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
----   ------   ------   ------   ----   ---   -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
---------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
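The size of the search reported in the docking log can be reproduced from the distance range and the rotational sampling. The sketch below is a consistency check only, not Hex's own code. It assumes the distance range reads 0.00 to 19.50 in steps of 0.75 and that the bracketed counts are Iterate[27, 812, 1] and FFT[64, 24, 48], since the extracted text may have lost decimal points and commas.

```python
import math

# Values read from the Hex log (punctuation restored by assumption).
R_MIN, R_MAX, R_STEP = 0.00, 19.50, 0.75   # intermolecular distance range
RECEPTOR_ROTATIONS = 812                   # initial receptor rotational increments
LIGAND_ROTATIONS = 1                       # initial ligand rotational increments
FFT_GRID = (64, 24, 48)                    # samples covered by one 3D FFT

def distance_steps(r_min, r_max, r_step):
    """Number of distances sampled, with both end points included."""
    return round((r_max - r_min) / r_step) + 1

n_distances = distance_steps(R_MIN, R_MAX, R_STEP)
n_ffts = n_distances * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
orientations = n_ffts * math.prod(FFT_GRID)

print(n_distances, n_ffts, orientations)
```

Under these assumptions the 27 distance steps match the 27 R values listed in the log, and the products match the 21924 3D FFTs and the 1616412672 orientations it reports.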
Conclusion
After analysing the protein sequences of hepatitis B virus (HBV), we conclude that although the
sequences are all closely related, they play an important role in survival in different species.
It would be worthwhile to examine the question more closely at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We observed that the same genes are present in all strains, which suggests that the strains
evolved together.
Once the ongoing HBV gene sequencing project is complete, we hope it will be possible to draw
firm conclusions about the course of HBV evolution and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.
To determine the unknown structures of proteins present in the different species, we used
homology modelling and present a theoretical model built with available online modelling tools.
Our study indicated that the HBeAg (glycerate kinase) protein is a secondary cause of HBV
pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis for future drug development.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries. It is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists design drugs more
efficiently and effectively in future by analysing the modelled protein structure.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking study may be useful in the search for a cure for hepatitis B, and may
open new avenues for finding other possible drug targets in HBV. The docking results can be used
to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and infectious diseases to identify
possible therapeutic sites, so that we can look forward to a better, disease-free world.
The work presented here is just a small part of a large problem, and much work remains to be
done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We
hope that these findings will go a long way and prove fruitful to anyone working in a similar
area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
We have noticed that same genes are present in all strains this shows that are they
evolved together
With the finishing of the ongoing gene sequencing project on HBV we
hope it will be possible to draw conclusive decision about the true picture of evolution in near
future and gene responsible for pathogenesis can also be identified
Complete inference can only be drawn based on a comprehensive list
of the gene products and their function
In order to find out unknown structure of protein present in the
different species we do homology modelling We forward step to present a theoretical model
using available online modelling tools
As we study that HBeAG (Glycerate kinase ) protein that is coded by
gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with
appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be
developed
77
Future prospects
The work presented in this report might just be a stepping stone for any such discoveries The
present work might be small finding of big issue
78
Phylogenetics is that field of biology which deals with identifying and understanding the
relationships between the many different kinds of life on earth This includes methods for
collecting and analysing data as well as interpretation of those results as new biological
information
The purpose of modelling is to help the Drug developers and Biotechnologists to develop the
drug more efficiently and with more effectiveness in future by analysing the modelled structure
of protein
As the new drugs target would be identified it will open new vistas for further drug
development The finding of our docking will be useful in finding a cure for the infectious disease
bird flu also it will open new avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence possible therapeutic sites
can be identified in them Similar method can also be applied to other infectious diseases and
hence we can look forward to a better disease free world
The work presented is just a small part of big issue and lots of work still needs to be done to
establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping
that these findings will go long way and will prove fruitful to any going in a similar area
79
BIBLIOGRAPHY
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
80
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining multiple
structure and sequence alignments to improve sequence detection and alignment
Application to the SH2 domains of Janus kinases Proc Natl Acad Sci 98 14796ndash14801 [PubMed]
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated structure-based
prediction of functional sites in proteins Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking
J Mol Biol 311 395ndash408 [PubMed]
(77)
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller W and
Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new generation of protein
database search programs Nucleic Acids Res 25 3389ndash3402 [PubMed]
81
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher
P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated
documentation resource for protein families domains and functional sites Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut
Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la
Barceloneta 37-49 08003 Barcelona (Catalonia) Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
82
Abbreviation
83
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
84
85
86
Result ----- Plot statistics

                                                          Score   %age
Residues in most favoured regions       [A,B,L]             990   85.6
Residues in additional allowed regions  [a,b,l,p]           104    9.0
Residues in generously allowed regions  [~a,~b,~l,~p]        11    1.0
Residues in disallowed regions                               51    4.4
                                                           ----  -----
Number of non-glycine and non-proline residues             1156  100.0
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
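The plot statistics can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that the flattened figures 856, 90, 10 and 44 are percentages with one decimal place (85.6, 9.0, 1.0 and 4.4) and that the percentages are taken over the non-glycine, non-proline residues. The variable names are ours and do not come from the program that produced the table.

```python
# Residue counts taken from the plot statistics above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
end_residues, glycines, prolines = 8, 127, 60

# Percentages are assumed to be relative to non-Gly, non-Pro residues.
non_gly_pro = sum(counts.values())
total_residues = non_gly_pro + end_residues + glycines + prolines

percentages = {
    region: round(100 * n / non_gly_pro, 1) for region, n in counts.items()
}

# Quality criterion quoted in the text: over 90% in the most favoured regions.
meets_criterion = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("total residues:", total_residues)
print("meets 90% criterion:", meets_criterion)
```

Under this reading the counts add up to the reported totals (1156 and 1351), and the most favoured fraction of 85.6% falls short of the 90% expected for a good quality model.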
Docking result ---- by Hex software

Fig. Ligand & Receptor (2B8N)

Fig. After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues 7597 atoms 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues 7597 atoms 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds [71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds [71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00
R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75
R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln   Etotal    Eshape    Eforce    Eair      RMS               Bumps
----   --------- --------- --------- --------- ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
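The size of the 6D search reported in the docking log can be checked by hand. The sketch below is an illustration under stated assumptions, not part of the Hex output: it assumes the bracketed figures in the log read as 27, 812, 1 (distance steps, receptor rotations, ligand rotations) and 64, 24, 48 (FFT grid dimensions), and that the distance range is 0.00 to 19.50 in steps of 0.75. All variable names are ours.

```python
# Assumed reading of the distance range in the log: 0.00 to 19.50, step 0.75.
R_MIN, R_MAX, R_STEP = 0.00, 19.50, 0.75

# Number of intermolecular distances sampled, both end points included.
distance_steps = round((R_MAX - R_MIN) / R_STEP) + 1

# Rotational increments reported for the receptor and the ligand (N=16).
receptor_rotations = 812
ligand_rotations = 1

# Assumed reading of the FFT grid dimensions.
fft_grid = (64, 24, 48)

# Each combination of distance and rotations needs one 3D FFT.
iterations = distance_steps * receptor_rotations * ligand_rotations
fft_points = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = iterations * fft_points

print("distance steps:    ", distance_steps)
print("3D FFTs:           ", iterations)
print("points per FFT:    ", fft_points)
print("total orientations:", total_orientations)
```

With these assumptions the product reproduces the two totals stated in the log: 21924 3D FFTs and 1616412672 orientations.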
Conclusion
After analysing the protein sequences of Hepatitis B virus (HBV), we conclude that although the
sequences are all closely related, they play an important role in survival in different species. It is
worth taking a closer look by studying them at the gene level, and a phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that the strains evolved
together.

Once the ongoing gene sequencing projects on HBV are complete, we hope it will be possible to
draw firm conclusions about the true picture of its evolution in the near future, and to identify the
genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.

To determine the unknown structure of a protein present in the different species, we carried out
homology modelling and took a first step towards a theoretical model using available online
modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is a
small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.

As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking may be useful in finding a cure for the infectious disease bird flu, and may
also open new avenues for finding other possible drug targets in influenza A virus. The docking
results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can
be identified in them. The same method can also be applied to other infectious diseases, and hence
we can look forward to a better, disease-free world.

The work presented is just a small part of a big problem, and a lot of work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these
findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
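The criterion above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the residue counts for each Ramachandran region have already been read off a PROCHECK-style summary; the function names and the example counts are hypothetical and are not taken from this report.

```python
def most_favoured_percentage(most_favoured, additional_allowed,
                             generously_allowed, disallowed):
    """Percentage of counted residues lying in the most favoured regions.

    Assumes the four counts already exclude whatever the plotting tool
    excludes (for example glycine, proline and end residues).
    """
    total = most_favoured + additional_allowed + generously_allowed + disallowed
    if total == 0:
        raise ValueError("no residues counted")
    return 100.0 * most_favoured / total


def meets_quality_criterion(percentage, threshold=90.0):
    """True when more than `threshold` per cent of residues are most favoured."""
    return percentage > threshold


# Hypothetical region counts, for illustration only
pct = most_favoured_percentage(460, 35, 4, 1)
print(f"{pct:.1f}% in most favoured regions, passes: {meets_quality_criterion(pct)}")
```

The 90 per cent threshold is kept as a parameter so that a stricter or looser cut-off can be tried without changing the function.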
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00
R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75
R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 0.00
R = 0.40
R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
------------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
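The log above reports how many positively and negatively charged residues were counted and the resulting net formal charge. The sketch below shows one simple way such a tally can be made from a one-letter amino acid sequence. It assumes lysine and arginine count as +1 and aspartate and glutamate as -1; the rules Hex itself applies (for example to histidine, chain termini or incomplete residues) are not stated in the log and may differ, so the numbers need not match. The fragment is the opening stretch of the chain sequence as printed in the log, assuming the leading A/B letter there is the chain label.

```python
POSITIVE = set("KR")  # Lys, Arg: assumed +1 each
NEGATIVE = set("DE")  # Asp, Glu: assumed -1 each


def formal_charge_counts(sequence):
    """Return (positive, negative, net) residue counts for a sequence."""
    seq = sequence.upper()
    positive = sum(1 for aa in seq if aa in POSITIVE)
    negative = sum(1 for aa in seq if aa in NEGATIVE)
    return positive, negative, positive - negative


# Opening residues of the 2B8N chain sequence as printed in the log
fragment = "PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGK"
pos, neg, net = formal_charge_counts(fragment)
print(f"{pos} +ve, {neg} -ve, net formal charge {net:+d}")
```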
Conclusion
After analysing the protein sequences of hepatitis B virus (HBV), we conclude that, although they are all
closely related, they each play an important role in survival in different species. It would be interesting to
look at the matter more closely at the gene level, and a phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will become possible in the near future to draw firm conclusions about the true picture of evolution
and to identify the genes responsible for pathogenesis.
A complete inference can be drawn only from a comprehensive list
of the gene products and their functions.
To determine the unknown structures of proteins present in the
different species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with
an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. It includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
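As a small illustration of the kind of data analysis phylogenetics relies on, the sketch below computes the p-distance (the proportion of differing positions) between pairs of aligned sequences, a quantity that distance-based tree-building methods can take as input. This is a generic example and is not claimed to be the method used in this report; the strain names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing positions between two aligned sequences.

    Positions where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned fragments, for illustration only
toy = {
    "strain_A": "MDIDPYKEFG",
    "strain_B": "MDIDPYKEFG",
    "strain_C": "MDIDTYKEFG",
}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

Skipping gapped positions is one common convention; other treatments of gaps would give slightly different distances.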
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in the search for a cure for the infectious disease
bird flu, and may also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented here is only a small part of a large problem, and much work remains to be done
to establish a reliable phylogenetic relationship and a full-fledged cure for bird flu. We hope, however,
that these findings will go a long way and prove useful to anyone working in a similar area.
Fig after docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS
63
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
64
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
65
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
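The search-space figures reported in the log can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: because the extraction stripped decimal points and separators, it assumes the distance range reads 0.00 to 19.50 in steps of 0.75, that Iterate[278121] stands for 27 distance steps x 812 receptor rotations x 1 ligand rotation, and that FFT[642448] stands for a 64 x 24 x 48 grid. The function and variable names are illustrative and are not part of Hex.

```python
from math import prod


def distance_steps(r_min: float, r_max: float, step: float) -> list[float]:
    """Return the radial sampling distances from r_min to r_max inclusive."""
    n = round((r_max - r_min) / step) + 1
    return [round(r_min + i * step, 2) for i in range(n)]


# Values read from the log; the decimal points and separators are assumed.
steps = distance_steps(0.00, 19.50, 0.75)

# Assumed reading of "Iterate[278121]":
# distance steps x receptor rotations x ligand rotations.
iterate = (len(steps), 812, 1)

# Assumed reading of "FFT[642448]": a 64 x 24 x 48 FFT grid.
fft_grid = (64, 24, 48)

n_ffts = prod(iterate)                    # number of 3D FFTs
n_orientations = n_ffts * prod(fft_grid)  # size of the 6D search space

print(len(steps), n_ffts, n_orientations)
```

Under these assumptions the script prints 27 21924 1616412672, which agrees with the 21924 3D FFTs and 1616412672 orientations reported in the log.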
Conclusion
After analysing the protein sequences of Hepatitis B virus, we conclude that, although they are
all closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will become possible in the near future to draw firm conclusions about the true picture
of its evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structures of the proteins present in the
different species, we carried out homology modelling and put forward a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock
this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs
could be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries. The
present work may be a small finding within a much larger issue.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs
more efficiently and effectively in future by analysing the modelled structure
of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in the search for a cure for the
infectious disease bird flu, and will also open new avenues for finding other possible drug
targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic
sites in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to
establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope, however,
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] - C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seegerfcccedu
[4] - plumbed
[5] - Howard Hughes Medical Institute, Department of Biochemistry and Molecular
Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA.
[6] - Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher
P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro - An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7] - Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut
Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] - Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,
Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
84
85
86
Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
64
Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050
65
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
66
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
67
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
68
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
69
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
70
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
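The size of the search reported in the log can be checked by hand. The short Python sketch below rests on our own reading of the figures, on the assumption that separators and decimal points were lost from the printed log: Iterate[278121] is taken as 27 × 812 × 1 (distance steps × receptor rotations × ligand rotations), FFT[642448] as 64 × 24 × 48, and the distance range as 0.00 to 19.50 in steps of 0.75. These readings are assumptions, although they are supported by the fact that they reproduce the 21924 FFTs and 1616412672 orientations printed by the program.

```python
import math

# Assumed readings of the run-together log figures (see the note above).
r_start, r_stop, r_step = 0.00, 19.50, 0.75
iterate = (27, 812, 1)   # distance steps, receptor rotations, ligand rotations
fft_grid = (64, 24, 48)  # samples covered by each 3D FFT

# Number of intermolecular distances scanned, both end points included.
n_radial = round((r_stop - r_start) / r_step) + 1

# One 3D FFT is run per combination of the outer (iterated) variables.
n_ffts = math.prod(iterate)

# Total number of trial orientations in the 6D search.
search_space = n_ffts * math.prod(fft_grid)

print("distance steps:", n_radial)    # 27
print("3D FFTs:", n_ffts)             # 21924
print("orientations:", search_space)  # 1616412672
```

The agreement between the 27 distance steps and the first iterated factor suggests that the outer loop of the search runs over the intermolecular distance.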
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although the sequences are
all closely related, they play an important role in survival in different species. It is worth taking
a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will become possible in the near future to draw firm conclusions about the true picture of
its evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list
of the gene products and their functions.
To determine the unknown structure of a protein present in the
different species, we performed homology modelling. We then went a step further and presented a
theoretical model built with available online modelling tools.
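Homology modelling depends on finding a template of known structure that is sufficiently similar to the target sequence. The Python sketch below shows one common way to compare candidates, by percent identity and coverage over an existing alignment. It is an illustration under stated assumptions, not the procedure of any particular modelling server: the function name `identity_and_coverage` and the toy sequences are hypothetical, and the often-quoted guideline that templates above roughly 30 per cent identity tend to be usable is only a rule of thumb.

```python
def identity_and_coverage(target, template):
    """Return (percent identity, percent coverage) for two aligned sequences.

    Identity is computed over columns where neither sequence has a gap;
    coverage is the share of target residues aligned to a template residue.
    """
    if len(target) != len(template):
        raise ValueError("sequences must come from the same alignment")
    aligned = [(t, m) for t, m in zip(target, template) if t != "-" and m != "-"]
    target_len = sum(1 for t in target if t != "-")
    if not aligned or target_len == 0:
        raise ValueError("no aligned residues to compare")
    matches = sum(1 for t, m in aligned if t == m)
    return 100.0 * matches / len(aligned), 100.0 * len(aligned) / target_len


# Hypothetical toy alignment; real input would come from an alignment tool.
target = "MKTAYIAKQRLE"
templates = {
    "template_1": "MKSAY-AKHRLE",
    "template_2": "MRSAF--KHKLD",
}

# Rank candidate templates from most to least similar to the target.
ranked = sorted(
    ((name,) + identity_and_coverage(target, seq) for name, seq in templates.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, identity, coverage in ranked:
    print(f"{name}: identity {identity:.1f}%, coverage {coverage:.1f}%")
```

Reporting coverage alongside identity guards against choosing a template that matches well but spans only a short stretch of the target.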
Our study indicated that the HBeAg (glycerate kinase) protein encoded by the
gene is a secondary contributor to the pathogenicity of HBV. We therefore tried to dock this protein
with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such discoveries. It is
a small finding on a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
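As a small illustration of the kind of data analysis involved, the Python sketch below computes pairwise distances between aligned protein sequences, which are the usual starting point for distance-based tree building. It is a minimal sketch, not the method used in this report: the function names and the toy sequences are hypothetical, and the Poisson correction d = -ln(1 - p) is one standard choice among several.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns with no gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned residues to compare")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected distance, d = -ln(1 - p), for a p-distance below 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p-distance must lie in [0, 1)")
    return -math.log(1.0 - p)


# Hypothetical toy alignment; real input would be aligned strain sequences.
alignment = {
    "strain_a": "MKTAYIAKQR",
    "strain_b": "MKTAYIGKQR",
    "strain_c": "MKSAY-AKHR",
}

# Distance for every unordered pair of sequences in the alignment.
distances = {
    (x, y): poisson_distance(p_distance(alignment[x], alignment[y]))
    for x, y in combinations(alignment, 2)
}
for (x, y), d in distances.items():
    print(f"{x} vs {y}: {d:.3f}")
```

A distance matrix of this kind can then be passed to a tree-building method such as neighbour joining in a package like PHYLIP.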
The purpose of modelling is to help drug developers and biotechnologists develop
drugs more efficiently and effectively in future by analysing the modelled structure
of the protein.
Once new drug targets are identified, they will open new vistas for further drug
development. The findings of our docking study may be useful in the search for a cure for
hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites
in them. The same method can also be applied to other infectious diseases, and
so we can look forward to a better, disease-free world.
The work presented here is just a small part of a large problem, and much work remains to be done to
establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We nevertheless
hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
70
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
71
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
72
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
73
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
74
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
75
Conclusion
76
After analyzing protein sequence of Hepatitis B virus we come to conclusion that though they all
are closely related they have an important role in survival in different species It is interesting to
have closer look at the matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that are they
evolved together
With the finishing of the ongoing gene sequencing project on HBV we
hope it will be possible to draw conclusive decision about the true picture of evolution in near
future and gene responsible for pathogenesis can also be identified
Complete inference can only be drawn based on a comprehensive list
of the gene products and their function
In order to find out unknown structure of protein present in the
different species we do homology modelling We forward step to present a theoretical model
using available online modelling tools
As we study that HBeAG (Glycerate kinase ) protein that is coded by
gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with
appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be
developed
77
Future prospects
The work presented in this report might just be a stepping stone for any such discoveries The
present work might be small finding of big issue
78
Phylogenetics is that field of biology which deals with identifying and understanding the
relationships between the many different kinds of life on earth This includes methods for
collecting and analysing data as well as interpretation of those results as new biological
information
The purpose of modelling is to help the Drug developers and Biotechnologists to develop the
drug more efficiently and with more effectiveness in future by analysing the modelled structure
of protein
As the new drugs target would be identified it will open new vistas for further drug
development The finding of our docking will be useful in finding a cure for the infectious disease
bird flu also it will open new avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence possible therapeutic sites
can be identified in them Similar method can also be applied to other infectious diseases and
hence we can look forward to a better disease free world
The work presented is just a small part of big issue and lots of work still needs to be done to
establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping
that these findings will go long way and will prove fruitful to any going in a similar area
MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
-------------------------------------------------------------------------
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds

Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  ----
1     1     000000  00      00      00      00    00      00      -1   -100
---------------------------------------------------------------------------
1     1     000000  00      00      00      00    00      00      -1   -100
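The size of the search reported in the log can be checked with a little arithmetic. The sketch below is not part of the Hex output. It assumes the punctuation lost in extraction is read as follows: the distance range is 0.00 to 19.50 in steps of 0.75, "Iterate[278121]" stands for 27 x 812 x 1, "FFT[642448]" stands for 64 x 24 x 48, and the stage timings carry two decimal places. Under those assumptions the numbers reproduce the logged totals of 21924 3D FFTs, 1616412672 orientations and roughly 7 minutes.

```python
from math import prod

# Distance range from the log, with decimal points restored (an assumption
# about the stripped punctuation): R = 0.00 to 19.50 in steps of 0.75.
r_start, r_stop, r_step = 0.00, 19.50, 0.75
n_distance_steps = round((r_stop - r_start) / r_step) + 1

# Assumed reading of "Iterate[278121]": distance steps, receptor rotations,
# ligand rotations. Assumed reading of "FFT[642448]": the 3D FFT grid.
iterate_dims = (n_distance_steps, 812, 1)
fft_dims = (64, 24, 48)

n_ffts = prod(iterate_dims)               # one 3D FFT per iterate combination
n_orientations = n_ffts * prod(fft_dims)  # orientations scored in the 6D search

# Stage timings from the "Hex ... GF ... FFT ... Scan ..." line, read with
# two decimal places (again an assumption).
stage_seconds = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
total_seconds = sum(stage_seconds.values())

print(n_distance_steps)         # number of R values listed in the log
print(n_ffts)                   # compare with "Done 21924 3D FFTs"
print(n_orientations)           # compare with "= 1616412672"
print(round(total_seconds, 2))  # compare with "7 min 0 sec"
```

That the products and the timing sum agree with the logged figures supports this reading of the garbled numbers, but it does not confirm the decimal placement of any other value in the log.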
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be
interesting to look more closely at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains
evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible
in the near future to draw firm conclusions about the true picture of evolution, and to identify
the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To determine the unknown structures of proteins present in the different species, we performed
homology modelling and put forward a theoretical model built with available online modelling
tools.
Our study indicated that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. It is
a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
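As an illustration of the kind of data analysis phylogenetics relies on, the sketch below computes pairwise p-distances (the fraction of aligned, ungapped positions at which two sequences differ). A distance matrix of this kind is a common input to tree-building methods. The sequences and strain names are hypothetical and are not HBV data, and the report does not state that this particular measure was used.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical aligned protein fragments, for illustration only.
alignment = {
    "strain_A": "MKTAYIAKQR",
    "strain_B": "MKTAYIGKQR",
    "strain_C": "MRTA-IGKHR",
}

# Pairwise distance for every pair of sequences in the alignment.
distances = {
    (x, y): p_distance(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}

for (x, y), d in distances.items():
    print(f"{x} vs {y}: {d:.3f}")
```

Smaller distances indicate more closely related sequences; here strain_A and strain_B differ at one position in ten, while strain_C is further from both.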
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking may be useful in the search for a cure for the infectious disease
bird flu, and may also open new avenues for finding other possible drug targets in influenza A
virus.
The docking results can be used to design new lead compounds and hence can aid the drug
discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic
sites in them. A similar method can also be applied to other infectious diseases, and hence we
can look forward to a better, disease-free world.
The work presented is just a small part of a large problem, and much work remains to be done to
establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We
nevertheless hope that these findings will go a long way and prove fruitful to anyone working
in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
84
85
86
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI
67
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052
68
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
69
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
70
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
71
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
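The size of the search reported in the log can be reproduced with a few lines of arithmetic. The sketch below assumes that decimal points and commas were lost when the log was extracted, so that the distance range "000 to 1950 with steps of 075" reads as 0.00 to 19.50 in steps of 0.75, "Iterate[278121]" reads as 27 x 812 x 1 and "FFT[642448]" reads as 64 x 24 x 48. These readings are our interpretation, not something the log states, but they do recover the totals printed by Hex.

```python
# Sketch: reproduce the search-space totals printed in the Hex log.
# Assumption: decimal points and commas were lost during extraction.

# Radial search range, read as 0.00 to 19.50 in steps of 0.75.
r_start, r_stop, r_step = 0.00, 19.50, 0.75
n_distances = round((r_stop - r_start) / r_step) + 1
distances = [r_start + i * r_step for i in range(n_distances)]

# "Iterate[278121]" read as 27 distances x 812 receptor rotations
# x 1 ligand rotation.
iterate = (27, 812, 1)
# "FFT[642448]" read as three FFT dimensions of 64, 24 and 48 samples.
fft_dims = (64, 24, 48)

n_ffts = iterate[0] * iterate[1] * iterate[2]
orientations_per_fft = fft_dims[0] * fft_dims[1] * fft_dims[2]
total_orientations = n_ffts * orientations_per_fft

print(n_distances)         # 27
print(distances[-1])       # 19.5
print(n_ffts)              # 21924
print(total_orientations)  # 1616412672
```

The values 21924 and 1616412672 match the "Done 21924 3D FFTs for 1616412672 orientations" line of the log, and 27 matches the number of R values listed during the 3D FFT pass.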
Conclusion
After analysing the protein sequences of Hepatitis B virus (HBV), we conclude that, although they
are all closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved
together.
Once the ongoing HBV gene sequencing project is finished, we hope it will be possible to draw
firm conclusions about the true picture of evolution in the near future, and to identify the genes
responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To determine the unknown structure of proteins present in the different species, we carried out
homology modelling. We took a step forward by presenting a theoretical model built with
available online modelling tools.
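Homology modelling depends on how similar the target sequence is to a protein of known structure. The sketch below shows one simple way to quantify that similarity as percent identity over an existing pairwise alignment. It is an illustration only, not the procedure used by the online tools in this report, and the function name and toy sequences are hypothetical.

```python
# Sketch: percent identity between a target and a candidate template,
# given two already-aligned sequences of equal length ('-' marks a gap).
# The function name and the toy sequences are hypothetical.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Compare only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


target = "PESLKKLAIE-IVKK"
template = "PESLRKLAVEAIVKK"
print(round(percent_identity(target, template), 1))  # 85.7
```

Here 12 of the 14 gap-free columns match. In practice the alignment itself would come from a dedicated alignment program, and identity is only one of several criteria for choosing a template.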
Our study indicates that the HBeAg (glycerate kinase) protein, a gene product, is one of the
secondary causes of pathogenicity in HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs can later be
developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. It is a
small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
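As an illustration of the data analysis step in phylogenetics, the sketch below computes pairwise distances between aligned protein sequences and finds the closest pair, which is the starting point of distance-based tree building. It uses the Poisson correction d = -ln(1 - p), where p is the fraction of differing sites. The strain names and sequences are invented for illustration and are not HBV data from this report.

```python
import math

# Sketch: pairwise evolutionary distances from aligned protein sequences.
# Strain names and sequences are invented for illustration only.


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, ignoring columns that contain a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("sequences differ at every site")
    return -math.log(1.0 - p)


alignment = {
    "strain_1": "MQLFHLCLII",
    "strain_2": "MQLFHLCLIV",
    "strain_3": "MQLFPLCLVV",
}

names = sorted(alignment)
# Distance for every unordered pair of strains.
matrix = {
    (m, n): poisson_distance(alignment[m], alignment[n])
    for i, m in enumerate(names)
    for n in names[i + 1:]
}
closest = min(matrix, key=matrix.get)
print(closest)  # ('strain_1', 'strain_2')
```

A tree-building method such as UPGMA or neighbour joining would merge the closest pair first and then update the matrix. Packages such as PHYLIP implement these steps in full.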
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in the future by analysing the modelled protein structure.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking study will be useful in the search for a cure for the infectious disease
bird flu, and will also open new avenues for finding other possible drug targets in influenza A
virus. The docking results can be used to design new lead compounds and can therefore aid the
drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic
sites in them. The same method can also be applied to other infectious diseases, so we can look
forward to a better, disease-free world.
The work presented here is only a small part of a large problem, and much work remains to be
done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
71
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
72
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
73
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
74
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
75
Conclusion
76
After analyzing protein sequence of Hepatitis B virus we come to conclusion that though they all
are closely related they have an important role in survival in different species It is interesting to
have closer look at the matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that are they
evolved together
With the finishing of the ongoing gene sequencing project on HBV we
hope it will be possible to draw conclusive decision about the true picture of evolution in near
future and gene responsible for pathogenesis can also be identified
Complete inference can only be drawn based on a comprehensive list
of the gene products and their function
In order to find out unknown structure of protein present in the
different species we do homology modelling We forward step to present a theoretical model
using available online modelling tools
As we study that HBeAG (Glycerate kinase ) protein that is coded by
gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with
appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be
developed
77
Future prospects
The work presented in this report might just be a stepping stone for any such discoveries The
present work might be small finding of big issue
78
Phylogenetics is that field of biology which deals with identifying and understanding the
relationships between the many different kinds of life on earth This includes methods for
collecting and analysing data as well as interpretation of those results as new biological
information
The purpose of modelling is to help the Drug developers and Biotechnologists to develop the
drug more efficiently and with more effectiveness in future by analysing the modelled structure
of protein
As the new drugs target would be identified it will open new vistas for further drug
development The finding of our docking will be useful in finding a cure for the infectious disease
bird flu also it will open new avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence possible therapeutic sites
can be identified in them Similar method can also be applied to other infectious diseases and
hence we can look forward to a better disease free world
The work presented is just a small part of big issue and lots of work still needs to be done to
establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping
that these findings will go long way and will prove fruitful to any going in a similar area
79
BIBLIOGRAPHY
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
80
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining multiple
structure and sequence alignments to improve sequence detection and alignment
Application to the SH2 domains of Janus kinases Proc Natl Acad Sci 98 14796ndash14801 [PubMed]
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated structure-based
prediction of functional sites in proteins Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking
J Mol Biol 311 395ndash408 [PubMed]
(77)
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller W and
Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new generation of protein
database search programs Nucleic Acids Res 25 3389ndash3402 [PubMed]
81
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher
P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated
documentation resource for protein families domains and functional sites Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut
Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la
Barceloneta 37-49 08003 Barcelona (Catalonia) Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
82
Abbreviation
83
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
84
85
86
MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
69
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025
70
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
71
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
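As a rough cross-check of the search-space figures in the docking log, the short Python sketch below recomputes them. It assumes that the extraction dropped decimal points and separators, so that the distance range reads 0.00 to 19.50 in steps of 0.75, `Iterate[278121]` reads as 27 x 812 x 1 and `FFT[642448]` reads as 64 x 24 x 48. These splits are inferred from the reported totals and are not stated in the log itself; the function and variable names are illustrative only.

```python
# Sanity checks on figures reported in the Hex docking log.
# Assumption: decimal points and separators were lost in extraction, so
# "000 to 1950 with steps of 075" is read as 0.00 to 19.50 in steps of 0.75,
# "Iterate[278121]" as [27, 812, 1] and "FFT[642448]" as [64, 24, 48].


def radial_steps(start, stop, step):
    """Return the intermolecular distances sampled from start to stop inclusive."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


distances = radial_steps(0.0, 19.5, 0.75)   # the "R = ..." lines in the log
receptor_rotations = 812                     # "Receptor 812" rotational increments
ffts = len(distances) * receptor_rotations   # one 3D FFT per (distance, rotation)
grid_points = 64 * 24 * 48                   # assumed FFT grid dimensions
orientations = ffts * grid_points            # total 6D search space

# Net formal charge from the counted charged residues (104 +ve, 114 -ve).
net_charge = 104 - 114

print(len(distances), ffts, orientations, net_charge)
```

Under these assumptions the script prints `27 21924 1616412672 -10`, which agrees with the 27 radial steps, the 21924 3D FFTs, the total number of orientations and the net formal charge reported in the log.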
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude that although the
strains are all closely related, the proteins play an important role in survival in different species.
It would be worthwhile to look at the matter more closely at the gene level, where a phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
Once the ongoing gene sequencing project on HBV is complete, we hope it will be possible to draw
firm conclusions about the true picture of its evolution in the near future, and to identify the genes
responsible for pathogenesis.
A complete inference can be drawn only from a comprehensive list of the gene products and their
functions.
To determine the unknown structures of proteins present in the different species, we carried out
homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the
secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. It is a small
finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This includes methods for
collecting and analysing data, as well as the interpretation of those results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking may be useful in the search for a cure for the infectious disease bird flu,
and may also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. A similar method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work still needs to be done to
establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that
these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
71
gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A
72
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
73
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
74
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
75
Conclusion
76
After analysing the protein sequences of Hepatitis B virus, we conclude that although they are all
closely related, they play an important role in survival in different species. It would be
interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved
together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will become
possible in the near future to draw firm conclusions about the true picture of evolution and to
identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To determine the unknown structure of a protein present in the different species, we used
homology modelling. As a step forward, we present a theoretical model built with available
online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes
of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order
to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. It is a
small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
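As a rough illustration of the data-analysis step mentioned above, the sketch below computes pairwise p-distances (the proportion of differing sites) between aligned protein sequences. Distance-based tree-building methods generally start from a matrix of this kind. The function names, the strain labels and the toy alignment are hypothetical and are not data from this report; a real analysis would normally apply a substitution model instead of raw p-distances.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of sequence names."""
    names = sorted(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m, n in combinations(names, 2)}


# Hypothetical toy alignment, for illustration only.
toy = {
    "strain_A": "MKT-LLVA",
    "strain_B": "MKTALLVA",
    "strain_C": "MRTALIVA",
}

matrix = distance_matrix(toy)
for pair, dist in matrix.items():
    print(pair, round(dist, 3))
```

Smaller distances indicate more closely related sequences. In this toy input, strain_A and strain_B differ only at a gapped column, so their distance is zero.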
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking study may be useful in the search for a cure for the infectious disease
bird flu, and may also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and can hence aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites
can be identified in them. The same method can also be applied to other infectious diseases, and
hence we can look forward to a better, disease-free world.
The work presented is just a small part of a large problem, and much work still needs to be done to
establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We nevertheless
hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
75
Conclusion
76
After analyzing protein sequence of Hepatitis B virus we come to conclusion that though they all
are closely related they have an important role in survival in different species It is interesting to
have closer look at the matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that are they
evolved together
With the finishing of the ongoing gene sequencing project on HBV we
hope it will be possible to draw conclusive decision about the true picture of evolution in near
future and gene responsible for pathogenesis can also be identified
Complete inference can only be drawn based on a comprehensive list
of the gene products and their function
In order to find out unknown structure of protein present in the
different species we do homology modelling We forward step to present a theoretical model
using available online modelling tools
As we study that HBeAG (Glycerate kinase ) protein that is coded by
gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with
appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be
developed
77
Future prospects
The work presented in this report might just be a stepping stone for any such discoveries The
present work might be small finding of big issue
78
Phylogenetics is that field of biology which deals with identifying and understanding the
relationships between the many different kinds of life on earth This includes methods for
collecting and analysing data as well as interpretation of those results as new biological
information
The purpose of modelling is to help the Drug developers and Biotechnologists to develop the
drug more efficiently and with more effectiveness in future by analysing the modelled structure
of protein
As the new drugs target would be identified it will open new vistas for further drug
development The finding of our docking will be useful in finding a cure for the infectious disease
bird flu also it will open new avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence possible therapeutic sites
can be identified in them Similar method can also be applied to other infectious diseases and
hence we can look forward to a better disease free world
The work presented is just a small part of big issue and lots of work still needs to be done to
establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping
that these findings will go long way and will prove fruitful to any going in a similar area
79
BIBLIOGRAPHY
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
80
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining multiple
structure and sequence alignments to improve sequence detection and alignment
Application to the SH2 domains of Janus kinases Proc Natl Acad Sci 98 14796ndash14801 [PubMed]
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated structure-based
prediction of functional sites in proteins Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking
J Mol Biol 311 395ndash408 [PubMed]
(77)
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller W and
Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new generation of protein
database search programs Nucleic Acids Res 25 3389ndash3402 [PubMed]
81
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher
P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated
documentation resource for protein families domains and functional sites Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut
Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la
Barceloneta 37-49 08003 Barcelona (Catalonia) Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
82
Abbreviation
83
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
84
85
86
Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)
73
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
74
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
75
Conclusion
76
After analyzing protein sequence of Hepatitis B virus we come to conclusion that though they all
are closely related they have an important role in survival in different species It is interesting to
have closer look at the matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that are they
evolved together
With the finishing of the ongoing gene sequencing project on HBV we
hope it will be possible to draw conclusive decision about the true picture of evolution in near
future and gene responsible for pathogenesis can also be identified
Complete inference can only be drawn based on a comprehensive list
of the gene products and their function
In order to find out unknown structure of protein present in the
different species we do homology modelling We forward step to present a theoretical model
using available online modelling tools
As we study that HBeAG (Glycerate kinase ) protein that is coded by
gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with
appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be
developed
77
Future prospects
The work presented in this report might just be a stepping stone for any such discoveries The
present work might be small finding of big issue
78
Phylogenetics is that field of biology which deals with identifying and understanding the
relationships between the many different kinds of life on earth This includes methods for
collecting and analysing data as well as interpretation of those results as new biological
information
The purpose of modelling is to help the Drug developers and Biotechnologists to develop the
drug more efficiently and with more effectiveness in future by analysing the modelled structure
of protein
As the new drugs target would be identified it will open new vistas for further drug
development The finding of our docking will be useful in finding a cure for the infectious disease
bird flu also it will open new avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence possible therapeutic sites
can be identified in them Similar method can also be applied to other infectious diseases and
hence we can look forward to a better disease free world
The work presented is just a small part of big issue and lots of work still needs to be done to
establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping
that these findings will go long way and will prove fruitful to any going in a similar area
79
BIBLIOGRAPHY
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
80
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining multiple
structure and sequence alignments to improve sequence detection and alignment
Application to the SH2 domains of Janus kinases Proc Natl Acad Sci 98 14796ndash14801 [PubMed]
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated structure-based
prediction of functional sites in proteins Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking
J Mol Biol 311 395ndash408 [PubMed]
(77)
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller W and
Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new generation of protein
database search programs Nucleic Acids Res 25 3389ndash3402 [PubMed]
81
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher
P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated
documentation resource for protein families domains and functional sites Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut
Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la
Barceloneta 37-49 08003 Barcelona (Catalonia) Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
82
Abbreviation
83
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
84
85
86
R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040
74
R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
75
Conclusion
76
After analyzing protein sequence of Hepatitis B virus we come to conclusion that though they all
are closely related they have an important role in survival in different species It is interesting to
have closer look at the matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that are they
evolved together
With the finishing of the ongoing gene sequencing project on HBV we
hope it will be possible to draw conclusive decision about the true picture of evolution in near
future and gene responsible for pathogenesis can also be identified
Complete inference can only be drawn based on a comprehensive list
of the gene products and their function
In order to find out unknown structure of protein present in the
different species we do homology modelling We forward step to present a theoretical model
using available online modelling tools
As we study that HBeAG (Glycerate kinase ) protein that is coded by
gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with
appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be
developed
77
Future prospects
The work presented in this report might just be a stepping stone for any such discoveries The
present work might be small finding of big issue
78
Phylogenetics is that field of biology which deals with identifying and understanding the
relationships between the many different kinds of life on earth This includes methods for
collecting and analysing data as well as interpretation of those results as new biological
information
The purpose of modelling is to help the Drug developers and Biotechnologists to develop the
drug more efficiently and with more effectiveness in future by analysing the modelled structure
of protein
As the new drugs target would be identified it will open new vistas for further drug
development The finding of our docking will be useful in finding a cure for the infectious disease
bird flu also it will open new avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence possible therapeutic sites
can be identified in them Similar method can also be applied to other infectious diseases and
hence we can look forward to a better disease free world
The work presented is just a small part of big issue and lots of work still needs to be done to
establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping
that these findings will go long way and will prove fruitful to any going in a similar area
79
BIBLIOGRAPHY
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
80
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research Institute La
Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular
Biophysics Columbia University New York New York 10032 USA
Reprint requests to Barry Honig Howard Hughes Medical Institute Department of
Biochemistry and Molecular Biophysics Columbia University New York NY 10032
USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining multiple
structure and sequence alignments to improve sequence detection and alignment
Application to the SH2 domains of Janus kinases Proc Natl Acad Sci 98 14796ndash14801 [PubMed]
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated structure-based
prediction of functional sites in proteins Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking
J Mol Biol 311 395ndash408 [PubMed]
(77)
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller W and
Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new generation of protein
database search programs Nucleic Acids Res 25 3389ndash3402 [PubMed]
81
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher
P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated
documentation resource for protein families domains and functional sites Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut
Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la
Barceloneta 37-49 08003 Barcelona (Catalonia) Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences
Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom
82
Abbreviation
83
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
84
85
86
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
------------------------------------------------------------------------------
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
------------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
------------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analysing the protein sequences of Hepatitis B virus (HBV), we conclude that, although they
are all closely related, they play an important role in survival in different species. It would be
interesting to take a closer look by studying them at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We observed that the same genes are present in all strains, which suggests that the strains
evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw
firm conclusions about the true picture of its evolution and to identify the genes responsible
for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their
functions.
To determine the unknown structures of proteins present in the different species, we carried out
homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg protein (identified in this work as glycerate kinase) is a
secondary contributor to the pathogenicity of HBV. We therefore docked this protein with a
suitable ligand intended to inhibit its activity, as a basis for future drug development.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries. It is a
small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on Earth. It includes methods for
collecting and analysing data, as well as the interpretation of the results as new biological
information.
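As an illustration of the data-analysis step mentioned above, the following is a minimal sketch of how pairwise distances between aligned sequences could be computed before a tree is built. It is not the procedure used in this report. The strain names and the short toy sequences are hypothetical, and the p-distance (the proportion of differing sites) is only one of several possible distance measures.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every pair of named aligned sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Hypothetical toy alignment; these are NOT real HBV sequences.
toy_alignment = {
    "strain_A": "MQLFHLCLII",
    "strain_B": "MQLFHLCLIV",
    "strain_C": "MQ-FPLCLVV",
}

matrix = distance_matrix(toy_alignment)
for (n1, n2), d in matrix.items():
    print(f"{n1} vs {n2}: {d:.3f}")
```

A matrix of this kind is the usual input to distance-based tree-building methods; smaller values indicate more closely related sequences.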
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking study may be useful in the search for a cure for the infectious disease
bird flu, and may also open new avenues for finding other possible drug targets in influenza A
virus. The docking results can be used to design new lead compounds and so aid the drug
discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic
sites in them. The same method can also be applied to other infectious diseases, so we can look
forward to a better, disease-free world.
The work presented here is only a small part of a large problem, and much work remains to be
done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We
nevertheless hope that these findings will go a long way and prove fruitful to anyone working in
a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package